CN109325326A - Data desensitization method, apparatus, device and medium for unstructured data access
- Publication number
- CN109325326A; CN201810937005.5A
- Authority
- CN
- China
- Prior art keywords
- data
- text
- desensitization
- desensitize
- sensitive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention is applicable to the technical field of data desensitization and data sharing, and provides a data desensitization method, apparatus, device and medium for unstructured data access. The method comprises: auditing a data access permission application request sent by a user and, once it is approved, obtaining the corresponding user access permission; receiving the user's query request for unstructured data in a database and auditing the query request against the user access permission; once the query request is approved, performing a data query operation, according to the query request, in the database for which the user holds access permission, to obtain the corresponding non-desensitized data; performing sensitive data identification and desensitization processing on the non-desensitized data with a sensitive data identification and desensitization engine to obtain the corresponding desensitized data; and returning the desensitized data to the user, thereby realizing the user's access to the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.
Description
Technical field
The invention belongs to the technical field of data desensitization and data sharing, and in particular relates to a data desensitization method, apparatus, device and medium for unstructured data access.
Background
With the arrival of the big data era, big data sharing platforms for information transmission and machine learning have become important platforms for information exchange in daily office work, communication and collaboration. However, transmitting information and sharing data over a network can lead to leakage of private data and secondary profiteering from key data sets. How to open data to third-party applications while protecting data privacy has therefore become a key issue for big data sharing platforms.
At present, there are generally two approaches to this problem:
The first is to build a data leakage prevention system for the big data sharing platform. For example, the invention patent application with publication number CN104767745A discloses a cloud data security protection method; the invention patent application with publication number CN107633181A discloses a data model for open data sharing and its operating system; and the invention patent application with publication number CN103209174B discloses a data protection method, apparatus and system. In protecting sensitive information, these existing data sharing platforms or protection systems only return data according to the user's permission level and role, so the data protection mode is inflexible, and they provide no specific means of detecting and identifying sensitive content;
The second is to use sensitive information detection and data desensitization systems. For example, the invention patent application with publication number CN102970298B discloses a method, device and system for preventing information leakage; the invention patent application with publication number CN102467628A discloses a data protection method based on browser kernel interception technology; and the invention patent application with publication number CN104809405A discloses a classification-based leakage prevention method for structured data assets. The sensitive information detection and data desensitization schemes in these patents detect sensitive information at the file level: they can only judge whether a file contains sensitive information and cannot locate the specific sensitive content in the file. Their content-based detection also relies mainly on regular-expression matching, which works only on text data and cannot detect sensitive information in pictures or text documents (Word files, PDF files, etc.).
Summary of the invention
The purpose of the present invention is to provide a data desensitization method, apparatus, device and storage medium for unstructured data access, aiming to solve the problem that sensitive information in unstructured data is leaked because the prior art cannot provide an effective desensitization method for unstructured data.
In one aspect, the present invention provides a data desensitization method for unstructured data access, the method comprising the following steps:
receiving a data access permission application request sent by a user, and auditing the user access permission of the user according to the application request, to obtain the corresponding user access permission;
when the application request is approved, receiving the user's query request for unstructured data in a pre-established database, and auditing the query request according to the user access permission;
when the query request is approved, performing a data query operation in the database according to the query request, to obtain the corresponding non-desensitized data;
performing sensitive data identification and desensitization processing on the obtained non-desensitized data with a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and returning the obtained desensitized data to the user, thereby realizing the user's access to the unstructured data in the database.
In another aspect, the present invention provides a data desensitization apparatus for unstructured data access, the apparatus comprising:
a permission application audit unit, configured to receive a data access permission application request sent by a user, and to audit the user access permission of the user according to the application request, to obtain the corresponding user access permission;
an access request audit unit, configured to receive, when the application request is approved, the user's query request for unstructured data in a pre-established database, and to audit the query request according to the user access permission;
a data query unit, configured to perform, when the query request is approved, a data query operation in the database according to the query request, to obtain the corresponding non-desensitized data; and
a data desensitization unit, configured to perform sensitive data identification and desensitization processing on the obtained non-desensitized data with a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, thereby realizing the user's access to the unstructured data in the database.
In another aspect, the present invention further provides a computing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above data desensitization method for unstructured data access.
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above data desensitization method for unstructured data access.
The present invention audits the data access permission application request sent by a user and, once it is approved, obtains the corresponding user access permission. It then receives the user's query request for unstructured data in a database and audits the query request according to the user access permission. Once the query request is approved, a data query operation is performed, according to the query request, in the database for which the user holds access permission, to obtain the corresponding non-desensitized data. The sensitive data identification and desensitization engine performs sensitive data identification and desensitization processing on the non-desensitized data to obtain the corresponding desensitized data, which is returned to the user, thereby realizing the user's access to the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.
Description of the drawings
Fig. 1 is a flowchart of the data desensitization method for unstructured data access provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of performing sensitive data identification and desensitization processing on non-desensitized data with the sensitive data identification and desensitization engine, in the data desensitization method for unstructured data access provided by Embodiment two of the present invention;
Fig. 3 is a flowchart of detecting, identifying and desensitizing sensitive information in picture data during identification and desensitization of non-desensitized data, provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of the data desensitization apparatus for unstructured data access provided by Embodiment four of the present invention;
Fig. 5 is a schematic diagram of a preferred structure of the data desensitization apparatus for unstructured data access provided by Embodiment four of the present invention;
Fig. 6 is a schematic structural diagram of the picture data desensitization unit in the data desensitization apparatus for unstructured data access provided by Embodiment five of the present invention; and
Fig. 7 is a schematic structural diagram of the computing device provided by Embodiment six of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
The specific implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment one:
Fig. 1 shows the flow of the data desensitization method for unstructured data access provided by Embodiment one of the present invention. For ease of description, only the parts relevant to the embodiment of the present invention are shown, as detailed below:
In step S101, a data access permission application request sent by a user is received, and the user access permission of the user is audited according to the application request, to obtain the corresponding user access permission.
The embodiment of the present invention is applicable to data sharing and interaction platforms, systems or devices, for example personal computers and servers. The user applies for data access permission through a preset third-party application permission application module. According to the application request submitted by the user, this module audits whether the user has data query permission and what the user's data access scope is. When the audit passes, the corresponding user access permission is obtained, which is the user's permission to access data in the corresponding database; the user access permission is submitted to a pre-established third-party application permission set for storage, and an approval message is returned. If the audit does not pass, a rejection message is returned.
In step S102, when the application request is approved, the user's query request for unstructured data in a pre-established database is received, and the query request is audited according to the user access permission.
In the embodiment of the present invention, once the application request is approved, the user writes a data query script or program that meets the requirements of the pre-established database, in order to access and query the unstructured data in the database with that script or program. Before the data in the database is formally accessed, the query request submitted by the user (the data query script or program) is audited according to the user access permission to determine whether the user has permission to access the database, which ensures that the user can only access databases for which the user holds the corresponding user access permission. Here, unstructured data refers to data that has no predefined data model or is not organized in a predefined manner, including text documents (PDF documents, Word documents, WPS documents, LaTeX documents, etc.), picture data, audio and video data, medical data, and so on.
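Steps S101 and S102 can be illustrated with a minimal Python sketch. The class `PermissionSet`, its methods, the `approved` flag and the user and database names are hypothetical stand-ins for the third-party application permission set; the patent text does not specify how an application is judged or how grants are stored.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionSet:
    """Hypothetical stand-in for the third-party application permission set."""
    grants: dict = field(default_factory=dict)  # user -> set of database names

    def apply(self, user: str, database: str, approved: bool) -> bool:
        """Store the grant when the application request is approved (step S101)."""
        if approved:
            self.grants.setdefault(user, set()).add(database)
        return approved

    def audit_query(self, user: str, database: str) -> bool:
        """Approve a query only for a database the user was granted (step S102)."""
        return database in self.grants.get(user, set())


permissions = PermissionSet()
permissions.apply("alice", "medical_records", approved=True)
permissions.apply("alice", "payroll", approved=False)
```

Under these assumptions a query by "alice" against "medical_records" passes the audit, while queries against "payroll" or by an unknown user are rejected.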
In step S103, when the query request is approved, a data query operation is performed in the database according to the query request, to obtain the corresponding non-desensitized data.
In the embodiment of the present invention, after it is determined that the user has access permission for the database, a data query operation is performed in the database according to the received query request for unstructured data in that database, to obtain the corresponding non-desensitized data.
In step S104, sensitive data identification and desensitization processing is performed on the obtained non-desensitized data with a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and the obtained desensitized data is returned to the user, thereby realizing the user's access to the unstructured data in the database.
In the embodiment of the present invention, the pre-established sensitive data identification and desensitization engine performs sensitive data identification and desensitization processing on the obtained non-desensitized data to obtain the corresponding desensitized data, and the desensitized data is returned to the user to realize the user's access to the unstructured data in the database. Here, sensitive data includes personal information (for example, telephone number, home address, ID card number), personal account passwords (including email account passwords, bank login passwords, etc.), personal assets, and so on.
In the embodiment of the present invention, the data access permission application request sent by the user is audited and, once it is approved, the corresponding user access permission is obtained. The user's query request for unstructured data in the database is then received and audited according to the user access permission. Once the query request is approved, a data query operation is performed, according to the query request, in the database for which the user holds access permission, to obtain the corresponding non-desensitized data. The sensitive data identification and desensitization engine performs sensitive data identification and desensitization processing on the non-desensitized data to obtain the corresponding desensitized data, which is returned to the user, thereby realizing the user's access to the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.
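The order of steps S102 to S104 can be sketched as a single function. This is only an illustration: `access_unstructured_data` and its three callable parameters are assumed names, and the lambdas in the usage example are toy placeholders for the real audit, query and desensitization components.

```python
from typing import Callable, Optional


def access_unstructured_data(
    user: str,
    query: str,
    is_permitted: Callable[[str, str], bool],  # step S102: audit the query request
    run_query: Callable[[str], str],           # step S103: fetch non-desensitized data
    desensitize: Callable[[str, str], str],    # step S104: identify and desensitize
) -> Optional[str]:
    """Return desensitized data, or None when the query request is rejected."""
    if not is_permitted(user, query):
        return None
    raw = run_query(query)
    return desensitize(user, raw)


result = access_unstructured_data(
    "alice",
    "notes",
    is_permitted=lambda user, query: user == "alice",
    run_query=lambda query: "call 555-0100",
    desensitize=lambda user, raw: raw.replace("555-0100", "***"),
)
```

The point of the sketch is the ordering: the query never runs for a rejected request, and raw query results only leave the function after passing through the desensitization step.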
Embodiment two:
Fig. 2 shows the flow, provided by Embodiment two of the present invention, of step S104 in Embodiment one, in which the sensitive data identification and desensitization engine performs sensitive data identification and desensitization processing on the non-desensitized data. For ease of description, only the parts relevant to the embodiment of the present invention are shown, as detailed below:
In step S201, data type analysis is performed on the non-desensitized data by the sensitive data identification and desensitization engine, to determine whether the non-desensitized data is picture data or text document data.
In the embodiment of the present invention, the pre-established sensitive data identification and desensitization engine performs data type analysis on the non-desensitized data, and the analysis result determines whether the non-desensitized data is picture data or text document data.
In step S202, based on the data type analysis of the non-desensitized data in step S201, step S203 is executed when the non-desensitized data is picture data, and step S204 is executed when the non-desensitized data is text document data.
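The patent text does not say how the data type analysis is carried out. One common approach, assumed here purely for illustration, is to inspect the leading bytes of the data for well-known file signatures (PNG, JPEG, PDF). The function names `classify` and `next_step` are hypothetical.

```python
IMAGE_SIGNATURES = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")  # PNG, JPEG
DOCUMENT_SIGNATURES = (b"%PDF-",)                            # PDF


def classify(data: bytes) -> str:
    """Return "picture", "text_document" or "unknown" from the leading bytes."""
    if data.startswith(IMAGE_SIGNATURES):
        return "picture"
    if data.startswith(DOCUMENT_SIGNATURES):
        return "text_document"
    return "unknown"


def next_step(data: bytes) -> str:
    """Map the analysis result to the step named in the text (S203 or S204)."""
    return {"picture": "S203", "text_document": "S204"}.get(classify(data), "unsupported")
```

A real engine would need to cover more formats (Word, WPS, LaTeX and so on); the sketch only shows the branch between the picture path and the text document path.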
In step S203, sensitive information in the picture data is identified and desensitized by a pre-trained picture information identification and desensitization model, to complete the desensitization operation on the non-desensitized data.
In the embodiment of the present invention, the picture information identification and desensitization model includes a text detection model and a text recognition model, which together identify and desensitize the sensitive information in the picture data.
Before the picture information identification and desensitization model is used to identify and desensitize sensitive information in picture data, preferably, the text detection model is trained by combining a VGG-16 convolutional neural network with a long short-term memory (Long Short-Term Memory, LSTM) network, and the text recognition model is trained by combining a convolutional neural network (Convolutional Neural Network, CNN) algorithm with an LSTM network, which improves the accuracy of recognizing text information in picture data.
In the embodiment of the present invention, preferably, the text detection model is trained through the following steps:
(1.1) extracting picture features from a pre-generated training picture data set with a Visual Geometry Group convolutional neural network (Visual Geometry Group-16, VGG-16) having 13 convolutional layers and 3 fully connected layers, to obtain the corresponding feature maps;
(1.2) generating text proposal regions (anchors) with a region proposal network (Region Proposal Network, RPN) on the picture data corresponding to the last-layer feature map obtained by the VGG-16 convolutional neural network;
(1.3) learning the spatial context information of the text from the obtained feature maps with the LSTM network, and then fusing the learned features through a fully connected layer;
(1.4) predicting each text proposal region according to the fused features; if a text proposal region is predicted to be a text region, regressing the corresponding position of that text proposal region, and otherwise discarding that text proposal region.
In the embodiment of the present invention, the text detection model is trained through steps (1.1) to (1.4), which improves the accuracy of subsequently locating text regions in picture data.
Further preferably, during training of the text detection model, the parameters regressed in training include the text box classification value, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box. Regressing the size and position of the text region in a single pass predicts the text region as a whole and reduces false boxes; at the same time, the added regression of the horizontal position of the text region draws together text boxes that belong to the same text line and avoids fragmented text boxes.
In the embodiment of the present invention, also preferably, the text recognition model is trained through the following steps:
(2.1) expanding the training picture data set pre-generated in step (1.1) with a data self-generation algorithm based on a background set, a font set and a corpus, to generate an expanded training picture data set;
(2.2) extracting picture features from the expanded training picture data set with the CNN algorithm, to obtain the corresponding feature maps;
(2.3) extracting text context information with the LSTM network on the last-layer feature map obtained in step (2.2);
(2.4) inputting the text context information extracted by the LSTM network into a connectionist temporal classification (Connectionist Temporal Classification, CTC) model for labeling and sequence alignment, and back-propagating the labeled and aligned information to the text recognition model, to train the model parameters of the text recognition model.
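To give some intuition for the sequence alignment that CTC performs in step (2.4), the following sketch shows the standard CTC collapsing rule on per-frame labels: consecutive repeats are merged and blank symbols are dropped. It illustrates only this decoding-side rule, not the CTC training loss, and the blank symbol `-` and the function name `ctc_collapse` are assumptions made for the example.

```python
BLANK = "-"  # assumed blank symbol


def ctc_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks (CTC collapsing rule)."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return "".join(decoded)
```

A blank between two identical labels keeps them distinct, which is how a per-frame sequence such as "hh-e-ll-lo-" can still spell a word with a doubled letter.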
In the embodiment of the present invention, the text recognition model is trained through steps (2.1) to (2.4); expanding the training data set addresses the severe shortage of training data and improves the accuracy of subsequently recognizing text information in text regions.
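The data self-generation idea in step (2.1) can be sketched as enumerating combinations of a background set, a font set and a corpus; each combination would then be rendered into one synthetic training picture. The rendering itself is left out, and the function name, the file names, the font names and the sampling strategy below are all hypothetical, since the patent text does not describe the algorithm in detail.

```python
import itertools
import random


def generate_sample_specs(backgrounds, fonts, corpus, limit, seed=0):
    """Return up to `limit` distinct (background, font, text) combinations, shuffled."""
    combos = list(itertools.product(backgrounds, fonts, corpus))
    random.Random(seed).shuffle(combos)
    return combos[:limit]


specs = generate_sample_specs(
    backgrounds=["paper.png", "form.png"],
    fonts=["SimSun", "SimHei"],
    corpus=["ID number", "home address", "phone"],
    limit=5,
)
```

Even small sets multiply quickly (here 2 x 2 x 3 = 12 possible samples), which is why this kind of generation can compensate for a shortage of labeled pictures.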
In step S204, the content of the text document data is parsed by a preset text document parser, and the parsed content is classified and output as text data or picture data.
In the embodiment of the present invention, the content of the text document data is parsed by the preset text document parser to determine whether picture content or text content exists in the text document data. The picture content and text content in the text document data are then extracted according to the parsing result to obtain the corresponding picture data and text data, which are classified and output. The text document parser includes a document parser, a document object store and a document content parser.
Preferably, the content of the text document data is parsed through the following steps:
(1) inputting the text document data into the preset document parser, and extracting the data objects in the text document data;
(2) storing the data objects in the preset document object store, which passes the document page objects to the preset document content parser;
(3) the document content parser extracts the content of the data objects, and classifies and outputs the extracted content as text data or picture data.
In the embodiment of the present invention, the text document data is parsed through steps (1) to (3), which improves the accuracy of parsing text document data.
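The three parsing stages above can be sketched as follows. This is a toy model: `DataObject`, `parse_document` and `classify_content` are hypothetical names, the input is already given as (kind, payload) pairs, and a plain list stands in for the document object store. A real parser for PDF or Word files would rely on a format-specific library.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    kind: str       # "text" or "picture"
    payload: bytes


def parse_document(raw_objects):
    """Stage 1 (document parser): wrap raw (kind, payload) pairs as data objects."""
    return [DataObject(kind, payload) for kind, payload in raw_objects]


def classify_content(object_store):
    """Stage 3 (content parser): split stored objects into text and picture data."""
    texts = [o.payload.decode("utf-8") for o in object_store if o.kind == "text"]
    pictures = [o.payload for o in object_store if o.kind == "picture"]
    return texts, pictures


# Stage 2 (document object store): here simply the list returned by the parser.
object_store = parse_document([("text", b"Name: Li Lei"), ("picture", b"\x89PNG")])
texts, pictures = classify_content(object_store)
```

The two output lists correspond to the two branches that follow: text data goes to text desensitization, picture data goes back to the picture path.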
In step S205, it is determined whether the output is text data or picture data from the text document data; step S206 is executed when the output is text data, and the flow jumps to step S203 when the output is picture data.
In step S206, data desensitization is performed on the text data according to the user access permission.
In the embodiment of the present invention, when the output is text data from the text document data, the sensitive information in the text data is compared against the user access permission to find the sensitive information for which the user has no access permission; the sensitive information found is encrypted and filled back into the text of the text data, completing the data desensitization.
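A minimal sketch of step S206 follows. It assumes that sensitive items are found with regular expressions (an 11-digit mobile number and an 18-character ID card number) and that the user access permission is expressed as a set of category names the user may see. Neither assumption comes from the patent text, and the SHA-256 based token is only a stand-in for the unspecified encryption: it is a one-way hash, not reversible encryption.

```python
import hashlib
import re

# Assumed detection patterns; the patent does not specify how matching is done.
PATTERNS = {
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    "id_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def desensitize_text(text, allowed_categories):
    """Replace each match of a category the user may not see with an opaque token."""
    for category, pattern in PATTERNS.items():
        if category in allowed_categories:
            continue
        text = pattern.sub(
            lambda m, c=category: f"[{c}:{hashlib.sha256(m.group().encode()).hexdigest()[:8]}]",
            text,
        )
    return text


masked = desensitize_text("Tel 13800138000, ID 11010519491231002X", {"phone"})
```

In this usage the caller is allowed to see phone numbers, so the telephone number stays in place while the ID number is replaced by a token written back at the same position in the text.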
In the embodiment of the present invention, the sensitive data identification and desensitization engine performs sensitive information detection, identification and desensitization separately on the picture data and the text document data in the non-desensitized unstructured data, which makes the desensitization of unstructured data more targeted and in turn improves the degree of protection of sensitive information.
Embodiment three:
Fig. 3 shows the flow, provided by Embodiment three of the present invention, of step S203 in Embodiment two, in which the pre-trained picture information identification and desensitization model identifies and desensitizes the sensitive information in picture data. For ease of description, only the parts relevant to the embodiment of the present invention are shown, as detailed below:
In step S301, text regions in the picture data are located by the text detection model.
In the embodiment of the present invention, when text regions in the picture data are located by the text detection model, preferably, the text regions are located through the following steps:
(1) extracting the picture features of the picture data with the convolutional part of the VGG-16 convolutional neural network whose parameters were trained in the text detection model, to obtain the corresponding feature maps;
(2) generating text proposal regions with the RPN network on the picture data corresponding to the last-layer feature map obtained in step (1);
(3) learning the spatial context information of the text from the feature maps obtained in step (1) with the LSTM network, and then fusing the learned features through a fully connected layer;
(4) predicting each text proposal region obtained in step (2) according to the fused features; if a text proposal region is predicted to be a text region, regressing the corresponding position of that text proposal region, and otherwise discarding that text proposal region;
(5) performing position deduplication on the text proposal regions after position regression, and then merging the text proposal regions that belong to the same text line, to obtain text boxes.
In the embodiment of the present invention, steps (1) to (5) add regression of the horizontal position of the text region, which draws together text boxes belonging to the same text line and avoids fragmented text boxes.
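Sub-step (5) can be sketched with a simple greedy merge. The rule used here (merge a proposal into a line when the horizontal gap is small and the vertical ranges overlap strongly) and the thresholds `max_gap` and `min_overlap` are assumptions for illustration; the patent text does not give the merging criterion. Boxes are (x1, y1, x2, y2) tuples with positive height.

```python
def vertical_overlap(a, b):
    """Overlap of two boxes' y-ranges divided by the smaller box height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    return max(inter, 0) / min(a[3] - a[1], b[3] - b[1])


def merge_proposals(boxes, max_gap=10, min_overlap=0.7):
    """Deduplicate (x1, y1, x2, y2) proposals and merge those on one text line."""
    lines = []
    for box in sorted(set(boxes)):  # set() performs the position deduplication
        for i, line in enumerate(lines):
            close = box[0] - line[2] <= max_gap
            if close and vertical_overlap(line, box) >= min_overlap:
                # Grow the existing line box so that it also covers this proposal.
                lines[i] = (line[0], min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines
```

With three adjacent proposals on one row, one duplicate, and one proposal on a lower row, the result is two text boxes: one spanning the whole upper row and one for the lower row.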
In step S302, the text information in the located text regions is recognized by the text recognition model.
In the embodiment of the present invention, when the text information in the located text regions is recognized by the text recognition model, preferably, the text information is recognized through the following steps:
(1) inputting the text boxes obtained in sub-step (5) of step S301 into the text recognition model;
(2) extracting picture features from the input text boxes with the CNN algorithm trained during text recognition model training, to obtain the corresponding feature maps;
(3) extracting text context features with the LSTM network on the last-layer feature map obtained in step (2), to obtain the corresponding text information.
In the embodiment of the present invention, the text information in text regions is recognized through steps (1) to (3), which improves the recognition rate of text information in text regions.
In step S303, sensitive information matching is performed on the recognized text information according to the user access permission, and the matched sensitive information is encrypted, to complete the desensitization operation on the non-desensitized data.
In the embodiment of the present invention, the recognized text information is matched according to the user access permission to find the sensitive information for which the user has no access permission, and the sensitive information found is encrypted.
Preferably, sensitive information matching is performed on the text information and the matched sensitive information is encrypted through the following steps:
(1) comparing the text information obtained in sub-step (3) of step S302 with the user's access permission in the third-party permission set, to find the sensitive information for which the user has no access permission;
(2) locating, according to the sensitive information, the character region in the text region that corresponds to the sensitive information;
(3) setting the image values of the character region to be encrypted to white, thereby encrypting the character region, and returning the encrypted picture.
In the embodiment of the present invention, the sensitive information found is encrypted through steps (1) to (3), which improves the security of encrypting sensitive information and in turn the security of data access.
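Sub-step (3), setting the pixels of a character region to white, can be sketched on a toy grayscale picture represented as a list of rows. The function name `white_out`, the (x1, y1, x2, y2) box convention and the value 255 for white are assumptions of this example; a production system would more likely operate on an array or image object from an imaging library.

```python
WHITE = 255  # assumed white value for an 8-bit grayscale picture


def white_out(image, box):
    """Return a copy of a grayscale image with the (x1, y1, x2, y2) box set to white."""
    x1, y1, x2, y2 = box
    redacted = [row[:] for row in image]  # copy so the original stays untouched
    for y in range(max(y1, 0), min(y2, len(image))):
        for x in range(max(x1, 0), min(x2, len(image[y]))):
            redacted[y][x] = WHITE
    return redacted


image = [[0] * 4 for _ in range(3)]
redacted = white_out(image, (1, 0, 3, 2))
```

Overwriting the pixels removes the characters from the returned picture entirely, so the original content cannot be recovered from the desensitized copy.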
In the embodiment of the present invention, the text information in the picture data is recognized by the pre-trained text detection model and text recognition model, and data desensitization is performed on the recognized text information according to the user access permission, to complete the desensitization operation on the non-desensitized data. This improves the accuracy of recognizing text information in picture data and the security of handling sensitive information in picture data.
Example IV:
The structure of data desensitization device when the unstructured data that Fig. 4 shows the offer of the embodiment of the present invention four accesses,
For ease of description, only parts related to embodiments of the present invention are shown, including:
A permission application audit unit 41, configured to receive a data access permission application request sent
by a user and, according to the application request, audit the user's access privileges to obtain the corresponding user access privileges;
An access request audit unit 42, configured to, when the application request is approved, receive the user's
query request for unstructured data in a pre-established database, and audit the query request according to the user access privileges;
A data query unit 43, configured to, when the query request is approved, perform a data query operation in the
database according to the query request to obtain the corresponding non-desensitized data; and
A data desensitization unit 44, configured to perform sensitive data identification and desensitization on the
obtained non-desensitized data by a pre-established sensitive data identification and desensitization engine to
obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, thereby
realizing the user's access to the unstructured data in the database.
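The cooperation of units 41 to 44 can be sketched as a small in-memory pipeline. This is only an illustration under stated assumptions: the class `AccessGateway`, the scope names, the dictionary standing in for the database, and the toy `mask_digits` desensitizer are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

Desensitizer = Callable[[str, Set[str]], str]


@dataclass
class AccessGateway:
    database: Dict[str, str]                                   # stand-in for the pre-established database
    granted: Dict[str, Set[str]] = field(default_factory=dict)  # user -> approved privileges

    def audit_permission_application(self, user: str, requested: Set[str],
                                     allowed: Set[str]) -> bool:
        """Unit 41: approve only if every requested scope is allowed by policy."""
        if requested <= allowed:
            self.granted[user] = set(requested)
            return True
        return False

    def audit_query(self, user: str, key: str) -> bool:
        """Unit 42: the user must hold approved privileges and the record must exist."""
        return user in self.granted and key in self.database

    def query(self, key: str) -> str:
        """Unit 43: fetch the non-desensitized data."""
        return self.database[key]

    def handle(self, user: str, key: str, desensitize: Desensitizer) -> Optional[str]:
        """Units 42 to 44 in sequence; returns None when the query is rejected."""
        if not self.audit_query(user, key):
            return None
        return desensitize(self.query(key), self.granted[user])  # unit 44


def mask_digits(raw: str, scopes: Set[str]) -> str:
    """Toy desensitizer: hide digits unless the user holds the 'phone' scope."""
    if "phone" in scopes:
        return raw
    return "".join("*" if ch.isdigit() else ch for ch in raw)
```

A user without an approved application gets `None` from `handle`; after approval, the same call returns the record with whatever the user's privileges do not cover masked out.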
As shown in Fig. 5, the data desensitization unit 44 preferably includes:
A data type analysis unit 441, configured to perform data type analysis on the non-desensitized data by the
sensitive data identification and desensitization engine, to determine whether the non-desensitized data is image data or text document data;
An image data desensitization unit 442, configured to, when the non-desensitized data is found to be image data,
identify and desensitize the sensitive information in the image data by a pre-trained picture information
identification and desensitization model, to complete the desensitization of the non-desensitized data, where the
picture information identification and desensitization model includes a text detection model and a text recognition model; and
A document data desensitization unit 443, configured to, when the non-desensitized data is found to be text
document data, parse the content of the text document data by a preset text document parser and output the
parsed content classified as either text data or image data; when the output is text data, desensitize the text
data according to the user access privileges; when the output is image data, trigger the image data
desensitization unit 442 to identify and desensitize the sensitive information in the image data by the
pre-trained picture information identification and desensitization model.
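The routing performed by units 441 to 443 amounts to a type dispatch, sketched below. The types `TextPart`, `ImagePart` and `Document` and the function `desensitize` are hypothetical; in particular, `Document.parts` stands in for the output of the text document parser, whose actual format the patent does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    pixels: bytes


Part = Union[TextPart, ImagePart]


@dataclass
class Document:
    parts: List[Part]  # assumed parser output: content already split into text and image parts


def desensitize(data: Union[ImagePart, Document],
                text_fn: Callable[[TextPart], TextPart],
                image_fn: Callable[[ImagePart], ImagePart]) -> List[Part]:
    """Route each piece of content to the matching desensitizer.

    A bare image goes straight to the image path (unit 442). A document is
    processed part by part: text parts are desensitized directly, image parts
    are handed to the same image path (unit 443 triggering unit 442).
    """
    if isinstance(data, ImagePart):
        return [image_fn(data)]
    result: List[Part] = []
    for part in data.parts:
        if isinstance(part, TextPart):
            result.append(text_fn(part))
        else:
            result.append(image_fn(part))
    return result
```

Passing the two desensitizers in as callables keeps the dispatch independent of how text masking or the picture model is actually implemented.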
Further preferably, the data desensitization device for unstructured data access provided by this embodiment of the present invention further includes:
A model training unit, configured to train the text detection model by combining a VGG-16 convolutional neural
network with an LSTM network, and to train the text recognition model by combining a CNN algorithm with an LSTM
network, where the parameters regressed during training of the text detection model include the text box class
score, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
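The five regressed quantities can be pictured as one record per candidate text box. The sketch below assumes a score threshold of 0.5 and a corner-coordinate output format; both are illustrative choices, and the names `TextBoxPrediction` and `keep_text_boxes` are hypothetical rather than part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextBoxPrediction:
    score: float    # text box class score (text vs. non-text)
    cx: float       # horizontal coordinate of the box center
    cy: float       # vertical coordinate of the box center
    height: float   # box height
    width: float    # box width

    def corners(self) -> Tuple[float, float, float, float]:
        """Convert center/size form to (left, top, right, bottom)."""
        half_w = self.width / 2
        half_h = self.height / 2
        return (self.cx - half_w, self.cy - half_h,
                self.cx + half_w, self.cy + half_h)


def keep_text_boxes(predictions: List[TextBoxPrediction],
                    threshold: float = 0.5) -> List[Tuple[float, float, float, float]]:
    """Keep boxes whose class score reaches the (assumed) threshold."""
    return [p.corners() for p in predictions if p.score >= threshold]
```

The corner form is what a downstream cropping or masking step would typically consume, which is why the conversion is included here.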
In this embodiment of the present invention, each unit of the data desensitization device for unstructured data
access may be implemented by a corresponding hardware or software unit. The units may be independent software or
hardware units, or may be integrated into a single software or hardware unit; this is not intended to limit the
present invention. For the specific implementation of each unit, refer to the corresponding steps in the
foregoing method embodiments; they are not repeated here.
Embodiment five:
Fig. 6 shows the structure of the image data desensitization unit 442 of embodiment four, as provided by
embodiment five of the present invention. For ease of description, only the parts related to this embodiment are shown, including:
A text region locating unit 61, configured to locate text regions in the image data by the text detection model;
A text information recognition unit 62, configured to recognize, by the text recognition model, the text
information in the located text regions; and
A sensitive information encryption unit 63, configured to match sensitive information in the recognized text
information according to the user access privileges, and to encrypt the matched sensitive information, to complete the desensitization of the non-desensitized data.
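The matching half of unit 63 can be sketched as a lookup of sensitive patterns the user is not entitled to see. The patent does not say how sensitive information is defined or matched, so the regular expressions, the category names and the function `find_forbidden_spans` below are assumptions made purely for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical sensitive categories and patterns; a real deployment would
# load these from its own policy configuration.
SENSITIVE_PATTERNS: Dict[str, re.Pattern] = {
    "phone": re.compile(r"\b1\d{10}\b"),                # 11-digit number starting with 1
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_forbidden_spans(text: str, granted: Set[str]) -> List[Tuple[str, int, int]]:
    """Return (category, start, end) for sensitive matches the user may not see.

    Categories listed in `granted` are skipped, so only information outside
    the user's access privileges is reported for encryption.
    """
    spans: List[Tuple[str, int, int]] = []
    for category, pattern in SENSITIVE_PATTERNS.items():
        if category in granted:
            continue
        for match in pattern.finditer(text):
            spans.append((category, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])  # order by position in the text
```

The character offsets returned here would still have to be mapped back to the character areas located by unit 61 before any region of the picture is encrypted.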
In this embodiment of the present invention, each unit in the structure of the image data desensitization unit
of the data desensitization device for unstructured data access may be implemented by a corresponding hardware
or software unit. The units may be independent software or hardware units, or may be integrated into a single
software or hardware unit; this is not intended to limit the present invention. For the specific implementation
of each unit, refer to the description of the corresponding steps in embodiment three; details are not repeated here.
Embodiment six:
Fig. 7 shows the structure of the computing device provided by embodiment six of the present invention. For ease
of description, only the parts related to this embodiment are shown.
The computing device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 that
is stored in the memory 71 and can run on the processor 70. When executing the computer program 72, the
processor 70 implements the steps in the above embodiments of the data desensitization method for unstructured
data access, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program
72, the processor 70 implements the functions of the units in the above device embodiments, for example the
functions of units 41 to 44 shown in Fig. 4.
In this embodiment of the present invention, the data access permission application request sent by the user is
audited; after approval, the corresponding user access privileges are obtained, and the user's query request for
unstructured data in the database is received. The query request is audited according to the user access
privileges; after approval, a data query operation is performed, according to the query request, in the database
for which the user holds access privileges, to obtain the corresponding non-desensitized data. The sensitive
data identification and desensitization engine performs sensitive data identification and desensitization on the
obtained non-desensitized data to obtain the corresponding desensitized data, and the obtained desensitized data
is returned to the user, realizing the user's access to the unstructured data in the database. This effectively
prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.
The computing device of this embodiment may be a personal computer or a server. For the steps implemented when
the processor 70 of the computing device 7 executes the computer program 72 to carry out the data desensitization
method for unstructured data access, refer to the description of the foregoing method embodiments; details are not repeated here.
Embodiment seven:
In this embodiment of the present invention, a computer-readable storage medium is provided. The
computer-readable storage medium stores a computer program which, when executed by a processor, implements the
steps in the above embodiments of the data desensitization method for unstructured data access, for example
steps S101 to S104 shown in Fig. 1. Alternatively, when executed by a processor, the computer program implements
the functions of the units in the above device embodiments, for example the functions of units 41 to 44 shown in Fig. 4.
In this embodiment of the present invention, the data access permission application request sent by the user is
audited; after approval, the corresponding user access privileges are obtained, and the user's query request for
unstructured data in the database is received. The query request is audited according to the user access
privileges; after approval, a data query operation is performed, according to the query request, in the database
for which the user holds access privileges, to obtain the corresponding non-desensitized data. The sensitive
data identification and desensitization engine performs sensitive data identification and desensitization on the
obtained non-desensitized data to obtain the corresponding desensitized data, and the obtained desensitized data
is returned to the user, realizing the user's access to the unstructured data in the database. This effectively
prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.
The computer-readable storage medium of this embodiment may include any entity or device capable of carrying
computer program code, or a recording medium, for example memory such as ROM/RAM, a magnetic disk, an optical disc, or flash memory.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any
modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (10)
- 1. A data desensitization method for unstructured data access, characterized in that the method comprises the following steps: receiving a data access permission application request sent by a user, and auditing the user's access privileges according to the application request to obtain the corresponding user access privileges; when the application request is approved, receiving the user's query request for unstructured data in a pre-established database, and auditing the query request according to the user access privileges; when the query request is approved, performing a data query operation in the database according to the query request to obtain the corresponding non-desensitized data; and performing sensitive data identification and desensitization on the obtained non-desensitized data by a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and returning the obtained desensitized data to the user, thereby realizing the user's access to the unstructured data in the database.
- 2. The method according to claim 1, characterized in that the step of performing sensitive data identification and desensitization on the obtained non-desensitized data by the pre-established sensitive data identification and desensitization engine comprises: performing data type analysis on the non-desensitized data by the sensitive data identification and desensitization engine, to determine whether the non-desensitized data is image data or text document data; when the non-desensitized data is found to be image data, identifying and desensitizing the sensitive information in the image data by a pre-trained picture information identification and desensitization model, to complete the desensitization of the non-desensitized data, the picture information identification and desensitization model including a text detection model and a text recognition model; and when the non-desensitized data is found to be text document data, parsing the content of the text document data by a preset text document parser and outputting the parsed content classified as either text data or image data, and, when the output is text data, desensitizing the text data according to the user access privileges, or, when the output is image data, jumping to the step of identifying and desensitizing the sensitive information in the image data by the pre-trained picture information identification and desensitization model.
- 3. The method according to claim 2, characterized in that the step of identifying and desensitizing the sensitive information in the image data by the pre-trained picture information identification and desensitization model comprises: locating text regions in the image data by the text detection model; recognizing, by the text recognition model, the text information in the located text regions; and matching sensitive information in the recognized text information according to the user access privileges, and encrypting the matched sensitive information, to complete the desensitization of the non-desensitized data.
- 4. The method according to claim 2, characterized in that, before the step of identifying and desensitizing the sensitive information in the image data by the pre-trained picture information identification and desensitization model, the method further comprises: training the text detection model by combining a VGG-16 convolutional neural network with an LSTM network, and training the text recognition model by combining a CNN algorithm with an LSTM network, wherein the parameters regressed during training of the text detection model include the text box class score, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
- 5. A data desensitization device for unstructured data access, characterized in that the device comprises: a permission application audit unit, configured to receive a data access permission application request sent by a user and, according to the application request, audit the user's access privileges to obtain the corresponding user access privileges; an access request audit unit, configured to, when the application request is approved, receive the user's query request for unstructured data in a pre-established database, and audit the query request according to the user access privileges; a data query unit, configured to, when the query request is approved, perform a data query operation in the database according to the query request to obtain the corresponding non-desensitized data; and a data desensitization unit, configured to perform sensitive data identification and desensitization on the obtained non-desensitized data by a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, thereby realizing the user's access to the unstructured data in the database.
- 6. The device according to claim 5, characterized in that the data desensitization unit comprises: a data type analysis unit, configured to perform data type analysis on the non-desensitized data by the sensitive data identification and desensitization engine, to determine whether the non-desensitized data is image data or text document data; an image data desensitization unit, configured to, when the non-desensitized data is found to be image data, identify and desensitize the sensitive information in the image data by a pre-trained picture information identification and desensitization model, to complete the desensitization of the non-desensitized data, the picture information identification and desensitization model including a text detection model and a text recognition model; and a document data desensitization unit, configured to, when the non-desensitized data is found to be text document data, parse the content of the text document data by a preset text document parser and output the parsed content classified as either text data or image data, and, when the output is text data, desensitize the text data according to the user access privileges, or, when the output is image data, trigger the image data desensitization unit to identify and desensitize the sensitive information in the image data by the pre-trained picture information identification and desensitization model.
- 7. The device according to claim 6, characterized in that the image data desensitization unit comprises: a text region locating unit, configured to locate text regions in the image data by the text detection model; a text information recognition unit, configured to recognize, by the text recognition model, the text information in the located text regions; and a sensitive information encryption unit, configured to match sensitive information in the recognized text information according to the user access privileges, and to encrypt the matched sensitive information, to complete the desensitization of the non-desensitized data.
- 8. The device according to claim 6, characterized in that the device further comprises: a model training unit, configured to train the text detection model by combining a VGG-16 convolutional neural network with an LSTM network, and to train the text recognition model by combining a CNN algorithm with an LSTM network, wherein the parameters regressed during training of the text detection model include the text box class score, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
- 9. A computing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 4.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810937005.5A CN109325326B (en) | 2018-08-16 | 2018-08-16 | Data desensitization method, device, equipment and medium during unstructured data access |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810937005.5A CN109325326B (en) | 2018-08-16 | 2018-08-16 | Data desensitization method, device, equipment and medium during unstructured data access |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109325326A true CN109325326A (en) | 2019-02-12 |
CN109325326B CN109325326B (en) | 2022-09-30 |
Family
ID=65263730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810937005.5A Active CN109325326B (en) | 2018-08-16 | 2018-08-16 | Data desensitization method, device, equipment and medium during unstructured data access |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109325326B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109981619A (en) * | 2019-03-13 | 2019-07-05 | 泰康保险集团股份有限公司 | Data capture method, device, medium and electronic equipment |
CN110188565A (en) * | 2019-04-17 | 2019-08-30 | 平安科技(深圳)有限公司 | Data desensitization method, device, computer equipment and storage medium |
CN110232056A (en) * | 2019-05-21 | 2019-09-13 | 苏宁云计算有限公司 | A kind of the blood relationship analytic method and its tool of structured query language |
CN110245505A (en) * | 2019-05-20 | 2019-09-17 | 中国平安人寿保险股份有限公司 | Tables of data access method, device, computer equipment and storage medium |
CN110413643A (en) * | 2019-06-17 | 2019-11-05 | 中国平安财产保险股份有限公司 | Data query method and apparatus |
CN110851864A (en) * | 2019-11-08 | 2020-02-28 | 国网浙江省电力有限公司信息通信分公司 | Sensitive data automatic identification and processing method and system |
CN111191275A (en) * | 2019-11-28 | 2020-05-22 | 深圳云安宝科技有限公司 | Sensitive data identification method, system and device |
CN111222125A (en) * | 2019-12-17 | 2020-06-02 | 中国电力科学研究院有限公司 | Client and server safety protection system of enterprise browser |
CN111428273A (en) * | 2020-04-23 | 2020-07-17 | 北京中安星云软件技术有限公司 | Dynamic desensitization method and device based on machine learning |
CN111984983A (en) * | 2020-08-28 | 2020-11-24 | 山东健康医疗大数据有限公司 | User privacy encryption method |
CN112069203A (en) * | 2020-09-22 | 2020-12-11 | 北京百家科技集团有限公司 | Data query method and device |
CN112311879A (en) * | 2020-10-30 | 2021-02-02 | 平安信托有限责任公司 | Method and device for limiting network disk uploading, computer equipment and storage medium |
CN112380566A (en) * | 2020-11-20 | 2021-02-19 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and medium for desensitizing document image |
CN112417406A (en) * | 2020-12-04 | 2021-02-26 | 中国电子信息产业集团有限公司第六研究所 | Data desensitization method and device, readable storage medium and electronic equipment |
CN112487458A (en) * | 2020-12-09 | 2021-03-12 | 浪潮云信息技术股份公司 | Implementation method and system using government affair open sensitive data |
CN112632597A (en) * | 2020-12-08 | 2021-04-09 | 国家计算机网络与信息安全管理中心 | Data desensitization method and device readable storage medium |
CN112714128A (en) * | 2020-12-29 | 2021-04-27 | 北京安华金和科技有限公司 | Data desensitization processing method and device |
CN113762237A (en) * | 2021-04-26 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Text image processing method, device and equipment and storage medium |
CN114244583A (en) * | 2021-11-30 | 2022-03-25 | 珠海大横琴科技发展有限公司 | Data processing method and device based on mobile client |
CN114499901A (en) * | 2020-10-26 | 2022-05-13 | 中国移动通信有限公司研究院 | Information processing method and device, server, terminal and data platform |
CN114726605A (en) * | 2022-03-30 | 2022-07-08 | 医渡云(北京)技术有限公司 | Sensitive data filtering method, device and system and computer equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809405A (en) * | 2015-04-24 | 2015-07-29 | 广东电网有限责任公司信息中心 | Structural data asset leakage prevention method based on hierarchical classification |
CN106529329A (en) * | 2016-10-11 | 2017-03-22 | 中国电子科技网络信息安全有限公司 | Desensitization system and desensitization method used for big data |
CN107315972A (en) * | 2017-06-01 | 2017-11-03 | 北京明朝万达科技股份有限公司 | A kind of dynamic desensitization method of big data unstructured document and system |
CN108153468A (en) * | 2017-12-14 | 2018-06-12 | 阿里巴巴集团控股有限公司 | Image processing method and device |
CN108197486A (en) * | 2017-12-20 | 2018-06-22 | 北京天融信网络安全技术有限公司 | Big data desensitization method, system, computer-readable medium and equipment |
CN108259491A (en) * | 2018-01-15 | 2018-07-06 | 北京炼石网络技术有限公司 | For the method, apparatus and its system of unstructured data safe handling |
Non-Patent Citations (1)
Title |
---|
彭海: "基于异构计算的图片敏感文字检测系统", 《中国优秀硕士学位论文全文数据库信息科技辑(月刊)》 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109981619A (en) * | 2019-03-13 | 2019-07-05 | 泰康保险集团股份有限公司 | Data capture method, device, medium and electronic equipment |
CN110188565A (en) * | 2019-04-17 | 2019-08-30 | 平安科技(深圳)有限公司 | Data desensitization method, device, computer equipment and storage medium |
CN110245505A (en) * | 2019-05-20 | 2019-09-17 | 中国平安人寿保险股份有限公司 | Tables of data access method, device, computer equipment and storage medium |
CN110232056A (en) * | 2019-05-21 | 2019-09-13 | 苏宁云计算有限公司 | A kind of the blood relationship analytic method and its tool of structured query language |
CN110413643A (en) * | 2019-06-17 | 2019-11-05 | 中国平安财产保险股份有限公司 | Data query method and apparatus |
CN110851864A (en) * | 2019-11-08 | 2020-02-28 | 国网浙江省电力有限公司信息通信分公司 | Sensitive data automatic identification and processing method and system |
CN111191275A (en) * | 2019-11-28 | 2020-05-22 | 深圳云安宝科技有限公司 | Sensitive data identification method, system and device |
CN111222125A (en) * | 2019-12-17 | 2020-06-02 | 中国电力科学研究院有限公司 | Client and server safety protection system of enterprise browser |
CN111428273B (en) * | 2020-04-23 | 2023-08-25 | 北京中安星云软件技术有限公司 | Dynamic desensitization method and device based on machine learning |
CN111428273A (en) * | 2020-04-23 | 2020-07-17 | 北京中安星云软件技术有限公司 | Dynamic desensitization method and device based on machine learning |
CN111984983A (en) * | 2020-08-28 | 2020-11-24 | 山东健康医疗大数据有限公司 | User privacy encryption method |
CN112069203A (en) * | 2020-09-22 | 2020-12-11 | 北京百家科技集团有限公司 | Data query method and device |
CN114499901A (en) * | 2020-10-26 | 2022-05-13 | 中国移动通信有限公司研究院 | Information processing method and device, server, terminal and data platform |
CN112311879A (en) * | 2020-10-30 | 2021-02-02 | 平安信托有限责任公司 | Method and device for limiting network disk uploading, computer equipment and storage medium |
CN112380566A (en) * | 2020-11-20 | 2021-02-19 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and medium for desensitizing document image |
CN112417406A (en) * | 2020-12-04 | 2021-02-26 | 中国电子信息产业集团有限公司第六研究所 | Data desensitization method and device, readable storage medium and electronic equipment |
CN112632597A (en) * | 2020-12-08 | 2021-04-09 | 国家计算机网络与信息安全管理中心 | Data desensitization method and device readable storage medium |
CN112487458A (en) * | 2020-12-09 | 2021-03-12 | 浪潮云信息技术股份公司 | Implementation method and system using government affair open sensitive data |
CN112714128A (en) * | 2020-12-29 | 2021-04-27 | 北京安华金和科技有限公司 | Data desensitization processing method and device |
CN113762237A (en) * | 2021-04-26 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Text image processing method, device and equipment and storage medium |
CN113762237B (en) * | 2021-04-26 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Text image processing method, device, equipment and storage medium |
CN114244583A (en) * | 2021-11-30 | 2022-03-25 | 珠海大横琴科技发展有限公司 | Data processing method and device based on mobile client |
CN114726605A (en) * | 2022-03-30 | 2022-07-08 | 医渡云(北京)技术有限公司 | Sensitive data filtering method, device and system and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109325326B (en) | 2022-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109325326A (en) | Data desensitization method, device, equipment and medium when unstructured data accesses | |
CN105590055B (en) | Method and device for identifying user credible behaviors in network interaction system | |
Han et al. | Generating fake documents using probabilistic logic graphs | |
TW201816678A (en) | Illegal transaction detection method and illegal transaction detection device | |
US9692771B2 (en) | System and method for estimating typicality of names and textual data | |
Ramanathan et al. | Phishing Website detection using latent Dirichlet allocation and AdaBoost | |
JP6532523B2 (en) | Management of user identification registration using handwriting | |
WO2021098274A1 (en) | Method and apparatus for evaluating risk of leakage of private data | |
CN109189993A (en) | Big data processing method, device, server and storage medium | |
Han et al. | CloudDLP: Transparent and scalable data sanitization for browser-based cloud storage | |
Wassan et al. | A Smart Comparative Analysis for Secure Electronic Websites. | |
Pritom et al. | Data-driven characterization and detection of covid-19 themed malicious websites | |
Wen et al. | Detecting malicious websites in depth through analyzing topics and web-pages | |
CN113918936A (en) | SQL injection attack detection method and device | |
Chen et al. | Fraud analysis and detection for real-time messaging communications on social networks | |
CN109359481A (en) | It is a kind of based on BK tree anti-collision search about subtract method | |
Xu et al. | A fast detection method of network crime based on user portrait | |
Haidar et al. | E-banking Information Security Risks Analysis Based on Ontology | |
Shravasti et al. | Smishing detection: Using artificial intelligence | |
Chen et al. | Research on Fake News Detection Based on Diffusion Growth Rate | |
KR102619521B1 (en) | Method and apparatus for encrypting confidention information based on artificial intelligence | |
US20230281296A1 (en) | Location-based pattern detection for password strength | |
Adnaan et al. | A Detailed Study on Preventing the Malicious URLs from Cyber Attacks | |
Gupta et al. | GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks | |
Jafari et al. | Detection of phishing addresses and pages with a data set balancing approach by generative adversarial network (GAN) and convolutional neural network (CNN) optimized with swarm intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||