CN109325326A - Data desensitization method, apparatus, device, and medium for unstructured data access - Google Patents

Data desensitization method, apparatus, device, and medium for unstructured data access

Publication number
CN109325326A
Authority
CN
China
Prior art keywords
data
text
desensitization
desensitize
sensitive
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810937005.5A
Other languages
Chinese (zh)
Other versions
CN109325326B (en)
Inventor
刘川意
方滨兴
潘鹤中
段少明
韩培义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yun An Bao Technology Co Ltd
Original Assignee
Shenzhen Yun An Bao Technology Co Ltd
Application filed by Shenzhen Yun An Bao Technology Co Ltd
Priority to CN201810937005.5A
Publication of CN109325326A
Application granted
Publication of CN109325326B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The present invention is applicable to the technical field of data desensitization and data sharing, and provides a data desensitization method, apparatus, device, and medium for unstructured data access. The method comprises: after a user's application request for data access permission is approved, obtaining the corresponding user access permission; receiving the user's query request for unstructured data in a database; auditing the query request according to the user access permission; after the audit passes, performing a data query operation, according to the query request, in the database for which the user holds access permission, to obtain the corresponding non-desensitized data; performing sensitive data identification and desensitization on the obtained non-desensitized data by means of a sensitive data identification and desensitization engine to obtain the corresponding desensitized data; and returning the desensitized data to the user, thereby enabling the user to access the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the degree of protection of that information.

Description

Data desensitization method, apparatus, device, and medium for unstructured data access
Technical field
The present invention belongs to the technical field of data desensitization and data sharing, and in particular relates to a data desensitization method, apparatus, device, and medium for unstructured data access.
Background
With the arrival of the big data era, big data sharing platforms for information transmission and machine learning have become important information exchange platforms in daily office work, communication, and collaborative interaction. However, transmitting information and sharing data over a network leads to leakage of private data and to secondary profiteering from key data sets. How to disclose data to third-party applications while protecting data privacy has therefore become a key problem for big data sharing platforms.
Current approaches to this problem generally fall into two categories:
The first is to build a data leakage prevention system for the big data sharing platform. For example, the invention patent application with publication number CN104767745A discloses a cloud data security protection method, the invention patent application with publication number CN107633181A discloses a data model for data opening and sharing and its operation system, and the invention patent application with publication number CN103209174B discloses a data protection method, apparatus, and system. When protecting sensitive information, these existing data sharing platforms or protection systems obtain the corresponding data only according to user permission level and role, so the data protection is inflexible, and these systems provide no specific way of detecting and identifying sensitive content;
The second is sensitive information detection and data desensitization systems. For example, the invention patent application with publication number CN102970298B discloses a method, device, and system for preventing information leakage, the invention patent application with publication number CN102467628A discloses a data protection method based on browser kernel interception technology, and the invention patent application with publication number CN104809405A discloses a classification-based leakage prevention method for structured data assets. The sensitive information detection in these technical solutions operates at the file level: it can only judge whether a file contains sensitive information and cannot detect the specific sensitive content in the file. Their content-based detection also relies mainly on regular-expression matching, works only on text data, and cannot detect sensitive information in pictures or text documents (Word files, PDF files, etc.).
Summary of the invention
The purpose of the present invention is to provide a data desensitization method, apparatus, device, and storage medium for unstructured data access, aiming to solve the problem that sensitive information in unstructured data is leaked because the prior art cannot provide an effective desensitization method for unstructured data.
In one aspect, the present invention provides a data desensitization method for unstructured data access, the method comprising the following steps:
receiving an application request for data access permission sent by a user, and reviewing the user's access permission according to the application request to obtain the corresponding user access permission;
when the application request is approved, receiving the user's query request for unstructured data in a pre-established database, and auditing the query request according to the user access permission;
when the query request passes the audit, performing a data query operation in the database according to the query request to obtain the corresponding non-desensitized data;
performing sensitive data identification and desensitization on the obtained non-desensitized data by means of a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and returning the obtained desensitized data to the user, thereby enabling the user to access the unstructured data in the database.
In another aspect, the present invention provides a data desensitization apparatus for unstructured data access, the apparatus comprising:
a permission application review unit, configured to receive an application request for data access permission sent by a user, and to review the user's access permission according to the application request to obtain the corresponding user access permission;
an access request audit unit, configured to receive, when the application request is approved, the user's query request for unstructured data in a pre-established database, and to audit the query request according to the user access permission;
a data query unit, configured to perform, when the query request passes the audit, a data query operation in the database according to the query request to obtain the corresponding non-desensitized data; and
a data desensitization unit, configured to perform sensitive data identification and desensitization on the obtained non-desensitized data by means of a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, thereby enabling the user to access the unstructured data in the database.
In another aspect, the present invention also provides a computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above data desensitization method for unstructured data access.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above data desensitization method for unstructured data access.
The present invention reviews the application request for data access permission sent by the user and, once the request is approved, obtains the corresponding user access permission. It then receives the user's query request for unstructured data in a database and audits the query request according to the user access permission. After the audit passes, a data query operation is performed, according to the query request, in the database for which the user holds access permission, yielding the corresponding non-desensitized data. The sensitive data identification and desensitization engine performs sensitive data identification and desensitization on the non-desensitized data to obtain the corresponding desensitized data, which is returned to the user. The user thus gains access to the unstructured data in the database, while leakage of sensitive information in the unstructured data is effectively prevented and the degree of protection of that information is improved.
Brief description of the drawings
Fig. 1 is an implementation flowchart of the data desensitization method for unstructured data access provided by Embodiment one of the present invention;
Fig. 2 is an implementation flowchart, provided by Embodiment two of the present invention, of the sensitive data identification and desensitization performed on the non-desensitized data by the sensitive data identification and desensitization engine in the data desensitization method for unstructured data access;
Fig. 3 is an implementation flowchart, provided by Embodiment three of the present invention, of the detection, identification, and desensitization of sensitive information in picture data during identification and desensitization of the non-desensitized data;
Fig. 4 is a structural schematic diagram of the data desensitization apparatus for unstructured data access provided by Embodiment four of the present invention;
Fig. 5 is a preferred structural schematic diagram of the data desensitization apparatus for unstructured data access provided by Embodiment four of the present invention;
Fig. 6 is a structural schematic diagram of the picture data desensitization unit in the data desensitization apparatus for unstructured data access provided by Embodiment five of the present invention; and
Fig. 7 is a structural schematic diagram of the computing device provided by Embodiment six of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment one:
Fig. 1 shows the implementation flow of the data desensitization method for unstructured data access provided by Embodiment one of the present invention. For ease of description, only the parts related to this embodiment are shown, detailed as follows:
In step S101, an application request for data access permission sent by a user is received, and the user's access permission is reviewed according to the application request to obtain the corresponding user access permission.
The embodiments of the present invention are applicable to data sharing and interaction platforms, systems, or devices, for example personal computers and servers. The user applies for data access permission through a preset third-party application permission application module. According to the application request submitted by the user, the module reviews whether the user has data query permission and determines the user's data access scope. When the review passes, the corresponding user access permission is obtained; this is the user's permission to access data in the corresponding database. The user access permission is submitted to a pre-established third-party application permission set for storage, and a pass message is returned. If the review does not pass, a fail message is returned.
In step S102, when the application request is approved, the user's query request for unstructured data in a pre-established database is received, and the query request is audited according to the user access permission.
In this embodiment, once the application request is approved, the user writes a data query script or program that meets the requirements of the pre-established database, in order to access and query the unstructured data in the database through that script or program. Before the data in the database is formally accessed, the query request submitted by the user (the data query script or program) is audited according to the user access permission to determine whether the user holds permission to access the database, which ensures that the user can access only databases for which the user holds the corresponding access permission. Here, unstructured data refers to data that has no predefined data model or is not organized in a predefined way, including text documents (PDF, Word, WPS, and LaTeX documents, etc.), picture data, audio and video data, medical data, and so on.
In step S103, when the query request passes the audit, a data query operation is performed in the database according to the query request to obtain the corresponding non-desensitized data.
In this embodiment, after it is determined that the user holds access permission for the database, a data query operation is performed in the database according to the received query request for unstructured data in that database, yielding the corresponding non-desensitized data.
In step S104, sensitive data identification and desensitization are performed on the obtained non-desensitized data by a pre-established sensitive data identification and desensitization engine to obtain the corresponding desensitized data, and the desensitized data is returned to the user, enabling the user to access the unstructured data in the database.
In this embodiment, the pre-established sensitive data identification and desensitization engine identifies and desensitizes the sensitive data in the obtained non-desensitized data, and the resulting desensitized data is returned to the user, enabling the user to access the unstructured data in the database. Sensitive data includes personal information (for example, telephone number, address, and ID card number), personal account passwords (including email account passwords, bank login passwords, etc.), personal assets, and so on.
In this embodiment, the application request for data access permission sent by the user is reviewed and, once it is approved, the corresponding user access permission is obtained. The user's query request for unstructured data in a database is then received and audited according to the user access permission. After the audit passes, a data query operation is performed, according to the query request, in the database for which the user holds access permission, yielding the corresponding non-desensitized data. The sensitive data identification and desensitization engine identifies and desensitizes the sensitive data in the non-desensitized data, and the resulting desensitized data is returned to the user. The user thus gains access to the unstructured data in the database, while leakage of sensitive information in the unstructured data is effectively prevented and the degree of protection of that information is improved.
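The flow of steps S101 to S104 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `PermissionSet`, `handle_query`, the in-memory `store`, and the lambda used as the desensitization engine are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionSet:
    """Hypothetical stand-in for the third-party application permission set."""
    grants: dict = field(default_factory=dict)  # user -> set of database names

    def apply(self, user, requested, allowed):
        """S101: review an application request; store the granted scope if approved."""
        granted = set(requested) & set(allowed)
        if not granted:
            return False
        self.grants[user] = granted
        return True

    def audit_query(self, user, database):
        """S102: audit a query request against the stored user access permission."""
        return database in self.grants.get(user, set())


def handle_query(perms, user, database, store, desensitize):
    """S102 to S104: audit, query, desensitize, and return the result."""
    if not perms.audit_query(user, database):
        raise PermissionError(f"{user} may not query {database}")
    raw = store[database]        # S103: data query yields non-desensitized data
    return desensitize(raw)      # S104: only desensitized data is returned


perms = PermissionSet()
approved = perms.apply("alice", {"claims"}, allowed={"claims", "scans"})
store = {"claims": "phone 13800000000", "scans": "scan bytes"}
result = handle_query(perms, "alice", "claims", store,
                      lambda text: text.replace("13800000000", "***"))
```

The point of the sketch is the ordering: the raw record is only read after both audits pass, and it is never returned directly.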
Embodiment two:
Fig. 2 shows the implementation flow, provided by Embodiment two of the present invention, of step S104 of Embodiment one, in which the sensitive data identification and desensitization engine performs sensitive data identification and desensitization on the non-desensitized data. For ease of description, only the parts related to this embodiment are shown, detailed as follows:
In step S201, the sensitive data identification and desensitization engine performs data type analysis on the non-desensitized data to determine whether it is picture data or text document data.
In this embodiment, the pre-established sensitive data identification and desensitization engine performs data type analysis on the non-desensitized data, and the result of the analysis determines whether the non-desensitized data is picture data or text document data.
In step S202, based on the data type analysis of step S201, it is determined whether the non-desensitized data is picture data or text document data. Step S203 is executed when the data is picture data, and step S204 is executed when the data is text document data.
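The branching in steps S201 and S202 might look like the sketch below. The source does not say how the data type analysis works; checking file signatures (the PNG, JPEG, and PDF magic numbers) is an assumption made here for illustration, and `analyze_type` and `dispatch` are hypothetical names.

```python
PICTURE_MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")  # PNG and JPEG signatures
DOCUMENT_MAGIC = (b"%PDF-",)                              # PDF signature


def analyze_type(blob: bytes) -> str:
    """S201: classify non-desensitized data as picture or text document."""
    if blob.startswith(PICTURE_MAGIC):
        return "picture"
    if blob.startswith(DOCUMENT_MAGIC):
        return "document"
    raise ValueError("unsupported data type")


def dispatch(blob, desensitize_picture, parse_document):
    """S202: route picture data to S203 and text document data to S204."""
    if analyze_type(blob) == "picture":
        return desensitize_picture(blob)   # step S203
    return parse_document(blob)            # step S204
```

A real engine would need to cover the other document formats the text mentions (Word, WPS, LaTeX), which this sketch deliberately leaves out.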
In step S203, sensitive information in the picture data is identified and desensitized by a pre-trained picture information identification and desensitization model, completing the desensitization of the non-desensitized data.
In this embodiment, the picture information identification and desensitization model comprises a text detection model and a text recognition model, which together identify and desensitize the sensitive information in the picture data.
Before the picture information identification and desensitization model is used to identify and desensitize sensitive information in picture data, the text detection model is preferably trained with a combination of a VGG-16 convolutional neural network and a long short-term memory (Long Short-Term Memory, LSTM) network, and the text recognition model is preferably trained with a combination of a convolutional neural network (Convolutional Neural Network, CNN) algorithm and an LSTM network, to improve the recognition accuracy for text information in picture data.
In this embodiment, the text detection model is preferably trained through the following steps:
(1.1) picture features are extracted from a pre-generated training picture data set by a Visual Geometry Group convolutional neural network with 13 convolutional layers and 3 fully connected layers (Visual Geometry Group-16, VGG-16), yielding the corresponding feature maps;
(1.2) a region proposal network (Region Proposal Network, RPN) generates text proposal regions (anchors) on the picture data corresponding to the last-layer feature map produced by the VGG-16 convolutional neural network;
(1.3) an LSTM network learns the spatial context information of the text from the obtained feature map, and a fully connected layer then fuses the learned features;
(1.4) each text proposal region is predicted from the fused features; if a text proposal region is predicted to be a text region, the corresponding position of that region is regressed, otherwise the region is discarded.
In this embodiment, the text detection model is trained through steps (1.1) to (1.4), which improves the accuracy of subsequent text region localization in picture data.
Further preferably, during training of the text detection model, the parameters regressed by the model include the text box classification score, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box. Regressing the size and position of the text region in one pass enables prediction of the text region as a whole and reduces false boxes. At the same time, adding regression of the horizontal position of the text region pulls together text boxes that belong to the same text line and avoids fragmented text boxes.
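To make the regressed parameters concrete, the sketch below holds one prediction (classification score, center coordinates, height, width) and converts it to corner coordinates. It is an illustration only: `TextBoxPrediction`, `keep_text_regions`, and the 0.7 score threshold are assumptions, and no network is involved.

```python
from dataclasses import dataclass


@dataclass
class TextBoxPrediction:
    score: float   # text box classification score
    cx: float      # horizontal coordinate of the box center
    cy: float      # vertical coordinate of the box center
    height: float
    width: float

    def corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        half_w, half_h = self.width / 2, self.height / 2
        return (self.cx - half_w, self.cy - half_h,
                self.cx + half_w, self.cy + half_h)


def keep_text_regions(predictions, threshold=0.7):
    """Keep proposals predicted as text; the threshold value is an assumption."""
    return [p.corners() for p in predictions if p.score >= threshold]
```

Because size and position come from a single prediction, a downstream step can treat each box as one whole region rather than assembling it from separately predicted edges.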
In this embodiment, also preferably, the text recognition model is trained through the following steps:
(2.1) the training picture data set pre-generated in step (1.1) is expanded by a data self-generation algorithm based on a background set, a font set, and a corpus, producing an expanded training picture data set;
(2.2) picture features are extracted from the expanded training picture data set by the CNN algorithm, yielding the corresponding feature maps;
(2.3) text context information is extracted by the LSTM network from the last-layer feature map obtained in step (2.2);
(2.4) the text context information extracted by the LSTM network is input into a connectionist temporal classification (Connectionist Temporal Classification, CTC) model for labeling and sequence alignment, and the labeled and aligned information is back-propagated to the text recognition model to train its model parameters.
In this embodiment, the text recognition model is trained through steps (2.1) to (2.4). Expanding the training data set addresses the severe shortage of training data and improves the accuracy of subsequent recognition of text information in text regions.
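The sequence alignment that CTC provides rests on a many-to-one rule: consecutive repeated labels are merged and blank labels are dropped. The sketch below shows only that rule, applied as greedy decoding over per-frame scores; it is not the training procedure of step (2.4). The blank index 0 and the function names are assumptions.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Pick the best label per frame, then apply the collapse rule."""
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    return "".join(alphabet[i] for i in ctc_collapse(best, blank))
```

For example, the frame labels `[1, 1, 0, 1, 2, 2, 0]` collapse to `[1, 1, 2]`: the blank between the two runs of `1` is what lets a doubled character survive.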
In step S204, the content of the text document data is parsed by a preset text document parser, and the parsed content is classified and output as text data or picture data.
In this embodiment, the content of the text document data is parsed by the preset text document parser to determine whether the text document data contains picture content or text content. The picture content and text content are then extracted according to the parsing result, yielding the corresponding picture data and text data, which are classified and output. The text document parser comprises a document parser, a document object store, and a document content parser.
Preferably, the content of the text document data is parsed through the following steps:
(1) the text document data is input into the preset document parser, which extracts the data objects in the text document data;
(2) the data objects are stored in the preset document object store, and the document object store feeds the document page objects into the preset document content parser;
(3) the document content parser extracts the content of the data objects and classifies and outputs the extracted content as text data or picture data.
In this embodiment, parsing the text document data through steps (1) to (3) improves the parsing accuracy for text document data.
In step S205, it is determined whether the output is text data or picture data from the text document data. Step S206 is executed when the output is text data, and the flow jumps to step S203 when the output is picture data.
In step S206, data desensitization is performed on the text data according to the user access permission.
In this embodiment, when the output is text data from the text document data, the sensitive information in the text data is compared against the user access permission to find the sensitive information for which the user has no access permission. The sensitive information found is encrypted and written back into the text data, completing the data desensitization.
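A minimal sketch of step S206 follows, under several assumptions that are not in the source: sensitive items are located with two illustrative regular expressions (an 11-digit mobile number and an 18-character ID card number), the user access permission is modeled as a set of permitted kinds, and a keyed HMAC-SHA256 token stands in for the unspecified encryption. Note that an HMAC token is not reversible, unlike real encryption.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; a single pass avoids re-matching inserted tokens.
SENSITIVE = re.compile(
    r"(?<!\d)(?:(?P<id_card>\d{17}[\dXx])|(?P<phone>1\d{10}))(?!\d)"
)


def protect(value: str, key: bytes) -> str:
    """Stand-in for encryption: a keyed, non-reversible token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"<enc:{digest[:12]}>"


def desensitize_text(text: str, allowed_kinds, key: bytes) -> str:
    """Replace each sensitive item the user may not see and write it back."""
    def replace(match):
        if match.lastgroup in allowed_kinds:
            return match.group()
        return protect(match.group(), key)
    return SENSITIVE.sub(replace, text)
```

Calling `desensitize_text("call 13800000000", set(), b"key")` replaces the number with a token, while passing `{"phone"}` as the permitted kinds leaves the text unchanged.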
In this embodiment, the sensitive data identification and desensitization engine performs sensitive information detection, identification, and desensitization separately on the picture data and the text document data in the non-desensitized unstructured data. This makes the desensitization of unstructured data more targeted and in turn improves the degree of protection of sensitive information.
Embodiment three:
Fig. 3 shows the implementation flow, provided by Embodiment three of the present invention, of step S203 of Embodiment two, in which the pre-trained picture information identification and desensitization model identifies and desensitizes the sensitive information in picture data. For ease of description, only the parts related to this embodiment are shown, detailed as follows:
In step S301, text regions are located in the picture data by the text detection model.
In this embodiment, when text regions are located in the picture data by the text detection model, the localization is preferably performed through the following steps:
(1) the convolutional part of the VGG-16 convolutional neural network in the text detection model, with its trained parameters, extracts the picture features of the picture data, yielding the corresponding feature maps;
(2) the RPN network generates text proposal regions on the picture data corresponding to the last-layer feature map obtained in step (1);
(3) the LSTM network learns the spatial context information of the text from the feature map obtained in step (1), and a fully connected layer then fuses the learned features;
(4) each text proposal region obtained in step (2) is predicted from the fused features; if a text proposal region is predicted to be a text region, the corresponding position of that region is regressed, otherwise the region is discarded;
(5) position deduplication is performed on the text proposal regions after position regression, and the text proposal regions belonging to the same text line are then merged to obtain text boxes.
In this embodiment, steps (1) to (5) add regression of the horizontal position of the text region, which pulls together text boxes that belong to the same text line and avoids fragmented text boxes.
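Sub-step (5) can be illustrated without any neural network. The sketch below assumes boxes are `(x_min, y_min, x_max, y_max)` tuples, uses greedy non-maximum suppression as the deduplication rule, and merges boxes that overlap vertically and sit within a small horizontal gap. The function names and both thresholds are assumptions, not values from the source.

```python
def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(a, b):
    """Intersection over union of two boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def deduplicate(scored, iou_threshold=0.7):
    """Greedy non-maximum suppression over (score, box) pairs."""
    kept = []
    for _, box in sorted(scored, key=lambda sb: sb[0], reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


def same_line(line, box, max_gap):
    """True if box overlaps line vertically and starts close to its right edge."""
    v_overlap = min(line[3], box[3]) - max(line[1], box[1])
    return v_overlap > 0 and box[0] - line[2] <= max_gap


def merge_lines(boxes, max_gap=10):
    """Merge proposals of the same text line, scanning left to right."""
    lines = []
    for box in sorted(boxes):
        for i, line in enumerate(lines):
            if same_line(line, box, max_gap):
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines
```

Two adjacent proposals on one row thus become a single text box, while a proposal on a lower row stays separate.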
In step S302, the text information in the located text regions is recognized by the text recognition model.
In this embodiment, when the text information in the located text regions is recognized by the text recognition model, the recognition is preferably performed through the following steps:
(1) the text boxes obtained in sub-step (5) of step S301 are input into the text recognition model;
(2) picture features are extracted from the input text boxes by the CNN algorithm trained during text recognition model training, yielding the corresponding feature maps;
(3) the LSTM network extracts text context features from the last-layer feature map obtained in step (2), yielding the corresponding text information.
In this embodiment, the text information in the text regions is recognized through steps (1) to (3), which improves the recognition rate for text information in text regions.
In step S303, sensitive information matching is performed on the recognized text information according to the user access permission, and the matched sensitive information is encrypted, completing the desensitization of the non-desensitized data.
In this embodiment, the recognized text information is matched against the user access permission to find the sensitive information for which the user has no access permission, and the sensitive information found is encrypted.
Preferably, sensitive information matching on the text information and encryption of the matched sensitive information are performed through the following steps:
(1) the text information obtained in sub-step (3) of step S302 is compared with the user's access permission in the third-party permission set to find the sensitive information for which the user has no access permission;
(2) according to the sensitive information, the character region corresponding to the sensitive information within the text region is located;
(3) the image values of the character region to be encrypted are set to white, the character region is encrypted, and the encrypted picture is returned.
In this embodiment, the sensitive information found is encrypted through steps (1) to (3), which improves the security of the encryption of sensitive information and in turn the security of data access.
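The whitening part of sub-step (3) can be sketched as below. The picture is assumed to be a grayscale image stored as a list of rows of integer pixel values, with 255 as white; `whiten_region` is a hypothetical helper, and the additional encryption of the character region that the text mentions is not covered.

```python
WHITE = 255  # assumed white value for 8-bit grayscale pixels


def whiten_region(pixels, box):
    """Return a copy of the image with box (x_min, y_min, x_max, y_max) set to white."""
    x_min, y_min, x_max, y_max = box
    redacted = [row[:] for row in pixels]          # leave the input untouched
    for y in range(max(0, y_min), min(len(redacted), y_max)):
        row = redacted[y]
        for x in range(max(0, x_min), min(len(row), x_max)):
            row[x] = WHITE
    return redacted
```

Working on a copy keeps the non-desensitized picture intact on the server side, so only the redacted copy is returned to the user. The box is clamped to the image bounds, so a region that extends past the edge does not raise an error.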
In this embodiment, the pre-trained text detection model and text recognition model recognize the text information in the picture data, and data desensitization is performed on the recognized text information according to the user access permission, completing the desensitization of the non-desensitized data. This improves both the recognition accuracy for text information in picture data and the security of the handling of sensitive information in picture data.
Embodiment four:
Fig. 4 shows the structure of the data desensitization device for unstructured data access provided by Embodiment four of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown, including:
A permission declaration audit unit 41, configured to receive a data access permission declaration request sent by a user and to audit the user's access permission according to the declaration request, so as to obtain the corresponding user access permission;
An access request audit unit 42, configured to, when the declaration request passes the audit, receive the user's query request for unstructured data in a pre-established database and audit the query request according to the user access permission;
A data query unit 43, configured to, when the query request passes the audit, perform a data query operation in the database according to the query request to obtain the corresponding un-desensitized data; and
A data desensitization unit 44, configured to perform sensitive-data identification and desensitization on the obtained un-desensitized data through a pre-established sensitive-data identification and desensitization engine to obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, thereby realizing the user's access to the unstructured data in the database.
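The order in which units 41 to 44 cooperate can be sketched as follows. This is only an illustration under stated assumptions: the policy table `GRANTABLE`, the in-memory `DATABASE`, the `"***"` mask, and all function names are hypothetical, and a small dictionary record stands in for the unstructured data that the patent actually targets.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when either audit step rejects the request."""


@dataclass(frozen=True)
class Session:
    user: str
    permissions: frozenset  # fields granted by the permission declaration audit


# Hypothetical policy: fields each user may be granted.
GRANTABLE = {"alice": {"name", "phone"}, "bob": {"name"}}

# Hypothetical store of un-desensitized records.
DATABASE = {"doc1": {"name": "Zhang San", "phone": "13800138000"}}


def audit_permission_request(user, requested):
    """Unit 41: audit the declaration request and return the granted permissions."""
    granted = GRANTABLE.get(user, set()) & set(requested)
    if not granted:
        raise AccessDenied(f"no permissions granted to {user}")
    return Session(user, frozenset(granted))


def audit_query(session, doc_id):
    """Unit 42: audit the query request against the session."""
    if doc_id not in DATABASE:
        raise AccessDenied(f"{session.user} may not query {doc_id}")


def run_query(doc_id):
    """Unit 43: fetch the un-desensitized data."""
    return dict(DATABASE[doc_id])


def desensitize(record, session):
    """Unit 44: mask every field the user is not permitted to see."""
    return {k: (v if k in session.permissions else "***") for k, v in record.items()}


def access(user, requested, doc_id):
    """Run the four units in sequence and return desensitized data."""
    session = audit_permission_request(user, requested)
    audit_query(session, doc_id)
    return desensitize(run_query(doc_id), session)
```

The point of the sketch is the ordering: the query is never executed unless both audits pass, and raw data never leaves `access` without going through the desensitization step.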
As shown in Fig. 5, preferably, the data desensitization unit 44 includes:
A data type analysis unit 441, configured to perform data type analysis on the un-desensitized data through the sensitive-data identification and desensitization engine, so as to determine whether the un-desensitized data is image data or text document data;
An image data desensitization unit 442, configured to, when the un-desensitized data is determined to be image data, identify and desensitize the sensitive information in the image data through a pre-trained picture information identification and desensitization model, so as to complete the desensitization operation on the un-desensitized data, the picture information identification and desensitization model including a text detection model and a text recognition model; and
A document data desensitization unit 443, configured to, when the un-desensitized data is determined to be text document data, parse the content of the text document data with a preset text document parser and output the parsed content classified as either text data or image data; when the output is text data, data desensitization is performed on the text data according to the user access permission; when the output is image data, the image data desensitization unit 442 is triggered to identify and desensitize the sensitive information in the image data through the pre-trained picture information identification and desensitization model.
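The dispatch performed by units 441 to 443 can be sketched as below. Several things here are assumptions rather than the patent's design: images are detected by their PNG or JPEG file signatures, a parsed document is modelled simply as a list of parts (strings or image bytes), text is masked by replacing forbidden words with asterisks, and `desensitize_image` is only a stand-in for the model-based path.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"        # standard PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"            # start of a JPEG stream
MASKED_IMAGE = b"[image desensitized]"  # stand-in result of the image path


def is_image(data):
    """Unit 441 (simplified): treat byte strings with a known signature as images."""
    return isinstance(data, bytes) and data.startswith((PNG_MAGIC, JPEG_MAGIC))


def desensitize_image(data, forbidden):
    """Unit 442 stand-in: a real system would run detection and recognition models here."""
    return MASKED_IMAGE


def desensitize_text(text, forbidden):
    """Mask each forbidden word with asterisks of the same length."""
    for word in forbidden:
        text = text.replace(word, "*" * len(word))
    return text


def parse_document(document):
    """Hypothetical parser: yield (kind, part) for each part of an already-split document."""
    for part in document:
        yield ("image" if is_image(part) else "text"), part


def desensitize(data, forbidden):
    """Route image data to the image path and document parts to the matching path."""
    if is_image(data):
        return desensitize_image(data, forbidden)
    result = []
    for kind, part in parse_document(data):
        if kind == "image":
            result.append(desensitize_image(part, forbidden))
        else:
            result.append(desensitize_text(part, forbidden))
    return result
```

Note that pictures embedded in a document go through the same image path as standalone pictures, which mirrors how unit 443 triggers unit 442.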
Further preferably, the data desensitization device for unstructured data access provided by the embodiment of the present invention further includes:
A model training unit, configured to train the text detection model by combining a VGG-16 convolutional neural network with an LSTM network, and to train the text recognition model by combining a CNN with an LSTM network, wherein the parameters regressed during training of the text detection model include the text box classification value, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
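To make the regressed parameters concrete, the sketch below shows how one prediction of the form (classification value, center x, center y, height, width) could be turned into a corner-coordinate rectangle. The `BoxPrediction` type, the function names, and the 0.5 score threshold are assumptions for illustration; the patent does not state how predictions are post-processed.

```python
from typing import NamedTuple


class BoxPrediction(NamedTuple):
    """One regressed text box, using the five parameters named in the text."""
    score: float   # text box classification value
    cx: float      # horizontal coordinate of the box center
    cy: float      # vertical coordinate of the box center
    height: float
    width: float


def to_corners(p):
    """Convert center/size form to (left, top, right, bottom)."""
    half_w, half_h = p.width / 2, p.height / 2
    return (p.cx - half_w, p.cy - half_h, p.cx + half_w, p.cy + half_h)


def keep_text_boxes(predictions, threshold=0.5):
    """Keep boxes whose classification value reaches the (assumed) threshold."""
    return [to_corners(p) for p in predictions if p.score >= threshold]
```

The corner form is what a later cropping or whitening step would typically consume, since it gives the pixel rectangle directly.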
In embodiments of the present invention, each unit of the data desensitization device for unstructured data access may be implemented by a corresponding hardware or software unit. The units may be independent software or hardware units, or may be integrated into a single software/hardware unit; this is not intended to limit the present invention. For the specific implementation of each unit, refer to the corresponding steps in the foregoing method embodiments; details are not repeated here.
Embodiment five:
Fig. 6 shows the structure of the image data desensitization unit 442 of Embodiment four, as provided by Embodiment five of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown, including:
A text region positioning unit 61, configured to locate text regions in the image data through the text detection model;
A text information recognition unit 62, configured to recognize the text information in the located text regions through the text recognition model; and
A sensitive information encryption unit 63, configured to perform sensitive-information matching on the recognized text information according to the user access permission and to encrypt the matched sensitive information, so as to complete the desensitization operation on the un-desensitized data.
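The way units 61 to 63 chain together can be expressed as a small function that takes each stage as a callable. This is a sketch only: the function name and parameter names are hypothetical, and the toy stand-ins at the bottom treat an "image" as a list of text lines so the flow can be followed without any trained model.

```python
def desensitize_picture(image, detect, recognize, is_sensitive, redact):
    """Compose the three units: locate regions, read their text, redact sensitive ones."""
    for region in detect(image):            # unit 61: text region positioning
        text = recognize(image, region)     # unit 62: text information recognition
        if is_sensitive(text):              # unit 63: sensitive-information matching
            image = redact(image, region)   # unit 63: encrypt or blank the region
    return image


# Toy stand-ins (assumptions): the "image" is a list of lines, a region is a line index.
def toy_detect(image):
    return range(len(image))


def toy_recognize(image, index):
    return image[index]


def toy_is_sensitive(text):
    return text.startswith("id:")


def toy_redact(image, index):
    return image[:index] + ["#" * len(image[index])] + image[index + 1:]


masked = desensitize_picture(
    ["name: Li", "id: 123456"], toy_detect, toy_recognize, toy_is_sensitive, toy_redact
)
```

Passing the stages in as callables keeps the composition independent of which detection or recognition model is actually used.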
In embodiments of the present invention, each unit in the structure of the image data desensitization unit of the data desensitization device for unstructured data access may be implemented by a corresponding hardware or software unit. The units may be independent software or hardware units, or may be integrated into a single software/hardware unit; this is not intended to limit the present invention. For the specific implementation of each unit, refer to the description of the corresponding steps in Embodiment three above; details are not repeated here.
Embodiment six:
Fig. 7 shows the structure of the computing device provided by Embodiment six of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown.
The computing device 7 of the embodiment of the present invention includes a processor 70, a memory 71, and a computer program 72 that is stored in the memory 71 and can run on the processor 70. When executing the computer program 72, the processor 70 implements the steps of the above embodiment of the data desensitization method for unstructured data access, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the units in the above device embodiments, for example the functions of units 41 to 44 shown in Fig. 4.
In embodiments of the present invention, the data access permission declaration request sent by the user is audited, and once it passes the audit, the corresponding user access permission is obtained. The user's query request for unstructured data in the database is then received and audited according to the user access permission. Once the query request passes the audit, a data query operation is performed, according to the query request, in the database for which the user holds access permission, and the corresponding un-desensitized data is obtained. The sensitive-data identification and desensitization engine performs sensitive-data identification and desensitization on the obtained un-desensitized data to obtain the corresponding desensitized data, which is returned to the user, realizing the user's access to the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the protection of that sensitive information.
The computing device of the embodiment of the present invention may be a personal computer or a server. For the steps implemented when the processor 70 of the computing device 7 executes the computer program 72 to carry out the data desensitization method for unstructured data access, refer to the description of the foregoing method embodiments; details are not repeated here.
Embodiment seven:
In embodiments of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above embodiment of the data desensitization method for unstructured data access, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executed by a processor, the computer program implements the functions of the units in the above device embodiments, for example the functions of units 41 to 44 shown in Fig. 4.
In embodiments of the present invention, the data access permission declaration request sent by the user is audited, and once it passes the audit, the corresponding user access permission is obtained. The user's query request for unstructured data in the database is then received and audited according to the user access permission. Once the query request passes the audit, a data query operation is performed, according to the query request, in the database for which the user holds access permission, and the corresponding un-desensitized data is obtained. The sensitive-data identification and desensitization engine performs sensitive-data identification and desensitization on the obtained un-desensitized data to obtain the corresponding desensitized data, which is returned to the user, realizing the user's access to the unstructured data in the database. This effectively prevents leakage of sensitive information in unstructured data and improves the protection of that sensitive information.
The computer-readable storage medium of the embodiment of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example a memory such as ROM/RAM, a magnetic disk, an optical disc, or flash memory.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A data desensitization method for unstructured data access, characterized in that the method comprises the following steps:
    receiving a data access permission declaration request sent by a user, and auditing the user access permission of the user according to the declaration request, so as to obtain the corresponding user access permission;
    when the declaration request passes the audit, receiving the user's query request for unstructured data in a pre-established database, and auditing the query request according to the user access permission;
    when the query request passes the audit, performing a data query operation in the database according to the query request to obtain the corresponding un-desensitized data;
    performing sensitive-data identification and desensitization on the obtained un-desensitized data through a pre-established sensitive-data identification and desensitization engine to obtain the corresponding desensitized data, and returning the obtained desensitized data to the user, so as to realize the user's access to the unstructured data in the database.
  2. The method according to claim 1, characterized in that the step of performing sensitive-data identification and desensitization on the obtained un-desensitized data through the pre-established sensitive-data identification and desensitization engine comprises:
    performing data type analysis on the un-desensitized data through the sensitive-data identification and desensitization engine, so as to determine whether the un-desensitized data is image data or text document data;
    when the un-desensitized data is determined to be image data, identifying and desensitizing the sensitive information in the image data through a pre-trained picture information identification and desensitization model, so as to complete the desensitization operation on the un-desensitized data, the picture information identification and desensitization model including a text detection model and a text recognition model;
    when the un-desensitized data is determined to be text document data, parsing the content of the text document data with a preset text document parser and outputting the parsed content classified as either text data or image data; when the output is text data, performing data desensitization on the text data according to the user access permission; when the output is image data, jumping to the step of identifying and desensitizing the sensitive information in the image data through the pre-trained picture information identification and desensitization model.
  3. The method according to claim 2, characterized in that the step of identifying and desensitizing the sensitive information in the image data through the pre-trained picture information identification and desensitization model comprises:
    locating text regions in the image data through the text detection model;
    recognizing the text information in the located text regions through the text recognition model;
    performing sensitive-information matching on the recognized text information according to the user access permission, and encrypting the matched sensitive information, so as to complete the desensitization operation on the un-desensitized data.
  4. The method according to claim 2, characterized in that, before the step of identifying and desensitizing the sensitive information in the image data through the pre-trained picture information identification and desensitization model, the method further comprises:
    training the text detection model by combining a VGG-16 convolutional neural network with an LSTM network, and training the text recognition model by combining a CNN with an LSTM network, wherein the parameters regressed during training of the text detection model include the text box classification value, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
  5. A data desensitization device for unstructured data access, characterized in that the device comprises:
    a permission declaration audit unit, configured to receive a data access permission declaration request sent by a user and to audit the user access permission of the user according to the declaration request, so as to obtain the corresponding user access permission;
    an access request audit unit, configured to, when the declaration request passes the audit, receive the user's query request for unstructured data in a pre-established database and audit the query request according to the user access permission;
    a data query unit, configured to, when the query request passes the audit, perform a data query operation in the database according to the query request to obtain the corresponding un-desensitized data; and
    a data desensitization unit, configured to perform sensitive-data identification and desensitization on the obtained un-desensitized data through a pre-established sensitive-data identification and desensitization engine to obtain the corresponding desensitized data, and to return the obtained desensitized data to the user, so as to realize the user's access to the unstructured data in the database.
  6. The device according to claim 5, characterized in that the data desensitization unit comprises:
    a data type analysis unit, configured to perform data type analysis on the un-desensitized data through the sensitive-data identification and desensitization engine, so as to determine whether the un-desensitized data is image data or text document data;
    an image data desensitization unit, configured to, when the un-desensitized data is determined to be image data, identify and desensitize the sensitive information in the image data through a pre-trained picture information identification and desensitization model, so as to complete the desensitization operation on the un-desensitized data, the picture information identification and desensitization model including a text detection model and a text recognition model; and
    a document data desensitization unit, configured to, when the un-desensitized data is determined to be text document data, parse the content of the text document data with a preset text document parser and output the parsed content classified as either text data or image data; when the output is text data, perform data desensitization on the text data according to the user access permission; when the output is image data, trigger the image data desensitization unit to identify and desensitize the sensitive information in the image data through the pre-trained picture information identification and desensitization model.
  7. The device according to claim 6, characterized in that the image data desensitization unit comprises:
    a text region positioning unit, configured to locate text regions in the image data through the text detection model;
    a text information recognition unit, configured to recognize the text information in the located text regions through the text recognition model; and
    a sensitive information encryption unit, configured to perform sensitive-information matching on the recognized text information according to the user access permission and to encrypt the matched sensitive information, so as to complete the desensitization operation on the un-desensitized data.
  8. The device according to claim 6, characterized in that the device further comprises:
    a model training unit, configured to train the text detection model by combining a VGG-16 convolutional neural network with an LSTM network, and to train the text recognition model by combining a CNN with an LSTM network, wherein the parameters regressed during training of the text detection model include the text box classification value, the horizontal and vertical coordinates of the text box center point, and the height and width of the text box.
  9. A computing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 4.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
CN201810937005.5A 2018-08-16 2018-08-16 Data desensitization method, device, equipment and medium during unstructured data access Active CN109325326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810937005.5A CN109325326B (en) 2018-08-16 2018-08-16 Data desensitization method, device, equipment and medium during unstructured data access

Publications (2)

Publication Number Publication Date
CN109325326A true CN109325326A (en) 2019-02-12
CN109325326B CN109325326B (en) 2022-09-30

Family

ID=65263730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810937005.5A Active CN109325326B (en) 2018-08-16 2018-08-16 Data desensitization method, device, equipment and medium during unstructured data access

Country Status (1)

Country Link
CN (1) CN109325326B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809405A (en) * 2015-04-24 2015-07-29 广东电网有限责任公司信息中心 Structural data asset leakage prevention method based on hierarchical classification
CN106529329A (en) * 2016-10-11 2017-03-22 中国电子科技网络信息安全有限公司 Desensitization system and desensitization method used for big data
CN107315972A (en) * 2017-06-01 2017-11-03 北京明朝万达科技股份有限公司 A kind of dynamic desensitization method of big data unstructured document and system
CN108153468A (en) * 2017-12-14 2018-06-12 阿里巴巴集团控股有限公司 Image processing method and device
CN108197486A (en) * 2017-12-20 2018-06-22 北京天融信网络安全技术有限公司 Big data desensitization method, system, computer-readable medium and equipment
CN108259491A (en) * 2018-01-15 2018-07-06 北京炼石网络技术有限公司 For the method, apparatus and its system of unstructured data safe handling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
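Peng Hai: "Image Sensitive Text Detection System Based on Heterogeneous Computing", China Master's Theses Full-text Database, Information Science and Technology (monthly) *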

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109981619A (en) * 2019-03-13 2019-07-05 泰康保险集团股份有限公司 Data capture method, device, medium and electronic equipment
CN110188565A (en) * 2019-04-17 2019-08-30 平安科技(深圳)有限公司 Data desensitization method, device, computer equipment and storage medium
CN110245505A (en) * 2019-05-20 2019-09-17 中国平安人寿保险股份有限公司 Tables of data access method, device, computer equipment and storage medium
CN110232056A (en) * 2019-05-21 2019-09-13 苏宁云计算有限公司 A kind of the blood relationship analytic method and its tool of structured query language
CN110413643A (en) * 2019-06-17 2019-11-05 中国平安财产保险股份有限公司 Data query method and apparatus
CN110851864A (en) * 2019-11-08 2020-02-28 国网浙江省电力有限公司信息通信分公司 Sensitive data automatic identification and processing method and system
CN111191275A (en) * 2019-11-28 2020-05-22 深圳云安宝科技有限公司 Sensitive data identification method, system and device
CN111222125A (en) * 2019-12-17 2020-06-02 中国电力科学研究院有限公司 Client and server safety protection system of enterprise browser
CN111428273B (en) * 2020-04-23 2023-08-25 北京中安星云软件技术有限公司 Dynamic desensitization method and device based on machine learning
CN111428273A (en) * 2020-04-23 2020-07-17 北京中安星云软件技术有限公司 Dynamic desensitization method and device based on machine learning
CN111984983A (en) * 2020-08-28 2020-11-24 山东健康医疗大数据有限公司 User privacy encryption method
CN112069203A (en) * 2020-09-22 2020-12-11 北京百家科技集团有限公司 Data query method and device
CN114499901A (en) * 2020-10-26 2022-05-13 中国移动通信有限公司研究院 Information processing method and device, server, terminal and data platform
CN112311879A (en) * 2020-10-30 2021-02-02 平安信托有限责任公司 Method and device for limiting network disk uploading, computer equipment and storage medium
CN112380566A (en) * 2020-11-20 2021-02-19 北京百度网讯科技有限公司 Method, apparatus, electronic device, and medium for desensitizing document image
CN112417406A (en) * 2020-12-04 2021-02-26 中国电子信息产业集团有限公司第六研究所 Data desensitization method and device, readable storage medium and electronic equipment
CN112632597A (en) * 2020-12-08 2021-04-09 国家计算机网络与信息安全管理中心 Data desensitization method and device readable storage medium
CN112487458A (en) * 2020-12-09 2021-03-12 浪潮云信息技术股份公司 Implementation method and system using government affair open sensitive data
CN112714128A (en) * 2020-12-29 2021-04-27 北京安华金和科技有限公司 Data desensitization processing method and device
CN113762237A (en) * 2021-04-26 2021-12-07 腾讯科技(深圳)有限公司 Text image processing method, device and equipment and storage medium
CN113762237B (en) * 2021-04-26 2023-08-18 腾讯科技(深圳)有限公司 Text image processing method, device, equipment and storage medium
CN114244583A (en) * 2021-11-30 2022-03-25 珠海大横琴科技发展有限公司 Data processing method and device based on mobile client
CN114726605A (en) * 2022-03-30 2022-07-08 医渡云(北京)技术有限公司 Sensitive data filtering method, device and system and computer equipment

Also Published As

Publication number Publication date
CN109325326B (en) 2022-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant