CN110472434A - Data desensitization method, system, medium and electronic equipment - Google Patents

Data desensitization method, system, medium and electronic equipment

Info

Publication number
CN110472434A
Authority
CN
China
Prior art keywords
data
type
value
character string
action type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910628442.3A
Other languages
Chinese (zh)
Other versions
CN110472434B (en)
Inventor
江国洲
谭典雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910628442.3A
Publication of CN110472434A
Application granted
Publication of CN110472434B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data desensitization method, system, medium and electronic device. The method includes: extracting user data from an offline database; processing the user data to obtain the incremental data generated by a user's edits to a document over a period of time and the current version snapshot content; determining whether the action type in the incremental data is in an action type whitelist; and, when the action type in the incremental data is in the action type whitelist, desensitizing the different data structures of the value corresponding to the action type. The method transforms the sensitive information in user data according to desensitization rules and thereby reliably protects privacy-sensitive data.

Description

Data desensitization method, system, medium and electronic equipment
Technical field
The present invention relates to the technical field of data processing, and in particular to a data desensitization method, system, medium and electronic device.
Background
As products iterate rapidly, problems often arise during iteration. To resolve them, a back-end debugging tool (Debug) is usually used to obtain information and locate the problem before deeper debugging is carried out. While the problem is being located, however, sensitive information in user data can be exposed, which easily leads to leakage of user data.
Therefore, through long-term research and development, the inventors studied the data sensitivity problem that arises while locating problems during debugging, and propose a data desensitization method to solve the above technical problem.
Summary of the invention
The purpose of the present invention is to provide a data desensitization method, system, medium and electronic device that can solve at least one of the technical problems mentioned above. The specific solution is as follows:
According to a specific embodiment of the present invention, in a first aspect, the present invention provides a data desensitization method, comprising: extracting user data from an offline database; processing the user data to obtain the incremental data generated by a user's edits to a document over a period of time and the current version snapshot content; determining whether the action type in the incremental data is in an action type whitelist; and, when the action type in the incremental data is in the action type whitelist, desensitizing the different data structures of the value corresponding to the action type.
According to a specific embodiment of the present invention, in a second aspect, the present invention provides a data desensitization system, comprising: an extraction module for extracting user data from an offline database; a processing module for processing the user data to obtain the incremental data generated by a user's edits to a document over a period of time and the current version snapshot content; a judgment module for determining whether the action type in the incremental data is in an action type whitelist; and a desensitization module for desensitizing the different data structures of the value corresponding to the action type when the action type in the incremental data is in the action type whitelist.
According to a specific embodiment of the present invention, in a third aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the data desensitization method of any of the above.
According to a specific embodiment of the present invention, in a fourth aspect, the present invention provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data desensitization method of any of the above.
Compared with the prior art, the above solutions of the embodiments of the present invention have at least the following beneficial effects:
First, the sensitive information in user data is transformed according to desensitization rules, which reliably protects privacy-sensitive data.
Second, user data is protected from leakage while the ability of the back-end debugging tool to locate problems is preserved.
Third, by establishing a blacklist and whitelist mechanism, unknown data structures and fields that appear in user data can be shielded.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention. The drawings described below evidently show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a flowchart of a data desensitization method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of a method for desensitizing the formula field when the data structure includes a formula field, according to an embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of a data desensitization system according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the connection structure of an electronic device according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise; "a plurality of" generally means at least two.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe ..., these ... should not be limited by these terms. These terms are only used to distinguish ... from one another. For example, without departing from the scope of the embodiments of the present invention, a first ... may also be referred to as a second ..., and similarly a second ... may also be referred to as a first ....
Depending on the context, the word "if" as used herein can be interpreted as "when", "at the time of", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)".
It should also be noted that the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a product or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the product or device that comprises the element.
Optional embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Referring to Fig. 1, an embodiment of the present invention provides a data desensitization method comprising the following steps:
S100: extract user data from an offline database.
Specifically, the user data is extracted from the offline database of an online service. The user data includes the incremental data generated each time the user edits a document over a period of time and the current version snapshot.
S110: process the user data to obtain the incremental data generated by the user's edits to the document over a period of time and the current version snapshot content.
Specifically, the user data extracted in step S100 is first decrypted, decompressed and serialized to obtain concrete structured data; the incremental data (Changeset) generated by the user's edits to the document over a period of time and the current version snapshot content (Snapshot) are then separated from the structured data.
The incremental data is generated each time the user edits the document over a period of time. It contains different action types (Action); an action type indicates which kind of operation a document edit belongs to. Action types include, but are not limited to, hide, copy and paste. The version snapshot contains all the data of the document at a specific point in time and can present a complete online spreadsheet. For example, applying the incremental data of versions 0 to n to the document produces a snapshot of version n; to save storage space, however, not every version's incremental data immediately produces a version snapshot.
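Step S110 can be sketched roughly as follows. This is only an illustration under stated assumptions: zlib and JSON stand in for the compression and serialization formats, which the text does not name; the key names `changesets` and `snapshot` are hypothetical; "serialization" is read here as parsing stored bytes into structured data; and decryption is left out because no cipher is specified.

```python
import json
import zlib


def load_user_record(raw: bytes) -> tuple:
    """Decompress and parse one stored record, then split it into its two parts."""
    # Decryption would happen before this point; it is omitted in this sketch.
    decompressed = zlib.decompress(raw)
    record = json.loads(decompressed.decode("utf-8"))
    # "changesets" and "snapshot" are hypothetical key names, not from the source.
    return record.get("changesets", []), record.get("snapshot", {})


# Build a sample record the way a hypothetical service might store it.
sample = {
    "changesets": [{"action": "paste", "value": 42}],
    "snapshot": {"version": 1, "cells": {"A1": 42}},
}
raw = zlib.compress(json.dumps(sample).encode("utf-8"))
changesets, snapshot = load_user_record(raw)
```

Separating the two parts early lets the later steps apply the action type check to the changesets only, while the snapshot fields go straight to the field-level rules.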
S120: determine whether the action type in the incremental data is in the action type whitelist.
Specifically, after the incremental data obtained in step S110 is serialized, it is determined whether the action types contained in the incremental data are in the action type whitelist. The whitelist is stored as a constant dictionary and contains all legal action type values that exist in the existing service.
S130: when the action type in the incremental data is in the action type whitelist, desensitize the different data structures of the value corresponding to the action type; when the data structure includes value, style and version snapshot fields, the desensitization includes: when the data type of the value, style and version snapshot fields is an integer type or a floating-point type, converting the value, style and version snapshot fields to uniform digits.
Specifically, when the action type is in the action type whitelist, desensitization of the incremental data begins. The incremental data contains many different action types, and the value of each action type is defined by a different data structure. After comprehensive judgment and evaluation, the data structures that contain sensitive data include fields such as formula (Formula), value (Value), style (Style), sheet name (Sheet Name) and version snapshot (Snapshot). When the data structure includes value, style and version snapshot fields, the data type of those fields is determined first, and a different desensitization is then applied to each data type. The data types include integer, floating-point, string, generic array ([]interface{}) and other types. The desensitization of the different data types covers the following three cases:
First, when the data type of the value, style and version snapshot fields is an integer type or a floating-point type, the value, style and version snapshot fields are converted to uniform digits. For example, 123.342 is converted to 111.111, and 123123213 is converted to 111111111.
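The numeric rule can be illustrated with a short Python sketch. It assumes, based on the two examples in the text, that "uniform digits" means replacing every digit with 1 while keeping the sign and the decimal point; the function name `mask_number` is hypothetical.

```python
import re


def mask_number(value):
    """Replace every digit of an int or float with 1, keeping sign and decimal point."""
    masked = re.sub(r"\d", "1", repr(value))
    # Convert back to the original numeric type (int or float).
    return type(value)(masked)
```

Because the digit count and the position of the decimal point survive, a debugging tool can still see the magnitude and precision of a cell without seeing its content.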
Second, when the data type of the value, style and version snapshot fields is a string and the string is not an internal web address, the string is traversed and desensitized according to the type of its characters. Specifically, the string is processed with a regular-expression matching rule, and it is determined whether the processed string is an internal company web address (URL), so that the corresponding desensitization can be applied.
When the string is not an internal web address, the string is traversed and desensitized according to the type of its characters, as follows: an uppercase English letter is converted to one fixed uppercase letter; a lowercase English letter is converted to one fixed lowercase letter; a Chinese character is converted to "Chinese"; a character in the special character set is left unchanged; any other character is converted to "unknown data" or the character " "; and a string covered by a business requirement receives special handling.
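A per-character sketch of these string rules, in Python, might look like the following. The replacement characters ("X", "x", "中", "?") and the contents of the special character set are assumptions, since the text does not fix them; digits inside a string fall under "other" here, and the business-specific rule is omitted.

```python
import string

# Assumed "special character set": ASCII punctuation and whitespace.
SPECIAL = set(string.punctuation + string.whitespace)


def mask_char(ch: str) -> str:
    """Map one character to its desensitized form."""
    if "A" <= ch <= "Z":
        return "X"   # fixed uppercase letter (assumed choice)
    if "a" <= ch <= "z":
        return "x"   # fixed lowercase letter (assumed choice)
    if "\u4e00" <= ch <= "\u9fff":
        return "中"  # placeholder for any CJK unified ideograph (assumed choice)
    if ch in SPECIAL:
        return ch    # special characters are kept unchanged
    return "?"       # everything else, including digits, counts as "other"


def mask_string(text: str) -> str:
    """Traverse the string and mask it character by character."""
    return "".join(mask_char(ch) for ch in text)
```

For example, `mask_string("Hello, 世界 42!")` yields `"Xxxxx, 中中 ??!"`: the length, casing pattern and punctuation remain visible, which is usually enough to reproduce layout or parsing problems.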
In another embodiment, when the string is an internal web address, the user identifier (Token) in the internal web address is extracted with a regular-expression matching rule, encrypted with an encryption function, and the encrypted identifier replaces the original one. In this embodiment of the present invention, the purpose of determining whether the string is an internal company web address is to make it easy to trace associations between different products. In another embodiment, no distinction is made as to whether the string is an internal company web address, and a uniform desensitization rule can be used.
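The internal-address case can be sketched as below. Everything specific here is an assumption: the URL shape `https://docs.example.internal/sheets/<token>` is hypothetical, and SHA-256 (a one-way hash) merely stands in for the unspecified encryption function.

```python
import hashlib
import re

# Hypothetical internal URL shape: https://docs.example.internal/sheets/<token>
INTERNAL_URL = re.compile(
    r"^(https://docs\.example\.internal/sheets/)([A-Za-z0-9]+)(.*)$"
)


def mask_internal_url(text: str):
    """Return the URL with its token replaced, or None if text is not an internal URL."""
    match = INTERNAL_URL.match(text)
    if match is None:
        return None
    prefix, token, rest = match.groups()
    # SHA-256 stands in for the unspecified encryption function (assumption).
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return prefix + digest + rest
```

A deterministic transformation maps the same token to the same replacement everywhere, which is what keeps associations between products traceable after desensitization.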
Third, when the data type of the value, style and version snapshot fields is a generic map (map[string]interface{}), the field value whose key is type is obtained from the incremental data, and the field value is desensitized according to its data type. The field value is desensitized recursively according to its type, as follows:
When the type of the field value is link or picture, the link (Link) and value (Value) fields belonging to the link or picture are converted; when the type of the field value is notification, the unique user identifier (Token) of the notification's recipient, the text content (Text) and the link field (Link) are converted; when the type of the field value is text, the text field (Text) is converted; when the type of the field value is anything else, the field value is converted to unknown data (Unknownmsg). The conversion follows the conversion methods for integer, floating-point, string, generic array ([]interface{}) and other types recorded in this specification, which are not repeated here.
In another embodiment, when the data type of the value, style and version snapshot fields is not the generic map type, the value (Value) in the generic array is converted to unknown data (Unknownmsg).
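The recursive handling of map-typed values can be sketched in Python, with `dict` and `list` standing in for map[string]interface{} and []interface{}. The lowercase key names, the simplified leaf rule in `mask_leaf`, and the choice to shield fields that the text does not list are all assumptions made for illustration.

```python
def mask_leaf(value):
    """Simplified stand-in for the per-type rules: numbers become 1, strings become x runs."""
    if isinstance(value, bool):
        return "Unknownmsg"
    if isinstance(value, (int, float)):
        return 1
    if isinstance(value, str):
        return "x" * len(value)
    return "Unknownmsg"


# Fields converted for each value of the "type" key (hypothetical key names).
FIELDS_BY_TYPE = {
    "link": ("link", "value"),
    "picture": ("link", "value"),
    "notice": ("token", "text", "link"),
    "text": ("text",),
}


def desensitize(value):
    """Recursively desensitize dicts, lists and leaf values."""
    if isinstance(value, dict):
        kind = value.get("type")
        fields = FIELDS_BY_TYPE.get(kind) if isinstance(kind, str) else None
        if fields is None:
            return "Unknownmsg"  # unknown type: shield the whole value
        out = {"type": kind}
        for key, item in value.items():
            if key == "type":
                continue
            # Listed fields are converted; unlisted ones are shielded (assumption).
            out[key] = desensitize(item) if key in fields else "Unknownmsg"
        return out
    if isinstance(value, list):
        return [desensitize(item) for item in value]
    return mask_leaf(value)
```

Falling back to Unknownmsg for anything unrecognized means that a new field type added to the service is shielded by default until a rule is written for it.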
Further, referring to Fig. 2, desensitizing the different data structures of the value corresponding to the action type includes: when the data structure includes a formula field, desensitizing the formula field, which comprises:
S131: obtain the formula name with a regular-expression matching rule. Formula names include sum, average and so on.
S132: determine whether the formula name is in the existing formula whitelist. The existing formula whitelist is a whitelist mechanism established over all legal formulas that exist in the service.
S133: when the formula name is in the existing formula whitelist, match the values in the formula with the regular-expression matching rule. Specifically, the values in the formula are converted according to the regular-expression matching rule. In another embodiment, when the formula name is not in the existing formula whitelist, the values in the formula are uniformly set to a fixed format, such as the Sum(1,99) format, to mark that a formula exists at this position without affecting the service or leaking data.
S134: convert the values according to their data type. Specifically, the values converted in step S133 are converted according to their data type. The conversion follows the conversion methods for integer, floating-point, string, generic array and other types recorded in this specification, which are not repeated here.
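Steps S131 to S134 can be sketched as follows. The whitelist contents, the regular expression, and the character-level masking of the arguments are assumptions; only the Sum(1,99) placeholder comes from the text.

```python
import re

# Assumed whitelist entries; the text only mentions sum and average as examples.
FORMULA_WHITELIST = {"SUM", "AVERAGE"}
FORMULA = re.compile(r"^(=?)\s*([A-Za-z]+)\((.*)\)$")
PLACEHOLDER = "Sum(1,99)"  # fixed format named in the text


def mask_formula(formula: str) -> str:
    """Keep a whitelisted formula name and mask its arguments; otherwise use the placeholder."""
    match = FORMULA.match(formula.strip())
    if match is None or match.group(2).upper() not in FORMULA_WHITELIST:
        return PLACEHOLDER
    prefix, name, args = match.groups()
    # Mask digits, then lowercase letters, then uppercase letters in the arguments.
    masked = re.sub(r"\d", "1", args)
    masked = re.sub(r"[a-z]", "x", masked)
    masked = re.sub(r"[A-Z]", "X", masked)
    return f"{prefix}{name}({masked})"
```

With these assumptions, `mask_formula("=SUM(12, 3.5)")` gives `"=SUM(11, 1.1)"`, while a name outside the whitelist such as `=VLOOKUP(A1, B2)` collapses to `Sum(1,99)`, so the cell is still recognizable as a formula.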
Further, desensitizing the different data structures of the value corresponding to the action type includes: when the data structure includes a sheet name field, desensitizing the sheet name with the MD5 encryption algorithm. The MD5 encryption algorithm processes a file or data that was originally plaintext with a certain algorithm and turns it into an unreadable piece of code, that is, ciphertext. The MD5 encryption algorithm thereby protects the data from being illegally stolen or read.
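A minimal Python sketch of the sheet name rule uses the standard `hashlib` module. The function name is hypothetical, and UTF-8 is an assumed encoding. Note that MD5 is, strictly speaking, a one-way hash rather than a reversible cipher, so the original name cannot be recovered from the digest by decryption.

```python
import hashlib


def mask_sheet_name(name: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of the sheet name."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()
```

Identical sheet names always produce identical digests, so references between sheets stay consistent after desensitization.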
In another embodiment, when it is determined that the action type is not in the action type whitelist, an illegal operation log is generated to alert the administrator, and the value (Value) corresponding to the incremental data is converted to unknown data (Unknownmsg); the desensitization then ends.
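The whitelist check and its fallback can be sketched together. The record layout (`action` and `value` keys), the logger name and the whitelist entries are assumptions; `MappingProxyType` is used only to approximate a read-only "constant dictionary".

```python
import logging
from types import MappingProxyType

logger = logging.getLogger("desensitize")

# Read-only whitelist; the entries are the example action types from the text.
ACTION_WHITELIST = MappingProxyType({"hide": True, "copy": True, "paste": True})


def gate(change: dict, desensitize_value):
    """Desensitize a whitelisted change; log and shield anything else."""
    action = change.get("action")
    if not isinstance(action, str) or action not in ACTION_WHITELIST:
        # Illegal operation: alert the administrator and shield the value.
        logger.warning("illegal operation type: %r", action)
        return {"action": action, "value": "Unknownmsg"}
    return {"action": action, "value": desensitize_value(change.get("value"))}
```

Passing the value-level rule in as `desensitize_value` keeps the gate independent of how individual fields are masked.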
The data desensitization method provided by this embodiment of the present invention reliably protects privacy-sensitive data and ensures that users' sensitive data is not leaked. By establishing a blacklist and whitelist mechanism, data that contains unknown data structures and fields can be shielded. Sensitive information is transformed according to uniform, extensible desensitization rules, which protects users' sensitive private data without weakening the debugging tool's ability to locate problems after desensitization.
Embodiment 2
Referring to Fig. 3, an embodiment of the present invention provides a data desensitization system 300, which comprises: an extraction module 310, a processing module 320, a judgment module 330 and a desensitization module 340.
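How the four modules might be wired together can be sketched with a small Python dataclass. The callable signatures, the record layout and the `run` method are hypothetical; the snapshot returned by the processing module is ignored in this sketch, and the Unknownmsg fallback follows the method embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class DesensitizationSystem:
    extract: Callable[[], Iterable[bytes]]  # extraction module 310
    process: Callable[[bytes], tuple]       # processing module 320: (changesets, snapshot)
    is_allowed: Callable[[dict], bool]      # judgment module 330
    desensitize: Callable[[dict], dict]     # desensitization module 340

    def run(self) -> list:
        """Pass every extracted record through the four modules in order."""
        results = []
        for raw in self.extract():
            changesets, _snapshot = self.process(raw)  # snapshot unused here
            for change in changesets:
                if self.is_allowed(change):
                    results.append(self.desensitize(change))
                else:
                    # Non-whitelisted action type: shield the value.
                    results.append({**change, "value": "Unknownmsg"})
        return results


# Usage with stub modules.
system = DesensitizationSystem(
    extract=lambda: [b"record"],
    process=lambda raw: (
        [{"action": "paste", "value": 5}, {"action": "evil", "value": 6}],
        {},
    ),
    is_allowed=lambda change: change["action"] == "paste",
    desensitize=lambda change: {**change, "value": 1},
)
output = system.run()
```

Keeping each module behind a plain callable makes it straightforward to swap in a different storage back end or rule set without touching the pipeline.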
The extraction module 310 is configured to extract user data from an offline database. Specifically, the extraction module 310 extracts the user data from the offline database of an online service; the user data includes the incremental data generated each time the user edits a document over a period of time and the current version snapshot content.
The processing module 320 is configured to process the user data to obtain the incremental data generated by the user's edits to the document over a period of time and the current version snapshot content. Specifically, the processing module 320 first decrypts, decompresses and serializes the user data extracted by the extraction module 310 to obtain concrete structured data; it then separates from the structured data the incremental data (Changeset) generated by the user's edits to the document over a period of time and the current version snapshot content (Snapshot).
The incremental data is generated each time the user edits the document over a short period of time. It contains different action types (Action); an action type indicates which kind of operation a document edit belongs to. Action types include, but are not limited to, hide, copy and paste. The corresponding version snapshot is all the data of the document at a specific point in time and can present a complete online spreadsheet. For example, applying the incremental data of versions 0 to n to the document produces a snapshot of version n; to save storage space, however, not every version's incremental data immediately produces a version snapshot.
The judgment module 330 is configured to determine whether the action type in the incremental data is in the action type whitelist.
Specifically, after the incremental data obtained by the processing module 320 is serialized, the judgment module 330 determines whether the action types contained in the incremental data are in the action type whitelist. The whitelist is stored as a constant dictionary and contains all legal action type values that exist in the existing service.
The desensitization module 340 is configured to desensitize the different data structures of the value corresponding to the action type when the action type in the incremental data is in the action type whitelist; when the data structure includes value, style and version snapshot fields and the data type of those fields is an integer type or a floating-point type, the desensitization module converts the value, style and version snapshot fields to uniform digits.
Specifically, when the action type is in the action type whitelist, the desensitization module 340 begins to desensitize the incremental data. The incremental data contains many different action types, and the value of each action type is defined by a different data structure. After comprehensive judgment and evaluation, the data structures that contain sensitive data include fields such as formula (Formula), value (Value), style (Style), sheet name (Sheet Name) and version snapshot (Snapshot). When the data structure includes value, style and version snapshot fields, the desensitization module 340 first determines the data type of those fields and then applies a different desensitization to each data type. The data types include integer, floating-point, string, generic array ([]interface{}) and other types. The desensitization module 340 handles the different data types in the following three cases:
First, when the data type of the value, style and version snapshot fields is an integer type or a floating-point type, the desensitization module 340 converts the value, style and version snapshot fields to uniform digits. For example, 123.342 is converted to 111.111, and 123123213 is converted to 111111111.
Second, when the data type of the value, style and version snapshot fields is a string and the string is not an internal web address, the desensitization module 340 traverses the string and desensitizes it according to the type of its characters. Specifically, the desensitization module 340 processes the string with a regular-expression matching rule and determines whether the processed string is an internal company web address (URL), so that the corresponding desensitization can be applied.
When the string is not an internal web address, the desensitization module 340 traverses the string and desensitizes it according to the type of its characters, under the following rules: an uppercase English letter is converted to one fixed uppercase letter; a lowercase English letter is converted to one fixed lowercase letter; a Chinese character is converted to "Chinese"; a character in the special character set is left unchanged; any other character is converted to "unknown data" or the character " "; and a string covered by a business requirement receives special handling.
In another embodiment, when the string is an internal web address, the desensitization module 340 extracts the user identifier (Token) in the internal web address with a regular-expression matching rule, encrypts the user identifier with an encryption function, and replaces the original identifier with the encrypted one. In this embodiment of the present invention, the purpose of determining whether the string is an internal company web address is to make it easy to trace associations between different products. In another embodiment, no distinction is made as to whether the string is an internal company web address, and a uniform desensitization rule can be used.
The third, when the data type of described value, pattern and version snapshot fields is the general type of array (map [string] Interface { }) when, the desensitization module 340 obtains corresponding field value when key value is type in the incremental data, and Desensitization process is carried out to the field value according to the data type of the field value.Wherein, the desensitization module 340 is according to described The type of field value carries out the desensitization process of recursion to the field value, and the specific rule that desensitizes includes:
When the type of the field value includes link or when picture, then will link (Link) belonging to the link and picture, Value field (Value) in picture is converted;Or, then the notice is sent out when the type of the field value includes notice User's unique identification (Token), content of text (Text) and the chain field (Link) of object is sent to convert;Or, when described When the type of field value includes text, then described the text field (Text) is converted;Or, working as the type packet of the field value When including other, then the field value is converted into unknown number (Unknownmsg).Specifically, the mode of the conversion is according to this theory Bright record to integer type, floating point type, character string, the general type of array ([] interface { }) and other kinds of conversion side Formula is converted, and is no longer stated one by one herein.
In another embodiment, when the data type of the value, pattern and version snapshot fields is not the generic array type (map[string]interface{}), the desensitization module 340 converts the value (Value) in the generic array type to unknown data (Unknownmsg).
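The recursive, type-driven rule described above might look like the following Go sketch. The key name `type`, the literal type values (`link`, `picture`, `notice`, `text`), the fixed replacement values in `maskScalar`, and the choice to drop fields that are not listed are all assumptions made for illustration; the specification only fixes which fields are converted for each type.

```go
package main

const unknownMsg = "Unknownmsg"

// maskScalar is a placeholder for the per-type conversions the specification
// describes elsewhere (numbers, strings); the fixed outputs are assumptions.
func maskScalar(v interface{}) interface{} {
	switch v.(type) {
	case int, int64, float64:
		return 0
	case string:
		return "x"
	default:
		return unknownMsg
	}
}

// fieldsByType lists which fields are converted for each "type" value.
var fieldsByType = map[string][]string{
	"link":    {"Link"},
	"picture": {"Link", "Value"},
	"notice":  {"Token", "Text", "Link"},
	"text":    {"Text"},
}

// Desensitize walks a decoded field value recursively.
func Desensitize(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		return desensitizeTyped(t)
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, e := range t {
			out[i] = Desensitize(e)
		}
		return out
	default:
		return maskScalar(v)
	}
}

// desensitizeTyped converts the listed fields of a typed object and collapses
// any object of an unrecognized type to unknown data.
func desensitizeTyped(m map[string]interface{}) interface{} {
	kind, _ := m["type"].(string)
	fields, ok := fieldsByType[kind]
	if !ok {
		return unknownMsg
	}
	out := map[string]interface{}{"type": kind}
	for _, f := range fields {
		if val, present := m[f]; present {
			out[f] = Desensitize(val) // recurse into nested structures
		}
	}
	return out
}
```

Copying only the listed fields into a fresh map means a field that the rules do not know about never reaches the output, which matches the stated goal of masking unknown structures.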
Further, the desensitization module 340 is also configured to desensitize the formula field when the data structure includes a formula field. Specifically, the desensitization module 340 further comprises:
Acquisition submodule 341, for obtaining the formula name according to a regular-expression matching rule. The formula name includes, for example, summation and averaging.
List judging submodule 342, for judging whether the formula name exists in the existing formula white list. The existing formula white list is a white list established over all legal formulas currently supported by the service.
Matching submodule 343, for matching the values in the formula according to the regular-expression matching rule when the formula name exists in the existing formula white list. Specifically, the values in the formula are extracted for conversion according to the regular-expression matching rule. In another embodiment, when the formula name does not exist in the existing formula white list, the matching submodule 343 uniformly sets the values in the formula to a fixed format, such as Sum(1,99), to mark that a formula exists at this position without affecting the service or leaking data.
Conversion submodule 344, for converting the value according to the data type of the value. Specifically, the values matched by the matching submodule 343 are converted according to their data types. The conversion follows the conversion methods recorded in this specification for the integer type, floating-point type, character string, generic array type and other types, which are not repeated here.
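The four submodules can be read as a small pipeline: extract the name, check the white list, match the values, convert each value by type. The Go sketch below illustrates that flow under assumptions: the formula syntax accepted by `formulaPattern`, the contents of `formulaWhitelist`, and the replacement values `1` and `x` are hypothetical, and nested formulas are not handled.

```go
package main

import (
	"regexp"
	"strconv"
	"strings"
)

var (
	// formulaPattern captures a function-style formula such as "=SUM(12, 3.5)".
	formulaPattern = regexp.MustCompile(`^=?\s*([A-Za-z]+)\((.*)\)\s*$`)
	// formulaWhitelist is a hypothetical list of formulas the service supports.
	formulaWhitelist = map[string]bool{"SUM": true, "AVERAGE": true}
)

// fixedFormula is the placeholder the specification gives for formulas whose
// name is not in the white list.
const fixedFormula = "Sum(1,99)"

// DesensitizeFormula keeps a whitelisted formula name and masks its arguments;
// anything else collapses to the fixed placeholder.
func DesensitizeFormula(formula string) string {
	m := formulaPattern.FindStringSubmatch(formula)
	if m == nil || !formulaWhitelist[strings.ToUpper(m[1])] {
		return fixedFormula
	}
	args := strings.Split(m[2], ",")
	for i, a := range args {
		// Numeric arguments become a unified number, everything else a fixed
		// letter; both replacement values are assumptions.
		if _, err := strconv.ParseFloat(strings.TrimSpace(a), 64); err == nil {
			args[i] = "1"
		} else {
			args[i] = "x"
		}
	}
	return m[1] + "(" + strings.Join(args, ",") + ")"
}
```

For example, `DesensitizeFormula("=SUM(12, 3.5)")` yields `SUM(1,1)`, while a formula with an unlisted name such as `VLOOKUP(secret, 2)` yields `Sum(1,99)`. The position is still marked as a formula, but neither its name nor its arguments survive.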
Further, the desensitization module 340 is also configured to: when the data structure includes a table name field, desensitize the name according to the MD5 encryption algorithm. The MD5 encryption algorithm processes a file or data that was originally plaintext with a certain algorithm, turning it into an unreadable code, i.e. generating ciphertext. The MD5 encryption algorithm thereby protects the data from being illegally stolen or read.
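A minimal Go sketch of the table-name step, using the standard library's `crypto/md5`. The function name `DesensitizeTableName` and the hex encoding of the digest are assumptions. Note that MD5 is, strictly speaking, a one-way hash rather than a reversible cipher, so the original name is not meant to be recovered from the output.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// DesensitizeTableName replaces a table name with the hex form of its MD5
// digest. Equal names always produce equal digests, so references to the same
// table can still be matched up after desensitization.
func DesensitizeTableName(name string) string {
	sum := md5.Sum([]byte(name))
	return hex.EncodeToString(sum[:])
}
```

The output is always 32 hexadecimal characters regardless of the length of the name.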
In another embodiment, when it is judged that the action type does not exist in the action type white list, the desensitization module 340 generates an illegal-operation log to alert the administrator and converts the value (Value) corresponding to the incremental data to unknown data (Unknownmsg); the desensitization process then ends.
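The white-list gate could be sketched in Go as below. The `Op` record shape and the entries of `actionWhitelist` are hypothetical. The sketch returns an error describing the illegal operation and leaves it to the caller to write the log and alert the administrator.

```go
package main

import "fmt"

const unknownMsg = "Unknownmsg"

// Op is a hypothetical shape for one incremental edit record.
type Op struct {
	Action string
	Value  interface{}
}

// actionWhitelist holds the action types the desensitizer knows how to handle;
// the entries are placeholders.
var actionWhitelist = map[string]bool{"insert": true, "delete": true, "set_style": true}

// Gate passes a whitelisted op through for further per-field desensitization.
// For any other action type it masks the value and returns an error that the
// caller can record as an illegal-operation log entry.
func Gate(op Op) (Op, error) {
	if actionWhitelist[op.Action] {
		return op, nil
	}
	op.Value = unknownMsg
	return op, fmt.Errorf("illegal operation: action type %q is not in the white list", op.Action)
}
```

Masking the whole value on a miss is what makes the scheme fail closed: a newly introduced action type leaks nothing until someone adds it to the white list together with rules for its fields.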
The data desensitization system provided by this embodiment of the present invention reliably protects privacy-sensitive data and ensures that users' sensitive data is not leaked. By establishing a black and white list mechanism, data with unknown data structures and fields can be masked. By transforming sensitive information through unified, extensible desensitization rules, the system protects users' sensitive private data without weakening the ability of debugging tools to locate problems after the data has been desensitized.
Embodiment 3
The embodiment of the present disclosure provides a non-volatile computer storage medium storing computer-executable instructions, and the computer-executable instructions can perform the data desensitization method of any of the above method embodiments.
Embodiment 4
The embodiment of the present disclosure provides an electronic device for the data desensitization method, the electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
Extract user data from an offline database;
Process the user data to obtain the incremental data generated by the user's document editing within a period of time and the current version snapshot content;
Judge whether the action type in the incremental data exists in the action type white list;
When the action type in the incremental data exists in the action type white list, desensitize the different data structures of the value corresponding to the action type; wherein, when the data structure includes value, pattern and version snapshot fields, desensitizing the value, pattern and version snapshot fields comprises:
When the data type of the value, pattern and version snapshot fields is an integer type or a floating-point type, converting the value, pattern and version snapshot fields to a unified number.
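The numeric rule in the last step can be sketched in Go as a type switch. The choice of 0 as the unified number and the function name `UnifyNumber` are assumptions; the specification only says the field is converted to a unified number.

```go
package main

// UnifyNumber replaces any integer with a unified integer and any float with a
// unified float (0 in both cases, an assumed value). The second result reports
// whether v was numeric, so the caller can fall through to the string or
// generic-array rules otherwise.
func UnifyNumber(v interface{}) (interface{}, bool) {
	switch v.(type) {
	case int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64:
		return int64(0), true
	case float32, float64:
		return float64(0), true
	default:
		return v, false
	}
}
```

Keeping the integer versus float distinction in the output preserves the shape of the data for debugging. When the input comes from Go's `encoding/json` decoded into `interface{}`, every JSON number arrives as `float64` by default, so only the float branch would fire unless the decoder is configured otherwise.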
Embodiment 5
Referring now to Fig. 4, it shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 4, the electronic device 400 may include a processing unit (for example, a central processing unit or a graphics processor) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores the various programs and data required for the operation of the electronic device 400. The processing unit 401, the ROM 402 and the RAM 403 are connected to one another by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices can be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer or gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), speaker or vibrator; a storage device 408 including, for example, a magnetic tape or hard disk; and a communication device 409. The communication device 409 can allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 4 shows an electronic device with various devices, it should be understood that not all of the devices shown need to be implemented or present. More or fewer devices may alternatively be implemented or present.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. Under certain circumstances the name of a unit does not constitute a limitation on the unit itself; for example, a first acquiring unit may also be described as "a unit that acquires at least two Internet Protocol addresses".

Claims (11)

1. A data desensitization method, characterized by comprising:
Extracting user data from an offline database;
Processing the user data to obtain the incremental data generated by the user's document editing within a period of time and the current version snapshot content;
Judging whether the action type in the incremental data exists in an action type white list;
When the action type in the incremental data exists in the action type white list, desensitizing the different data structures of the value corresponding to the action type.
2. The method according to claim 1, characterized in that, when the data structure includes value, pattern and version snapshot fields, desensitizing the value, pattern and version snapshot fields comprises:
When the data type of the value, pattern and version snapshot fields is an integer type or a floating-point type, converting the value, pattern and version snapshot fields to a unified number.
3. The method according to claim 2, characterized in that desensitizing the value, pattern and version snapshot fields further comprises: when the data type of the value, pattern and version snapshot fields is a character string and the character string is not an internal network address, traversing the character string and performing the corresponding desensitization according to the type of the character string.
4. The method according to claim 2, characterized in that desensitizing the value, pattern and version snapshot fields further comprises: when the data type of the value, pattern and version snapshot fields is the generic array type, obtaining the field value whose key in the incremental data is type, and desensitizing the field value according to the data type of the field value.
5. The method according to claim 4, characterized in that desensitizing the field value according to the data type of the field value comprises:
When the type of the field value includes a link or a picture, converting the link field of the link or picture and the value field of the picture; or,
When the type of the field value includes a notification, converting the user identifier, text content and link field corresponding to the notification; or,
When the type of the field value includes text, converting the text field.
6. The method according to claim 3, characterized in that performing the corresponding desensitization according to the type of the character string comprises:
When the character string consists of uppercase English letters, converting the character string to a unified fixed uppercase letter; or,
When the character string consists of lowercase English letters, converting the character string to a unified fixed lowercase letter; or,
When the character string consists of Chinese characters, converting the character string to a unified fixed Chinese character; or,
When the character string belongs to the special character set, keeping the character string unchanged.
7. The method according to claim 1, characterized in that desensitizing the different data structures of the value corresponding to the action type further comprises: when the data structure includes a formula field, desensitizing the formula field, comprising:
Obtaining the formula name according to a regular-expression matching rule;
Judging whether the formula name exists in an existing formula white list;
When the formula name exists in the existing formula white list, matching the values in the formula according to the regular-expression matching rule;
Converting the values according to the data types of the values.
8. The method according to claim 1, characterized in that desensitizing the different data structures of the value corresponding to the action type further comprises: when the data structure includes a table name field, desensitizing the name according to the MD5 encryption algorithm.
9. A data desensitization system, characterized by comprising:
An extraction module, for extracting user data from an offline database;
A processing module, for processing the user data to obtain the incremental data generated by the user's document editing within a period of time and the current version snapshot content;
A judgment module, for judging whether the action type in the incremental data exists in an action type white list;
A desensitization module, for desensitizing the different data structures of the value corresponding to the action type when the action type in the incremental data exists in the action type white list.
10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
11. An electronic device, characterized by comprising:
One or more processors;
A storage device, for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 8.
CN201910628442.3A 2019-07-12 2019-07-12 Data desensitization method, system, medium, and electronic device Active CN110472434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910628442.3A CN110472434B (en) 2019-07-12 2019-07-12 Data desensitization method, system, medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910628442.3A CN110472434B (en) 2019-07-12 2019-07-12 Data desensitization method, system, medium, and electronic device

Publications (2)

Publication Number Publication Date
CN110472434A true CN110472434A (en) 2019-11-19
CN110472434B CN110472434B (en) 2021-09-14

Family

ID=68508065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910628442.3A Active CN110472434B (en) 2019-07-12 2019-07-12 Data desensitization method, system, medium, and electronic device

Country Status (1)

Country Link
CN (1) CN110472434B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026763A (en) * 2019-12-13 2020-04-17 中国建设银行股份有限公司 Data processing method, device, equipment and storage medium
CN111444538A (en) * 2020-03-25 2020-07-24 北京奇艺世纪科技有限公司 Information desensitization method and device, electronic equipment and storage medium
CN111898340A (en) * 2020-07-30 2020-11-06 北京字节跳动网络技术有限公司 File processing method and device and readable storage medium
CN113360947A (en) * 2021-06-30 2021-09-07 杭州网易再顾科技有限公司 Data desensitization method and device, computer readable storage medium and electronic equipment
WO2022088754A1 (en) * 2020-10-27 2022-05-05 华为技术有限公司 File desensitization method and apparatus, and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132419A1 (en) * 2007-11-15 2009-05-21 Garland Grammer Obfuscating sensitive data while preserving data usability
US20120151597A1 (en) * 2010-12-14 2012-06-14 International Business Machines Corporation De-Identification of Data
CN103226466A (en) * 2013-04-26 2013-07-31 浪潮集团山东通用软件有限公司 Efficient incremental data capturing method
CN105260402A (en) * 2015-09-18 2016-01-20 久盈世纪(北京)科技有限公司 Data management method and apparatus
CN106529329A (en) * 2016-10-11 2017-03-22 中国电子科技网络信息安全有限公司 Desensitization system and desensitization method used for big data
CN106611129A (en) * 2016-12-27 2017-05-03 东华互联宜家数据服务有限公司 Data desensitization method, device and system
CN107315968A (en) * 2017-06-29 2017-11-03 国信优易数据有限公司 A kind of data processing method and equipment
CN107392051A (en) * 2017-07-28 2017-11-24 北京明朝万达科技股份有限公司 A kind of big data processing method and system
CN107992771A (en) * 2017-12-20 2018-05-04 北京明朝万达科技股份有限公司 A kind of data desensitization method and device
CN109460676A (en) * 2018-10-30 2019-03-12 全球能源互联网研究院有限公司 A kind of desensitization method of blended data, desensitization device and desensitization equipment
CN109597843A (en) * 2018-12-19 2019-04-09 北京锐安科技有限公司 Data managing method, device, storage medium and the electronic equipment of big data environment
CN109815742A (en) * 2019-02-22 2019-05-28 蔷薇智慧科技有限公司 Data desensitization method and device
CN109918944A (en) * 2019-03-01 2019-06-21 维沃移动通信有限公司 A kind of information protecting method, device, mobile terminal and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026763A (en) * 2019-12-13 2020-04-17 中国建设银行股份有限公司 Data processing method, device, equipment and storage medium
CN111444538A (en) * 2020-03-25 2020-07-24 北京奇艺世纪科技有限公司 Information desensitization method and device, electronic equipment and storage medium
CN111444538B (en) * 2020-03-25 2024-04-23 北京奇艺世纪科技有限公司 Information desensitizing method and device, electronic equipment and storage medium
CN111898340A (en) * 2020-07-30 2020-11-06 北京字节跳动网络技术有限公司 File processing method and device and readable storage medium
WO2022088754A1 (en) * 2020-10-27 2022-05-05 华为技术有限公司 File desensitization method and apparatus, and storage medium
CN113360947A (en) * 2021-06-30 2021-09-07 杭州网易再顾科技有限公司 Data desensitization method and device, computer readable storage medium and electronic equipment
CN113360947B (en) * 2021-06-30 2022-07-26 杭州网易再顾科技有限公司 Data desensitization method and device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110472434B (en) 2021-09-14

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.