CN117951747A - Self-adaptive desensitization method, system, equipment and medium

Info

Publication number
CN117951747A
Authority
CN
China
Prior art keywords
sensitive word
desensitization
desensitized
adaptive
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410350656.XA
Other languages
Chinese (zh)
Other versions
CN117951747B (en)
Inventor
刘大炜
罗佳丽
刘翔锋
欧阳森山
赵炜煜
王攀
雷霭荻
刘志波
高信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aircraft Industrial Group Co Ltd
Original Assignee
Chengdu Aircraft Industrial Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aircraft Industrial Group Co Ltd
Priority to CN202410350656.XA
Publication of CN117951747A
Application granted
Publication of CN117951747B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the technical field of data security, and in particular to a self-adaptive desensitization method, system, equipment and medium. First, keywords of the current file to be desensitized are acquired from the sender user. Next, entries in the current sensitive word stock are added, deleted, modified and queried according to the keywords to obtain a new sensitive word stock. Finally, a regular expression is generated from the new sensitive word stock, the sensitive word positions are located with the regular expression to obtain the sensitive words, and the sensitive words are desensitized to obtain a desensitized file. The method desensitizes multiple types of data while keeping the data type unchanged before and after desensitization, which ensures both the security of the desensitized data and the readability of the desensitized file. The desensitization strength is selected adaptively according to the roles of the sending and receiving users, which prevents important information from leaking to untrusted personnel and overcomes the inability of existing desensitization algorithms to resist collusion attacks and brute-force enumeration attacks. Sensitive word retrieval and sensitive word desensitization are processed concurrently by multiple threads, which greatly increases the desensitization speed.

Description

Self-adaptive desensitization method, system, equipment and medium
Technical Field
The invention relates to the technical field of data security, and in particular to a self-adaptive desensitization method, system, equipment and medium.
Background
A prior-art self-adaptive data desensitization method comprises the following steps. Step one, data extraction: data is extracted from ordinary databases, time-series databases, and file or FTP interfaces using Sqoop, and the integrity of the original data and of the logical relations among the data is preserved during extraction. Step two, data desensitization: sensitive data is transformed by a desensitization algorithm from a desensitization algorithm library, and the relevance and integrity of the data are maintained during desensitization so that data within the same system remains consistent. Step three, data distribution: after desensitization, the data is distributed through database, file, and FTP interfaces. That method is guided by application requirements and uses a desensitization strategy to drive the dynamic generation of desensitization rules, so that the desensitization result is well-founded and repeatable; it has a low usage cost, and its algorithms and applications are easy to extend.
The above desensitization method relies on a desensitization rule base: before file transmission, the positions of the sensitive words are located by regular-expression matching, and the sensitive fields are then truncated, shifted, replaced, and so on according to the corresponding rules in the desensitization rule base. For numeric data, the existing truncation and shift operations cannot withstand brute-force enumeration attacks. For character data, the existing replacement operation cannot resist collusion attacks: several attackers can infer the real sensitive fields by jointly examining the desensitized documents.
Disclosure of Invention
To address the problem that existing data desensitization methods cannot resist collusion attacks and brute-force enumeration attacks, the invention provides a self-adaptive desensitization method, system, equipment and medium. The method first acquires keywords of the current file to be desensitized of a sender user; then adds, deletes, modifies and queries entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock; and finally generates a regular expression from the new sensitive word stock, locates the sensitive word positions with the regular expression to obtain the sensitive words, and desensitizes them to obtain a desensitized file. The method desensitizes multiple types of data while keeping the data type unchanged before and after desensitization, ensures the security of the desensitized data and the readability of the desensitized file, and further improves the desensitization speed.
The invention is specifically implemented as follows:
A self-adaptive desensitization method specifically comprises the following steps:
Step S1: acquiring keywords of the current file to be desensitized of a sender user;
Step S2: adding, deleting, modifying and querying entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
Step S3: generating a regular expression according to the new sensitive word stock, locating the sensitive word positions according to the regular expression to obtain the sensitive words, and desensitizing the sensitive words to obtain a desensitized file.
To better implement the present invention, further, before step S1, the self-adaptive desensitization method includes:
judging whether the current user is a registered user; if so, judging whether the user information entered by the current user matches the user information managed by the background, and outputting a login-success popup if it matches or a user name/password error popup if it does not; if the user is not registered, outputting a registration interface to guide the current user to register.
To better implement the present invention, further, step S2 specifically includes the following steps:
Step S21: adding, deleting, modifying and querying entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
Step S22: acquiring the receiver ID entered by the sender user and judging whether the receiver ID belongs to the user IDs managed by the background; if so, executing step S3; otherwise, outputting a popup indicating that the current user ID is invalid.
To better implement the present invention, further, step S3 specifically includes the following steps:
Step S31: reading the current file to be desensitized as a character stream to obtain a character string;
Step S32: generating a regular expression according to the new sensitive word stock;
Step S33: locating the sensitive word positions according to the character string and the regular expression, and storing the sensitive word positions in a preset file list;
Step S34: concurrently and cyclically scanning the preset file list, and acquiring the sensitive words according to the sensitive word positions;
Step S35: converting the sensitive word into a binary string and converting the binary string into a matrix;
Step S36: determining the number of iteration rounds of the desensitization algorithm according to the sender user type and the receiver role type;
Step S37: taking the matrix as the input of the desensitization algorithm and iterating for the determined number of rounds to obtain the iteration result secret;
Step S38: replacing the content at the sensitive word position of the current file to be desensitized with the iteration result secret;
Step S39: repeating steps S31 to S38 until all sensitive words of the current file to be desensitized have been replaced, thereby obtaining the desensitized file corresponding to the current file to be desensitized.
To better implement the present invention, further, the specific operation of step S31 is as follows: reading the suffix of the current file to be desensitized; if the suffix is txt, parsing the input stream with BufferReader; if the suffix is doc/docx, calling the WordExtractor class of the poi library to parse the input stream; and reading the parsed result line by line into a character string s.
To better implement the present invention, further, step S37 specifically includes the following steps:
Step S371: row by row, cyclically shifting the elements of each row of the matrix left by a specific number of positions to obtain a shifted matrix;
Step S372: multiplying the shifted matrix by the set eigenvalue matrix to obtain an output matrix;
Step S373: taking the output matrix as the input of the desensitization algorithm and returning to step S371 until the set number of rounds has been completed, and taking the output matrix of the last round as the iteration result secret.
To better implement the present invention, further, the eigenvalue matrix set in step S372 is a matrix in which the elements of each column sum to 1.
To better implement the present invention, further, the sensitive word positions in step S33 include a sensitive word start position and a sensitive word end position.
Based on the above self-adaptive desensitization method, to better implement the present invention, a self-adaptive desensitization system is further provided, which comprises an acquisition unit, an adding, deleting, modifying and querying unit, and a desensitization unit;
the acquisition unit is used for acquiring keywords of the current file to be desensitized of the sender user;
the adding, deleting, modifying and querying unit is used for adding, deleting, modifying and querying entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
the desensitization unit is used for generating a regular expression according to the new sensitive word stock, locating the sensitive word positions according to the regular expression to obtain the sensitive words, and desensitizing the sensitive words to obtain a desensitized file.
Based on the above self-adaptive desensitization method, to better implement the present invention, an electronic device is further provided, which includes a memory and a processor; the memory stores a computer program; and the above self-adaptive desensitization method is implemented when the computer program is executed on the processor.
Based on the above self-adaptive desensitization method, to better implement the present invention, a computer-readable storage medium is further provided, on which computer instructions are stored; the above self-adaptive desensitization method is implemented when the computer instructions are executed on the above electronic device.
The invention has the following beneficial effects:
(1) The invention is applicable to desensitizing multiple types of data; the data type is unchanged before and after desensitization, which ensures both the security of the desensitized data and the readability of the desensitized file.
(2) The invention selects the desensitization strength adaptively according to the roles of the sending and receiving users, prevents important information from leaking to untrusted personnel, and overcomes the inability of existing desensitization algorithms to resist collusion attacks and brute-force enumeration attacks.
(3) The invention processes sensitive word retrieval and sensitive word desensitization concurrently with multiple threads, which greatly increases the desensitization speed.
Drawings
FIG. 1 is a schematic block diagram of a process for adaptive desensitization provided by the present invention.
Detailed Description
To illustrate the technical solutions of the embodiments of the present invention more clearly, they are described clearly and completely below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention and therefore should not be considered as limiting the scope of protection. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "disposed" and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of these terms in the present invention can be understood by those of ordinary skill in the art according to the specific case.
Example 1:
This embodiment provides a self-adaptive desensitization method, which specifically comprises the following steps.
Before step S1, the method includes:
judging whether the current user is a registered user; if so, judging whether the user information entered by the current user matches the user information managed by the background, and outputting a login-success popup if it matches or a user name/password error popup if it does not; if the user is not registered, outputting a registration interface to guide the current user to register.
Step S1: acquiring keywords of the current file to be desensitized of the sender user.
Step S2: adding, deleting, modifying and querying entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock.
Step S2 specifically includes the following steps:
Step S21: adding, deleting, modifying and querying entries in the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
Step S22: acquiring the receiver ID entered by the sender user and judging whether the receiver ID belongs to the user IDs managed by the background; if so, executing step S3; otherwise, outputting a popup indicating that the current user ID is invalid.
Step S3: generating a regular expression according to the new sensitive word stock, locating the sensitive word positions according to the regular expression to obtain the sensitive words, and desensitizing the sensitive words to obtain a desensitized file.
Step S3 specifically comprises the following steps:
Step S31: reading the current file to be desensitized as a character stream to obtain a character string.
Further, the specific operation of step S31 is as follows: reading the suffix of the current file to be desensitized; if the suffix is txt, parsing the input stream with BufferReader; if the suffix is doc/docx, calling the WordExtractor class of the poi library to parse the input stream; and reading the parsed result line by line into a character string s.
Step S32: generating a regular expression according to the new sensitive word stock.
Step S33: locating the sensitive word positions according to the character string and the regular expression, and storing them in a preset file list.
Further, the sensitive word positions in step S33 include a sensitive word start position and a sensitive word end position.
Step S34: concurrently and cyclically scanning the preset file list, and acquiring the sensitive words according to the sensitive word positions.
Step S35: converting the sensitive word into a binary string and converting the binary string into a matrix.
Step S36: determining the number of iteration rounds of the desensitization algorithm according to the sender user type and the receiver role type.
Step S37: taking the matrix as the input of the desensitization algorithm and iterating for the determined number of rounds to obtain the iteration result secret.
Further, step S37 specifically includes the following steps:
Step S371: row by row, cyclically shifting the elements of each row of the matrix left by a specific number of positions to obtain a shifted matrix;
Step S372: multiplying the shifted matrix by the set eigenvalue matrix to obtain an output matrix.
Further, the eigenvalue matrix set in step S372 is a matrix in which the elements of each column sum to 1.
Step S373: taking the output matrix as the input of the desensitization algorithm and returning to step S371 until the set number of rounds has been completed, and taking the output matrix of the last round as the iteration result secret.
Step S38: replacing the content at the sensitive word position of the current file to be desensitized with the iteration result secret.
Step S39: repeating steps S31 to S38 until all sensitive words of the current file to be desensitized have been replaced, thereby obtaining the desensitized file corresponding to the current file to be desensitized.
Working principle: first, keywords of the current file to be desensitized are acquired from the sender user; next, entries in the current sensitive word stock are added, deleted, modified and queried according to the keywords to obtain a new sensitive word stock; finally, a regular expression is generated from the new sensitive word stock, the sensitive word positions are located with the regular expression to obtain the sensitive words, and the sensitive words are desensitized to obtain a desensitized file. The method desensitizes multiple types of data while keeping the data type unchanged before and after desensitization, ensures the security of the desensitized data and the readability of the desensitized file, and further improves the desensitization speed.
Example 2:
On the basis of embodiment 1 above, as shown in fig. 1, this embodiment specifically includes the following steps:
Step S1: acquiring the relevant keywords of the current file to be desensitized in the current sender user account;
Step S2: adding, deleting and modifying entries in the current sensitive word stock according to the relevant keywords to obtain a new sensitive word stock;
Step S3: desensitizing the current file to be desensitized based on the new sensitive word stock to obtain the corresponding desensitized file.
Optionally, before step S1, the self-adaptive desensitization method includes:
judging whether the current user is a registered user; if so:
comparing the user information entered by the current user with the user information managed by the background; if they match, outputting a login-success popup, otherwise outputting a user name/password error popup;
otherwise, outputting a registration interface to guide the current user to register.
Optionally, between step S2 and step S3, the self-adaptive desensitization method further includes:
acquiring the receiver id entered by the sender user;
judging whether the receiver id belongs to the user ids managed by the background; if so, proceeding to step S3; otherwise, outputting a popup indicating that the current user id is invalid.
Optionally, step S3 includes:
Step S31: reading the current file to be desensitized as a character stream to obtain a character string;
Step S32: generating a regular expression according to the new sensitive word stock;
Step S33: locating the sensitive word positions according to the regular expression and the character string, and storing them in a preset file list, wherein each sensitive word position comprises a start position and an end position;
Step S34: concurrently and cyclically scanning the data in the preset file list, and locating and acquiring the sensitive words according to the start position and the end position;
Step S35: converting the sensitive word into a 32-digit hexadecimal string, and converting the 32-digit hexadecimal string into a 4×4 matrix in which each element is a 2-digit hexadecimal number;
Step S36: determining the number of iteration rounds of the desensitization algorithm according to the sender user type and the receiver role type;
Step S37: taking the 4×4 matrix as the input of the desensitization algorithm and iterating for the determined number of rounds to obtain the iteration result secret;
Step S38: replacing the content at the sensitive word position in the current file to be desensitized with the iteration result secret;
Step S39: repeating steps S31 to S38 until all sensitive words in the file to be desensitized have been replaced, thereby obtaining the desensitized file corresponding to the current file to be desensitized.
Optionally, step S31 includes: for a current file to be desensitized with suffix txt, parsing the input stream with BufferReader; for a file with suffix doc/docx, parsing the input stream with the WordExtractor class of the poi library; and reading the parsed result line by line into a character string s,
where InputStream is the input stream, BufferReader and WordExtractor are the parsers, and s is the character string.
Optionally, step S32 includes generating the regular expression regex, which is matched against the character string s to yield the start position start and the end position end of each sensitive word.
Optionally, step S37 includes:
Step S371: row by row, cyclically shifting the elements of each row of the 4×4 matrix left by a specific number of positions to obtain a shifted matrix;
Step S372: multiplying the shifted matrix by the eigenvalue matrix to obtain an output matrix;
Step S373: taking the output matrix as the 4×4 matrix and returning to step S371 until the set number of rounds has been completed, and taking the output matrix of the last round as the iteration result secret.
Optionally, in step S372, the eigenvalue matrix is a matrix in which the elements of each column sum to 1.
Working principle: this embodiment applies to desensitizing Chinese, English and numeric data alike; the data type is unchanged before and after desensitization, which ensures both security and the readability of the file. The system selects the desensitization strength adaptively according to the roles of the sending and receiving users to prevent important information from leaking to untrusted personnel. It overcomes the inability of existing desensitization algorithms to resist collusion attacks and brute-force enumeration attacks. Sensitive word retrieval and sensitive word desensitization are processed concurrently by multiple threads, which greatly increases the desensitization speed.
The other parts of this embodiment are the same as in embodiment 1 above and are not described again.
Example 3:
This embodiment builds on either of embodiments 1 and 2 above and is described in detail with one specific example.
As shown in fig. 1, the self-adaptive desensitization method includes:
Step S1: acquiring the relevant keywords of the current file to be desensitized in the current sender user account;
Before step S1, it is first judged whether the current user is a registered user; if so:
comparing the user information entered by the current user with the user information managed by the background; if they match, outputting a login-success popup, otherwise outputting a user name/password error popup;
otherwise, outputting a registration interface to guide the current user to register.
Step S2: adding, deleting and modifying entries in the current sensitive word stock according to the relevant keywords to obtain a new sensitive word stock;
In this embodiment, a user may create multiple sensitive word stocks, so the add, delete and modify operations are not limited to adding entries to the original sensitive word stock. However, the sensitive word stocks in the current sender user account are neither visible nor operable from other user accounts.
In addition, between step S2 and step S3, the self-adaptive desensitization method further includes:
acquiring the receiver id entered by the sender user;
judging whether the receiver id belongs to the user ids managed by the background; if so, proceeding to step S3; otherwise, outputting a popup indicating that the current user id is invalid.
Step S3: desensitizing the current file to be desensitized based on the new sensitive word stock to obtain the corresponding desensitized file.
Optionally, step S3 includes:
Step S31: reading the current file to be desensitized as a character stream to obtain a character string;
For a current file to be desensitized with suffix txt, the input stream is parsed with BufferReader; for a file with suffix doc/docx, the input stream is parsed with the WordExtractor class of the poi library; the parsed result is read line by line into a character string s,
where InputStream is the input stream, BufferReader and WordExtractor are the parsers, and s is the character string.
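As a minimal sketch of the txt branch of step S31, the following Java code reads an input stream line by line into a string s. It assumes that "BufferReader" in the text refers to the standard java.io.BufferedReader; the class name TextLoaderSketch, the helper suffixOf, and the choice of joining lines with a newline are illustrative assumptions. The doc/docx branch is omitted because it depends on the external poi library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class TextLoaderSketch {
    // Hypothetical helper: lower-cased text after the last dot, or "" if there is no dot.
    static String suffixOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Reads the input stream line by line and joins the lines with '\n' into one string s.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder s = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    s.append('\n');
                }
                s.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.toString();
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for the file to be desensitized.
        byte[] data = "line one\nline two".getBytes(StandardCharsets.UTF_8);
        if (suffixOf("notes.txt").equals("txt")) {
            System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        }
    }
}
```

The character set is passed in explicitly because the text does not state which encoding the files use.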
Step S32: generating a regular expression according to the new sensitive word stock;
The regular expressions for English sensitive words and Chinese sensitive words are generated differently. English words are separated by spaces, and the search should only report sensitive fields in the true sense, so \b is used to match a word boundary, that is, the position between a word and a space; otherwise the search result may report characters inside a longer word as a sensitive field. Chinese words have no spaces as boundaries, so the regular expression can use the original string directly; see Table 1 for details.
Table 1. Examples of regular expression generation
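The rule described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the patent's own generator: the class RegexBuilderSketch, the isAscii test used to decide whether a word counts as English, and the joining of all words into one alternation with "|" are hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexBuilderSketch {
    // Assumption: a word made only of ASCII characters is treated as "English".
    static boolean isAscii(String word) {
        return word.chars().allMatch(c -> c < 128);
    }

    // English words get \b on both sides; other words are used as literal text.
    static String toRegex(List<String> words) {
        return words.stream()
                .map(w -> isAscii(w) ? "\\b" + Pattern.quote(w) + "\\b" : Pattern.quote(w))
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // The second word is a two-character Chinese word written with Unicode escapes.
        String regex = toRegex(Arrays.asList("key", "\u5bc6\u94a5"));
        Matcher m = Pattern.compile(regex).matcher("monkey key \u5bc6\u94a5");
        while (m.find()) {
            // Prints the start and end offsets of each match.
            System.out.println(m.start() + "-" + m.end());
        }
    }
}
```

With the word boundary in place, the "key" inside "monkey" is not reported, while the standalone "key" and the Chinese word are.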
Step S33: locating the sensitive word positions according to the regular expression and the character string, and storing them in a preset file list, wherein each sensitive word position comprises a start position and an end position,
where regex is the regular expression, s is the character string, start is the start position of the sensitive word, and end is the end position of the sensitive word.
As an example, the sensitive word positions are saved to an ArrayList named wordloc, and each element of wordloc is an array of length 2 holding the start address and the end address of the string.
Step S34: concurrently and cyclically scanning the data in the preset file list, and locating and acquiring the sensitive words according to the start position and the end position;
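Steps S33 and S34 can be sketched as follows, assuming the positions are stored as int arrays of length 2 in an ArrayList named wordloc as in the example above. The class LocateSketch and the use of a parallel stream for the concurrent scan are assumptions; the text does not say which threading mechanism is used. The end position here is exclusive, following the convention of java.util.regex.Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LocateSketch {
    // Step S33: records {start, end} for every match of regex in s (end is exclusive).
    static List<int[]> locate(String regex, String s) {
        List<int[]> wordloc = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            wordloc.add(new int[] {m.start(), m.end()});
        }
        return wordloc;
    }

    // Step S34: extracts the words in parallel; the result keeps the order of wordloc.
    static List<String> extract(String s, List<int[]> wordloc) {
        return wordloc.parallelStream()
                .map(loc -> s.substring(loc[0], loc[1]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "key monkey key";
        List<int[]> wordloc = locate("\\bkey\\b", s);
        System.out.println(extract(s, wordloc));
    }
}
```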
Step S35: converting the sensitive word into a 32-digit hexadecimal string;
In the invention, a single English letter occupies one byte, that is, two hexadecimal digits, and a single Chinese character occupies two bytes, that is, four hexadecimal digits;
here word is the located original sensitive content and code16 is the hexadecimal string;
the 32-digit hexadecimal string is then converted into a 4×4 matrix in which each element is a 2-digit hexadecimal number;
here TRA is the 4×4 byte matrix generated by the conversion.
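A possible reading of step S35 is sketched below. The text does not say how a word shorter or longer than 16 bytes is brought to 32 hexadecimal digits, nor in which order the bytes fill the matrix, so zero padding, truncation, and row-major filling are assumptions here, as is the class name HexMatrixSketch. The character set is a parameter; a two-byte-per-character Chinese encoding such as GBK would match the byte counts stated above, but that choice is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexMatrixSketch {
    // Encodes word with the given charset and returns exactly 32 hex digits (16 bytes).
    // Assumption: shorter input is zero-padded and longer input is truncated.
    static String toHex32(String word, Charset charset) {
        byte[] bytes = word.getBytes(charset);
        StringBuilder code16 = new StringBuilder();
        for (int i = 0; i < 16; i++) {
            int b = i < bytes.length ? (bytes[i] & 0xFF) : 0;
            code16.append(String.format("%02x", b));
        }
        return code16.toString();
    }

    // Splits the 32 hex digits into 16 two-digit values and fills a 4x4 matrix row by row.
    static int[][] toMatrix(String code16) {
        int[][] tra = new int[4][4];
        for (int i = 0; i < 16; i++) {
            tra[i / 4][i % 4] = Integer.parseInt(code16.substring(2 * i, 2 * i + 2), 16);
        }
        return tra;
    }

    public static void main(String[] args) {
        String code16 = toHex32("AB", StandardCharsets.US_ASCII);
        int[][] tra = toMatrix(code16);
        System.out.println(code16 + " -> first element " + Integer.toHexString(tra[0][0]));
    }
}
```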
Step S36: determining the number of iteration rounds of the desensitization algorithm according to the sender user type and the receiver role type; the more rounds, the lower the likelihood of the result being cracked;
The number of algorithm rounds corresponding to the different user roles is shown in Table 2,
where row_a is the role information of the sender, row_b is the role information of the receiver, and round is the number of rounds of algorithm iteration.
Table 2. Iteration rounds corresponding to user roles
Step S37: taking the 4×4 matrix as the input of the desensitization algorithm and iterating for the determined number of rounds to obtain the iteration result secret; that is, with TRA as input, the desensitization algorithm performs round rounds of operations on the matrix, and the result of each round is the input of the next round.
The step S37 includes:
Step S371: taking each row of the 4×4 matrix as a unit, cyclically shifting the elements of each row to the left by a specific number of positions to obtain a shifted matrix;
Step S372: multiplying the shifted matrix by the eigenvalue matrix to obtain an output matrix;
The eigenvalue matrix, denoted Feature, is a matrix in which the elements of each column sum to 1. Let F0, F1, F2, F3 be the four elements of the first column of the Feature matrix.
Here, the random function returns a random decimal with two digits after the decimal point that is greater than 0 and less than the specified bound. The matrix generation algorithm first generates the element S1 in the first row of the first column, then generates S2 based on the value of S1, and so on. The Feature matrix must be regenerated for each round.
The matrix multiplication multiplies the M_TRA matrix by the Feature matrix. Each column of the Feature matrix sums to 1 because Chinese characters, English letters, and digits each fall in a different, specific numeric range after conversion to hexadecimal, and the system processes these types of sensitive words separately. With column sums of 1, the multiplication does not change the character type: for example, a digit is not converted into a letter, and a Chinese character is not converted into an English character.
The output matrix TRB of each round is used as input for the next round to perform a cyclic operation.
Step S373: taking the output matrix as the 4×4 matrix and returning to step S371 until all rounds have been completed, and taking the output matrix of the last round as the iteration result secret.
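As an illustrative, non-limiting sketch of steps S371 to S373, one possible reading of the round loop is given below. The following are assumptions rather than details taken from the text: row r is rotated left by r positions; the Feature weights are held as integer hundredths so that each column sums to exactly 1.00; products are rounded to the nearest integer; and the matrix cells are integers (byte values) rather than hexadecimal strings. The names shift_rows, make_feature and desensitize are hypothetical.

```python
import random

def shift_rows(m):
    # Assumption: row r rotates left by r cells; the text only says "a specific number".
    return [row[r:] + row[:r] for r, row in enumerate(m)]

def make_feature(rng):
    """Build a 4x4 weight matrix in hundredths; every column sums to 100 (that is, 1.00)."""
    cols = []
    for _ in range(4):
        remaining, col = 100, []
        for k in range(3):
            # Each draw depends on what the earlier draws left over, and
            # keeps at least 0.01 in reserve for every element still to come.
            v = rng.randint(1, remaining - (3 - k))
            col.append(v)
            remaining -= v
        col.append(remaining)  # the last element takes the remainder
        cols.append(col)
    # Transpose so that feature[k][j] is row k, column j.
    return [[cols[j][k] for j in range(4)] for k in range(4)]

def desensitize(tra, rounds, rng):
    """Run the shift-and-multiply round the given number of times and return secret."""
    m = tra
    for _ in range(rounds):
        shifted = shift_rows(m)
        feature = make_feature(rng)  # regenerated for every round
        # Assumption: round the weighted sum (in hundredths) to the nearest integer.
        m = [[(sum(shifted[i][k] * feature[k][j] for k in range(4)) + 50) // 100
              for j in range(4)] for i in range(4)]
    return m

tra = [[0x41, 0x42, 0x43, 0x44], [0x45, 0x46, 0x47, 0x48],
       [0x30, 0x31, 0x32, 0x33], [0x61, 0x62, 0x63, 0x64]]
secret = desensitize(tra, 3, random.Random(0))
```

Because every output cell is a weighted average of the cells in one row, it stays between the smallest and largest value of that row, which is consistent with the stated aim of keeping a digit a digit and a letter a letter.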
Step S38: replacing the sensitive word at its position in the current file to be desensitized with the iteration result secret;
Step S39: repeating steps S31-S38 until all the sensitive words in the file to be desensitized have been replaced, to obtain the desensitized file corresponding to the current file to be desensitized.
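As an illustrative, non-limiting sketch of the replacement in step S38, the located spans may be rewritten from right to left, so that earlier positions remain valid even when a replacement differs in length from the original word. The function name replace_spans and the transform callback, which stands in for the conversion and iteration of steps S35 to S37, are hypothetical; the sketch performs the replacement in a single pass instead of re-scanning the file as in step S39.

```python
def replace_spans(s, wordloc, transform):
    """Replace every [start, end) span in s with transform(original word)."""
    out = s
    # Right-to-left order: a length change never shifts a span that is still pending.
    for start, end in sorted(wordloc, reverse=True):
        out = out[:start] + transform(out[start:end]) + out[end:]
    return out

masked = replace_spans("alpha plan and alphabet plan",
                       [[0, 5], [15, 23]],
                       lambda w: "*" * len(w))
print(masked)
```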
Other portions of this embodiment are the same as any of embodiments 1 to 2, and thus will not be described again.
Example 4:
This embodiment proposes an adaptive desensitization system based on any one of embodiments 1 to 3 above, comprising an acquisition unit, an adding, deleting and checking unit, and a desensitization unit;
The acquisition unit is used for acquiring keywords of the current file to be desensitized of the sender user;
The adding, deleting and checking unit is used for adding, deleting and checking the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
The desensitization unit is used for generating a regular expression according to the new sensitive word stock, positioning the sensitive word position according to the regular expression to obtain a sensitive word, and desensitizing according to the sensitive word to obtain a desensitized file.
This embodiment also provides an electronic device comprising a memory and a processor; the memory has a computer program stored thereon; the above-described adaptive desensitization method is implemented when the computer program is executed on the processor.
This embodiment also provides a computer-readable storage medium having computer instructions stored thereon; the above-described adaptive desensitization method is implemented when the computer instructions are executed on the above-described electronic device.
Other portions of this embodiment are the same as any of embodiments 1 to 3, and thus will not be described again.
The foregoing description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any simple modification, equivalent variation, or the like made to the above embodiments according to the technical substance of the present invention falls within the scope of the present invention.

Claims (11)

1. An adaptive desensitization method, characterized by comprising the following steps:
Step S1: acquiring keywords of a current file to be desensitized of a sender user;
Step S2: adding and deleting the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
Step S3: generating a regular expression according to the new sensitive word stock, positioning the sensitive word position according to the regular expression to obtain a sensitive word, and desensitizing according to the sensitive word to obtain a desensitized file.
2. The adaptive desensitization method according to claim 1, wherein prior to said step S1, said adaptive desensitization method further comprises:
judging whether the current user is a registered user; if so, judging whether the user information entered by the current user matches the user information managed in the background, and if it matches, outputting a login-success popup, otherwise outputting a username-error popup or a password-error popup; if the current user is not registered, outputting a registration interface to guide the current user through registration.
3. The adaptive desensitization method according to claim 1, wherein said step S2 specifically comprises the following steps:
Step S21: adding and deleting the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
Step S22: acquiring a receiver ID entered by the sender user, and judging whether the receiver ID belongs to the user IDs managed in the background; if so, executing step S3, and otherwise outputting a popup indicating that the current user ID is invalid.
4. The adaptive desensitization method according to claim 3, wherein said step S3 comprises the following steps:
Step S31: reading the current file to be desensitized in the form of a character stream to obtain a character string;
Step S32: generating a regular expression according to the new sensitive word stock;
Step S33: positioning the position of the sensitive word according to the character string and the regular expression, and storing the position of the sensitive word in a preset file list;
Step S34: concurrently looping over the preset file list, and acquiring sensitive words according to the sensitive word positions;
Step S35: converting the sensitive word into a binary string and converting the binary string into a matrix;
Step S36: determining the number of rounds of desensitization algorithm iteration according to the type of the sender user and the type of the receiver role;
Step S37: taking the matrix as input of a desensitization algorithm, and performing iterative processing according to the number of rounds to obtain an iteration result secret;
Step S38: replacing the sensitive word position of the current file to be desensitized according to the iteration result secret;
Step S39: repeating steps S31-S38 until all the sensitive words of the current file to be desensitized have been replaced, to obtain the desensitized file corresponding to the current file to be desensitized.
5. The adaptive desensitization method according to claim 4, wherein said step S31 is specifically performed as follows: reading the suffix of the current file to be desensitized; if the suffix is txt, parsing the input stream with BufferedReader; if the suffix is doc or docx, calling the WordExtractor class of the poi library to parse the input stream; and reading the parsed result line by line into a character string s to obtain the character string s.
6. The adaptive desensitization method according to claim 4, wherein said step S37 comprises the following steps:
Step S371: taking each row of the matrix as a unit, cyclically shifting the elements of each row to the left by a specific number of positions to obtain a shifted matrix;
Step S372: multiplying the shifted matrix by the set eigenvalue matrix to obtain an output matrix;
Step S373: taking the output matrix as the input of the desensitization algorithm and returning to step S371 until all rounds have been completed, and taking the output matrix of the last round as the iteration result secret.
7. The adaptive desensitization method according to claim 6, wherein said set eigenvalue matrix in step S372 is a matrix in which the elements of each column sum to 1.
8. The adaptive desensitizing method according to claim 4, wherein said sensitive word positions in step S33 include a sensitive word start position and a sensitive word end position.
9. An adaptive desensitization system, characterized by comprising an acquisition unit, an adding, deleting and checking unit, and a desensitization unit;
The acquisition unit is used for acquiring keywords of the current file to be desensitized of the sender user;
The adding, deleting and checking unit is used for adding, deleting and checking the current sensitive word stock according to the keywords to obtain a new sensitive word stock;
The desensitization unit is used for generating a regular expression according to the new sensitive word stock, positioning the sensitive word position according to the regular expression to obtain a sensitive word, and desensitizing according to the sensitive word to obtain a desensitized file.
10. An electronic device comprising a memory and a processor; the memory has a computer program stored thereon; the computer program, when executed on the processor, implements the adaptive desensitization method according to any one of claims 1-8.
11. A computer-readable storage medium having stored thereon computer instructions; when executed on the electronic device according to claim 10, said computer instructions implement the adaptive desensitization method according to any one of claims 1-8.
CN202410350656.XA 2024-03-26 2024-03-26 Self-adaptive desensitization method, system, equipment and medium Active CN117951747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410350656.XA CN117951747B (en) 2024-03-26 2024-03-26 Self-adaptive desensitization method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN117951747A (en) 2024-04-30
CN117951747B (en) 2024-07-12

Family

ID=90803358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410350656.XA Active CN117951747B (en) 2024-03-26 2024-03-26 Self-adaptive desensitization method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN117951747B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007081519A2 (en) * 2005-12-30 2007-07-19 Steven Kays Genius adaptive design
US20120197896A1 (en) * 2008-02-25 2012-08-02 Georgetown University System and method for detecting, collecting, analyzing, and communicating event-related information
CN108197163A (en) * 2017-12-14 2018-06-22 上海银江智慧智能化技术有限公司 A kind of structuring processing method based on judgement document
CN108717408A (en) * 2018-05-11 2018-10-30 杭州排列科技有限公司 A kind of sensitive word method for real-time monitoring, electronic equipment, storage medium and system
CN110633577A (en) * 2019-08-22 2019-12-31 阿里巴巴集团控股有限公司 Text desensitization method and device
CN110782346A (en) * 2019-10-09 2020-02-11 山东科技大学 Intelligent contract classification method based on keyword feature extraction and attention
CN112001174A (en) * 2020-08-10 2020-11-27 深圳中兴网信科技有限公司 Text desensitization method, apparatus, electronic device and computer-readable storage medium
CN113722758A (en) * 2021-08-31 2021-11-30 平安科技(深圳)有限公司 Log desensitization method and device, computer equipment and storage medium
CN115168345A (en) * 2022-06-27 2022-10-11 天翼爱音乐文化科技有限公司 Database classification method, system, device and storage medium
CN115952854A (en) * 2023-03-14 2023-04-11 杭州太美星程医药科技有限公司 Training method of text desensitization model, text desensitization method and application
CN116049884A (en) * 2023-01-17 2023-05-02 三江学院 Data desensitization method, system and medium based on role access control
CN116484420A (en) * 2023-04-19 2023-07-25 中国邮政储蓄银行股份有限公司 Text desensitization processing method and device
CN116610772A (en) * 2023-05-05 2023-08-18 中国工商银行股份有限公司 Data processing method, device and server
WO2024046081A1 (en) * 2022-08-30 2024-03-07 华为技术有限公司 Information recommendation method, electronic device, and server

Also Published As

Publication number Publication date
CN117951747B (en) 2024-07-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant