WO2023109066A1 - Data processing method and apparatus, device, and storage medium - Google Patents

Data processing method and apparatus, device, and storage medium

Info

Publication number
WO2023109066A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
array
parameter
mask
data processing
Prior art date
Application number
PCT/CN2022/100534
Other languages
French (fr)
Chinese (zh)
Inventor
康志鹏
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023109066A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Definitions

  • This application relates to the technical field of data processing, involving but not limited to data processing methods, devices, equipment and storage media.
  • The general processing method is to traverse multiple pieces of log information and search for the log information that includes the first mask field, which is then used as the data to be masked. As a result, masking log information involves a relatively large amount of data processing, and processing efficiency is low.
  • The present application provides a data processing method, apparatus, device, and storage medium, which reduce the amount of data processing and improve processing efficiency when masking log information.
  • The present application provides a data processing method, the method comprising: determining, in a configuration file, a first mask field and a first mask type, where the first mask type is the mask type to which the first mask field belongs;
  • obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, one second value is used to point to one piece of log information, and the first value is used to point to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, the mask operation corresponding to the first mask type.
  • the present application provides a data processing device, the device comprising:
  • the first determining unit is configured to determine a first mask field and a first mask type in the configuration file; the first mask type is the mask type to which the first mask field belongs;
  • an obtaining unit configured to obtain a first array and a first value for the first mask field; the first array includes at least three second values sorted by magnitude; one second value is used to point to one piece of log information; the first value is used to point to the first mask field;
  • a search unit configured to search the first array for a target second value that is the same as the first value;
  • the second determination unit is configured to determine the log information pointed to by the target second value as the data to be masked;
  • the executing unit is configured to execute a masking operation corresponding to the first masking type on the data to be masked.
  • the present application also provides an electronic device, including: a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the above data processing method when executing the program.
  • the present application also provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above data processing method is realized.
  • The data processing method, apparatus, device, and storage medium provided by the present application include: determining, in the configuration file, the first mask field and the first mask type, where the first mask type is the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, one second value is used to point to one piece of log information, and the first value is used to point to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, the mask operation corresponding to the first mask type.
  • In this solution, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, and the process of finding the data to be masked among the multiple pieces of log information is converted into the process of searching for the first value in the first array composed of the second values. Because a numerical search involves a small amount of data processing, the solution of the present application reduces the amount of data processing and improves processing efficiency when masking log information.
  • FIG. 1 is an optional structural schematic diagram of a data processing system provided in an embodiment of the present application
  • FIG. 2 is an optional schematic flow chart of the data processing method provided by the embodiment of the present application.
  • FIG. 3 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application.
  • FIG. 4 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application.
  • FIG. 5 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application.
  • FIG. 6 is an optional schematic diagram of a masked log file provided in an embodiment of the present application.
  • FIG. 7 is an optional structural schematic diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an optional electronic device provided in an embodiment of the present application.
  • In the following description, the terms “first/second/third” are used only to distinguish different objects; they do not represent a specific order of the objects and do not impose a limitation on sequence. It can be understood that, where permitted, “first/second/third” can be interchanged in a specific order or sequence, so that the embodiments of the application described here can be implemented in an order other than that illustrated or described here.
  • Embodiments of the present application may provide a data processing method, apparatus, device, and storage medium.
  • The data processing method can be realized by a data processing device, and each functional entity in the data processing device can be cooperatively implemented by hardware resources of an electronic device, such as computing resources (for example, processors) and communication resources (for example, resources supporting communication over optical cable, cellular, and other modes).
  • the data processing method provided in the embodiment of the present application is applied to a data processing system, and the data processing system includes a data processing terminal.
  • The data processing terminal is used to: determine, in the configuration file, the first mask field and the first mask type, where the first mask type is the mask type to which the first mask field belongs; obtain the first array and the first value for the first mask field, where the first array includes at least three second values sorted by magnitude, one second value is used to point to one piece of log information, and the first value is used to point to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, the mask operation corresponding to the first mask type.
  • the data processing system may also include a client.
  • the client is used to collect log information, and send the collected log information to the data processing end for processing.
  • the structure of the data processing system may be as shown in FIG. 1 , including: a data processing terminal 10 and a client terminal 20 .
  • the data processing terminal 10 and the client terminal 20 can communicate through the network 30 .
  • The data processing terminal 10 is used to: determine, in the configuration file, the first mask field and the first mask type, where the first mask type is the mask type to which the first mask field belongs; obtain a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, one second value is used to point to one piece of log information, and the first value is used to point to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, the mask operation corresponding to the first mask type.
  • the data processing terminal 10 may include a physical machine (such as a server, etc.), or a virtual machine (such as a cloud platform, etc.).
  • the client 20 is used to collect log information, and send the collected log information to the data processing end for processing.
  • the client 20 may include a mobile terminal device (such as a mobile phone, a tablet computer, etc.), or a non-mobile terminal device (such as a desktop computer, a server, etc.).
  • the network 30 is used for communication between the data processing terminal 10 and the client 20 .
  • the network 30 may be a wired network, or a wireless network and so on.
  • data processing terminal 10 and the client terminal 20 may be deployed on the same electronic device, or may be deployed on different electronic devices.
  • the embodiment of the present application provides a data processing method, which is applied to a data processing device; wherein, the data processing device can be deployed at the data processing terminal 10 in FIG. 1 .
  • the data processing process provided by the embodiment of the present application will be described.
  • FIG. 2 shows a schematic flowchart of an optional data processing method.
  • the data processing method provided in the embodiment of the present application is used to mask log information.
  • the data processing method is the same for each mask field when masking is performed, and the data processing method is now described by taking the first mask field as an example. If multiple mask fields need to be masked, the following data processing methods are respectively executed for each mask field, so as to realize masking of multiple mask fields.
  • The data processing method may include, but is not limited to, S201 to S205 shown in FIG. 2.
  • S201: the data processing end determines a first mask field and a first mask type in a configuration file.
  • the mask field is used to represent the fields that need to be masked.
  • the embodiment of the present application does not limit the specific content of the mask field, which can be configured according to actual requirements.
  • the mask field can be customer name, customer card number and so on.
  • Mask types (also referred to below as field types) are used to classify mask fields.
  • the embodiment of the present application does not limit the specific classification method, the specific field type, and the mask fields included in the field type, which may be defined according to actual requirements.
  • the field types may include a card number field type and a name field type.
  • The card number field type can include mask field 1 (the bank card number bound to another bank) and mask field 2 (customer card number); the name field type can include mask field 3 (the name corresponding to the card number) and mask field 4 (the name corresponding to the bank card number bound to another bank).
  • the configuration file defines at least one mask field and the mask type to which each mask field in the at least one mask field belongs.
  • the first mask field is any one of the at least one mask field; the first mask type is the mask type to which the first mask field belongs.
  • the configuration file may be in the form of a dictionary.
  • the configuration file in the form of a dictionary includes a plurality of dictionary items, and each dictionary item defines a specific mask field and a mask type to which the mask field belongs.
  • configuration files may include:
  • </dict-group> means: the configuration file is of the dictionary type.
  • S201 can be implemented as: the data processing end obtains at least one mask field included in the configuration file in the configuration file, uses any one mask field in the at least one mask field as the first mask field, and determines in the configuration file The mask type to which the first mask field belongs is the first mask type.
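A minimal Python sketch of S201, assuming the dictionary-form configuration file has already been parsed into an in-memory mapping. The names `MASK_CONFIG` and `determine_mask_field`, and the exact field and type strings, are hypothetical and are not taken from the application.

```python
# Hypothetical in-memory form of the configuration file: each dictionary
# item maps one mask field to the mask type it belongs to.
MASK_CONFIG = {
    "customer card number": "card number field type",
    "bank card number bound to another bank": "card number field type",
    "name corresponding to the card number": "name field type",
    "name corresponding to the bank card number bound to another bank": "name field type",
}


def determine_mask_field(config, field):
    """Return (first mask field, first mask type) for one configured field."""
    if field not in config:
        raise KeyError(f"mask field not configured: {field!r}")
    return field, config[field]


first_field, first_type = determine_mask_field(MASK_CONFIG, "customer card number")
print(first_field, "->", first_type)
```

Looping over `MASK_CONFIG` and repeating the remaining steps for each key would correspond to masking several mask fields in turn.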
  • S202: the data processing end obtains a first array and a first value for the first mask field.
  • the first array includes at least three second values sorted by numerical magnitude. Wherein, a second value is used to point to a log information.
  • the embodiment of the present application does not specifically limit the manner of obtaining the second value, which may be configured according to actual requirements.
  • For example, the first letter of the function field in the log information can be converted into its American Standard Code for Information Interchange (ASCII) code, and the ASCII code can be used as the second value pointing to the log information.
  • the log information may be numbered using the function field in the log information as a standard, and the number may be used as a second value pointing to the log information.
  • the numbers for the same function field should be kept the same, and the numbers for different function fields should be kept different.
  • There are at least three pieces of log information and, correspondingly, at least three second values; the at least three second values form the first array.
  • Transaction 1 includes 4 pieces of log information, namely: log information 1 (customer card number: 111111111111), log information 2 (name corresponding to the card number: Wang Xiaoer), log information 3 (the bank card number bound to another bank: 2222222222222), and log information 4 (the name corresponding to the bank card number bound to another bank: Zhang Xiaosi).
  • the second value corresponding to log information 1 may be 3, the second value corresponding to log information 2 may be 7, the second value corresponding to log information 3 may be 15, and the second value corresponding to log information 4 may be 18.
  • the first array is [3, 7, 15, 18].
  • the first value is used to point to the first mask field.
  • the embodiment of the present application does not specifically limit the manner of obtaining the first value, which may be configured according to actual requirements. It should be noted that the manner of obtaining the first value should be consistent with the method of obtaining the second value.
  • Example 2: when the first mask field is the name corresponding to the bank card number bound to another bank, the first value can be 18.
  • S202 can be implemented as: the data processing end generates at least three second values from at least three pieces of log information and arranges the at least three second values by magnitude to obtain the first array; the data processing end also generates a first value according to the first mask field.
  • When the data processing end has generated, in advance, a second value pointing to each piece of log information for all log information, S202 may be implemented as: the data processing end determines, among the multiple second values, the at least three second values corresponding to the at least three pieces of log information, and arranges the at least three second values by magnitude to obtain the first array; the data processing end determines, according to the first mask field, that the second value corresponding to the first mask field among the multiple second values is the first value.
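The numbering approach to S202 can be sketched as follows. This is only an illustration under stated assumptions: the numbers 3, 7, 15 and 18 come from the example above, while the mapping `FIELD_NUMBERS` and the helper names `build_first_array` and `first_value_for` are hypothetical.

```python
# Hypothetical numbering of function fields: the same function field always
# gets the same number, and different function fields get different numbers.
FIELD_NUMBERS = {
    "customer card number": 3,
    "name corresponding to the card number": 7,
    "bank card number bound to another bank": 15,
    "name corresponding to the bank card number bound to another bank": 18,
}


def build_first_array(log_entries, field_numbers):
    """Return the second values of all log entries, sorted by magnitude."""
    return sorted(field_numbers[function_field] for function_field, _ in log_entries)


def first_value_for(mask_field, field_numbers):
    """Derive the first value with the same numbering used for second values."""
    return field_numbers[mask_field]


# Log entries as (function field, value) pairs, deliberately out of order.
logs = [
    ("name corresponding to the bank card number bound to another bank", "Zhang Xiaosi"),
    ("customer card number", "111111111111"),
    ("bank card number bound to another bank", "2222222222222"),
    ("name corresponding to the card number", "Wang Xiaoer"),
]

first_array = build_first_array(logs, FIELD_NUMBERS)
first_value = first_value_for(
    "name corresponding to the bank card number bound to another bank", FIELD_NUMBERS
)
print(first_array, first_value)
```

Using one shared mapping for both helpers is what keeps the manner of obtaining the first value consistent with the manner of obtaining the second values.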
  • S203: the data processing end searches the first array for a target second value that is the same as the first value.
  • a binary search algorithm may be used to search for a target second value identical to the first value in the first array.
  • a traversal algorithm may also be used to search for a target second value that is the same as the first value in the first array.
  • S204: the data processing end determines the log information pointed to by the target second value as the data to be masked.
  • the embodiment of the present application does not limit the specific implementation of determining the log information through the second value, and it may be configured according to actual requirements.
  • the log information may be in the form of a table, and the rows of the table include a function field, a value of the function field, and a corresponding second value.
  • the target second value can be searched in the corresponding second value column first, and the log information of the row where the target second value is located is determined as the data to be masked.
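As a rough sketch of the table form described above, the rows below carry a function field, its value, and the corresponding second value. The `LogRow` type and the `rows_to_mask` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    """One row of a hypothetical log table."""
    function_field: str
    value: str
    second_value: int


def rows_to_mask(table, target_second_value):
    """Return the rows whose second value equals the target second value."""
    return [row for row in table if row.second_value == target_second_value]


TABLE = [
    LogRow("customer card number", "111111111111", 3),
    LogRow("name corresponding to the card number", "Wang Xiaoer", 7),
    LogRow("bank card number bound to another bank", "2222222222222", 15),
    LogRow("name corresponding to the bank card number bound to another bank", "Zhang Xiaosi", 18),
]

to_mask = rows_to_mask(TABLE, 18)
print(to_mask)
```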
  • S205: the data processing end performs a masking operation corresponding to the first mask type on the data to be masked.
  • a mask type corresponds to a mask operation.
  • For example, when the mask type is the card number field type, the corresponding mask operation is: display the last four digits of the card number and mask the other digits; when the mask type is the name field type, the corresponding mask operation is: display the first character and mask the other characters.
  • S205 may be implemented as: the data processing end determines the mask operation corresponding to the first mask type and executes the mask operation on the data to be masked.
  • the embodiment of the present application does not limit the specific masking method, which can be configured according to actual requirements.
  • any replacement algorithm may be used for masking. For example, characters that require a mask can be replaced with special characters.
  • special characters can include any of the following: “*”, “#”, “&”.
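The two mask operations described above can be sketched in Python as simple character replacement. The function names and the `MASK_OPERATIONS` lookup table are assumptions made for this illustration; the application leaves the concrete masking method open.

```python
def mask_card_number(card_number, mask_char="*"):
    """Show the last four digits of the card number and mask the others."""
    visible = card_number[-4:]
    return mask_char * (len(card_number) - len(visible)) + visible


def mask_name(name, mask_char="*"):
    """Show the first character of the name and mask the others."""
    if not name:
        return name
    return name[0] + mask_char * (len(name) - 1)


# Hypothetical lookup from mask type to mask operation (one operation per type).
MASK_OPERATIONS = {
    "card number field type": mask_card_number,
    "name field type": mask_name,
}

masked_card = MASK_OPERATIONS["card number field type"]("111111111111")
masked_name = MASK_OPERATIONS["name field type"]("Zhang Xiaosi", mask_char="#")
print(masked_card, masked_name)
```

Keeping the operations in a table keyed by mask type means a new mask type only needs a new entry rather than a change to the calling code.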
  • The data processing solution provided by the embodiment of the present application includes: determining, in the configuration file, a first mask field and a first mask type, where the first mask type is the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, one second value is used to point to one piece of log information, and the first value is used to point to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, the mask operation corresponding to the first mask type.
  • In this solution, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, and the process of finding the data to be masked among the multiple pieces of log information is converted into the process of searching for the first value in the first array composed of the second values. Because a numerical search involves a small amount of data processing, the solution of the present application reduces the amount of data processing and improves processing efficiency when masking log information.
  • The search process may include, but is not limited to, the following S2031 to S2035.
  • S2031: the data processing end determines the first parameter and the second parameter.
  • the initial value of the first parameter is one; the initial value of the second parameter is N, and N is used to represent the length of the first array, that is, N is greater than or equal to 3.
  • the data processing end assigns a value of one to the first parameter, and assigns a value of N to the second parameter.
  • S2031 may be implemented as: Low: 1; High: N. Among them, Low indicates the first parameter, Low: 1 indicates assigning 1 to Low; High indicates the second parameter, and High: N indicates assigning N to High.
  • S2032: the data processing end judges the magnitude relationship between the first value and the second value whose subscript is the third parameter in the first array.
  • the third parameter is the average value of the first parameter and the second parameter.
  • Example 3: if the first array is [1, 3, 7, 15, 18, 24, 25] and the first value is 18, then the first parameter is 1, the second parameter is 7, and the third parameter is 4; the second value whose subscript is the third parameter is 15; that is, the second value 15 whose subscript is the third parameter is smaller than the first value 18.
  • The third parameter may be a value obtained by rounding the average value. For example, if the average value of the first parameter and the second parameter is 3.5, then the third parameter may be 4.
  • S2033: if the second value whose subscript is the third parameter is equal to the first value, the data processing end determines that the target second value is the second value whose subscript is the third parameter.
  • That is, the data processing end determines the second value whose subscript is the third parameter as the target second value.
  • S2034: if the second value whose subscript is the third parameter is smaller than the first value, the data processing end modifies the first parameter to be the third parameter plus one.
  • That is, the data processing end modifies the first parameter to be the third parameter plus one and re-executes S2032: the data processing end judges the magnitude relationship between the first value and the second value whose subscript is the third parameter in the first array.
  • Example 4: the data processing end modifies the first parameter to 5, and the second parameter remains 7, so the third parameter is 6; on re-judgment, the second value whose subscript is the third parameter is 24, and 24 is greater than the first value 18.
  • S2035: if the second value whose subscript is the third parameter is greater than the first value, the data processing end modifies the second parameter to be the third parameter minus one.
  • That is, the data processing end modifies the second parameter to be the third parameter minus one and re-executes S2032: the data processing end judges the magnitude relationship between the first value and the second value whose subscript is the third parameter in the first array.
  • Continuing the example, the data processing end modifies the second parameter to 5 while the first parameter remains 5, so the third parameter is 5; the second value whose subscript is the third parameter is 18, which is equal to the first value 18, so 18 is determined as the target second value.
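The steps S2031 to S2035 amount to a binary search over one-based subscripts. The sketch below follows the examples above, including rounding the average half up; the function name `find_target_subscript`, the `None` return for a missing value, and the stop condition (first parameter greater than second parameter) are assumptions not stated in the text.

```python
def find_target_subscript(first_array, first_value):
    """Return the 1-based subscript of first_value in first_array, or None."""
    low, high = 1, len(first_array)        # S2031: Low = 1, High = N
    while low <= high:                     # assumed stop condition
        mid = (low + high + 1) // 2        # third parameter: average, rounded half up
        candidate = first_array[mid - 1]   # convert 1-based subscript to Python index
        if candidate == first_value:       # S2033: target second value found
            return mid
        if candidate < first_value:        # S2034: continue in the upper part
            low = mid + 1
        else:                              # S2035: continue in the lower part
            high = mid - 1
    return None


subscript = find_target_subscript([1, 3, 7, 15, 18, 24, 25], 18)
print(subscript)
```

On the example array the third parameter takes the values 4, 6 and 5 in turn, matching Examples 3 and 4 and the final step above.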
  • Before the data processing end judges the magnitude relationship between the first value and the second value whose subscript is the third parameter in the first array, it may also first filter the second values in the first array.
  • The filtering process may include, but is not limited to, any one of the following Embodiments A to C.
  • Embodiment A: filter the second values in the first array in the direction of increasing subscript.
  • Embodiment B: filter the second values in the first array in the direction of decreasing subscript.
  • Embodiment C: filter the second values in the first array simultaneously in the direction of increasing subscript and in the direction of decreasing subscript.
  • The process of Embodiment A may include, but is not limited to, SA01 to SA04 described below.
  • SA01: the data processing end determines a first reference value based on the first array.
  • The first reference value is used to assist in determining the first screening quantity.
  • The embodiment of the present application does not limit the manner of determining the first reference value, which may be configured according to actual conditions.
  • SA02: the data processing end determines, as the second reference value, the result of subtracting from the first value the second value whose subscript is the first parameter in the first array.
  • That is, the data processing end determines the second reference value as the first value minus the first second value in the first array (the second value whose subscript is one).
  • SA03: the data processing end determines the first screening quantity as the result of dividing the second reference value by the first reference value.
  • That is, the data processing end calculates the result of dividing the second reference value by the first reference value and uses the result as the first screening quantity.
  • SA04: the data processing end filters out, in the direction of increasing subscript, the first screening quantity of second values in the first array, starting from the subscript given by the first parameter.
  • the first screening number is 2
  • the first array is [1, 3, 7, 15, 18, 24, 25]
  • The process of Embodiment B may include, but is not limited to, the following SB01 to SB04.
  • SB01: the data processing end determines a first reference value based on the first array.
  • The specific implementation of SB01 is the same as that of SA01; reference may be made to SA01, and details are not repeated here.
  • SB02: the data processing end determines, as the third reference value, the result of subtracting the first value from the second value whose subscript is the second parameter in the first array.
  • That is, the data processing end determines the third reference value as the last second value in the first array (the second value whose subscript is the second parameter) minus the first value.
  • SB03: the data processing end determines the second screening quantity as the result of dividing the third reference value by the first reference value.
  • That is, the data processing end calculates the result of dividing the third reference value by the first reference value and uses the result as the second screening quantity.
  • SB04: the data processing end filters out, in the direction of decreasing subscript, the second screening quantity of second values in the first array, starting from the subscript given by the second parameter.
  • the second parameter is 7
  • the second screening number is 2
  • the first array is [1, 3, 7, 15, 18, 24, 25]
  • For the specific implementation process of Embodiment C, reference may be made to the detailed descriptions of Embodiments A and B, which are not repeated here.
  • The process by which the data processing end determines the first reference value based on the first array in SA01 and SB01 is described below. Specifically, it may include, but is not limited to, the following SA011 and SA012.
  • SA011: the data processing end calculates the difference between each two adjacent second values among the at least three second values in the first array to obtain at least two adjacent differences.
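The screening of Embodiments A to C can be sketched as a narrowing of the one-based search bounds before the binary search runs. This is an illustration under assumptions: the function name `screen_bounds` is hypothetical, the first reference value is passed in as a parameter, the values are integers, and the quotients are rounded down, a choice made here because it never discards the target from a sorted array whose adjacent differences do not exceed the reference value.

```python
def screen_bounds(first_array, first_value, low, high, first_reference):
    """Return narrowed 1-based (low, high) after forward and backward screening."""
    if first_reference <= 0:
        return low, high                   # nothing can be screened safely

    # Embodiment A: first value minus the second value at subscript `low`.
    second_reference = first_value - first_array[low - 1]
    first_quantity = max(0, second_reference // first_reference)

    # Embodiment B: second value at subscript `high` minus the first value.
    third_reference = first_array[high - 1] - first_value
    second_quantity = max(0, third_reference // first_reference)

    # Embodiment C applies both directions at once.
    return low + first_quantity, high - second_quantity


array = [1, 3, 7, 15, 18, 24, 25]
new_low, new_high = screen_bounds(array, 18, 1, len(array), 8)
print(new_low, new_high, array[new_low - 1:new_high])
```

With a reference value of 8, the forward quotient is 17 / 8 rounded down to 2 and the backward quotient is 7 / 8 rounded down to 0, so only the two smallest second values are skipped in this run.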
  • Example 6 if the first array is [1, 3, 7, 15, 18, 24, 25], then at least two adjacent differences include: 2, 4, 8, 3, 6 and 1.
  • SA012: the data processing end determines that the fourth reference value is the maximum value of the at least two adjacent differences.
  • Example 7: the fourth reference value is 8.
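The computation in SA011 and SA012 can be reproduced with a few lines of Python. The helper name `adjacent_differences` is hypothetical; the array is the one used in Example 6.

```python
def adjacent_differences(first_array):
    """Return the differences between each pair of adjacent second values."""
    return [later - earlier for earlier, later in zip(first_array, first_array[1:])]


differences = adjacent_differences([1, 3, 7, 15, 18, 24, 25])
largest_difference = max(differences)
print(differences, largest_difference)
```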
  • the data processing method provided in the embodiment of the present application can also modify the configuration file.
  • the mask field in the configuration file can be increased or decreased
  • the log information to be processed may also be modified.
  • The process may include, but is not limited to, the following S401 and S402.
  • S401: the data processing end adds a first dictionary item to the configuration file.
  • S402: the data processing end configures the first mask field and the mask type to which the first mask field belongs in the first dictionary item, so as to obtain a new configuration file.
  • In this case, the step in which the data processing end determines the first mask field and the first mask type in the configuration file may be implemented as: determining the first mask field and the first mask type in the new configuration file.
  • Log: refers to the transaction records of the software system, which are mainly used to assist in problem solving and subsequent auditing.
  • The first point: based on the characteristics of financial data, an optimized binary search algorithm discards invalid data during the search, so that the field to be masked can be found quickly, and the masking is then performed with an arbitrary replacement algorithm.
  • The second point: by configuring the mask information file (equivalent to the configuration file), changes such as adding or removing mask types, transactions that need to be masked, or mask fields are handled automatically and dynamically, eliminating the need for secondary development of the program, reducing the development workload, and ensuring rapid business response; at the same time, the mask information file can be exported to other systems in the form of a jar package, so that the mask information file configuration can be shared.
  • The processing flow may refer to FIG. 5, and may include, but is not limited to, the following S501 to S507.
  • the sensitive field is equivalent to a mask field.
  • the payment counter system involves fund payment, payment account number and name, and customer privacy information
  • the customer account number and the customer name belong to different mask types, so the desensitization methods (masking processing methods) are also different, and individualized processing needs to be performed on the customer account number and the customer name respectively.
  • configuration files may include:
  • </dict-group> means: the configuration file is of dictionary type.
  • the optimized binary search algorithm can quickly locate sensitive information.
  • each piece of information in the log information array is sorted in ascending order of the ASCII code of its first letter (equivalent to the second value). Since each piece of log information corresponds to a unique initial letter, the target sensitive information can be found in the ordered array simply by using the algorithm to find the value corresponding to the target sensitive field.
  • the optimized binary search algorithm actually performs a screening work before each binary search to filter out unnecessary elements, which can greatly improve the search speed.
  • the screening process may mainly include the following. For example, suppose 18 (equivalent to the first value, also called the target value) is to be found in the array 1, 3, 7, 15, 18, 24 (equivalent to the first array). The optimized binary search algorithm proceeds as follows: the differences between adjacent numbers in the array are 2, 4, 8, 3 and 6, so the maximum adjacent difference E (equivalent to the first reference value) is 8. The first difference between the target value 18 and the minimum number 1 (equivalent to the second value whose subscript is the first parameter in the first array) is 17 (equivalent to the second reference value). Dividing the first difference 17 by the maximum adjacent difference 8 gives 2.125 (equivalent to the first screening quantity, also called the number of forward-screened elements), which is rounded up to 3.
  • the first 3 elements counted from the front of the array are therefore filtered out directly, that is, 1, 3 and 7. The second difference between the maximum number 24 (equivalent to the second value whose subscript is the second parameter in the first array) and the target value 18 is 6 (equivalent to the third reference value). Dividing the second difference 6 by the maximum adjacent difference 8 gives 0.75 (equivalent to the second screening quantity, also called the number of backward-screened elements), which is rounded up to 1, so 1 element counted from the back of the array is filtered out directly, that is, 24.
  • the remaining data are 15 and 18, so the target value only needs to be found among 15 and 18 with the ordinary binary search algorithm. Screening thus eliminates a large number of unnecessary search elements, which reduces the number of comparisons and improves the search speed.
  • the program corresponding to the optimized binary search algorithm may include:
  • High ← n // assigns the value n to the variable High
  • Index ← 0 // assigns the value 0 to the variable Index
  • Low_span (equivalent to the first screening quantity) ← 0 // assigns the value 0 to the variable Low_span
  • High_span (equivalent to the second screening quantity) ← 0 // assigns the value 0 to the variable High_span
  • Low ← Low + [Low_span] // in each loop iteration, Low is set to its previous value plus [Low_span]
  • High_span ← (A[High] - X) / M // High_span is set to the difference between A[High] and X, divided by M
  • High ← High - [High_span] // in each loop iteration, High is set to its previous value minus [High_span]
  • Low indicates the starting serial number of the array
  • High indicates the length of the array
  • Index indicates the return serial number
  • Low_span indicates the number of elements filtered forward
  • High_span indicates the number of elements filtered backward
  • A is the array
  • X is the target value
  • M is the maximum value of the difference between two adjacent numbers in the array.
  • Low_span indicates the number of elements filtered forward from A[Low]
  • High_span indicates the number of elements filtered backward from A[High]. If either Low_span or High_span is less than 0, the element X being searched for does not lie between A[Low] and A[High], that is, X is not present in A, and the loop can be ended directly.
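The screening-then-bisect procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program from the application: the function name `screened_binary_search` is hypothetical, indices are 0-based rather than starting at 1, the bracket notation `[Low_span]` is read as rounding up (which matches the worked example, where 2.125 removes three elements and 0.75 removes one), and M is computed once before the loop.

```python
import math


def screened_binary_search(a, x):
    """Return the 0-based index of x in the ascending list a, or -1 if absent.

    Hypothetical helper: screening is applied before every bisection step.
    """
    if not a:
        return -1
    # M: the largest difference between two adjacent elements of the array.
    m = max((nxt - cur for cur, nxt in zip(a, a[1:])), default=0)
    low, high = 0, len(a) - 1
    while low <= high:
        if m > 0:  # screening is skipped when all elements are equal
            low_span = (x - a[low]) / m
            high_span = (a[high] - x) / m
            if low_span < 0 or high_span < 0:
                return -1  # x lies outside [a[low], a[high]]
            # Assumption: [span] means rounding up to a whole element count.
            low += math.ceil(low_span)
            high -= math.ceil(high_span)
            if low > high:
                return -1
        mid = (low + high) // 2  # ordinary binary search step
        if a[mid] == x:
            return mid
        if a[mid] < x:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```

Rounding up is safe here because an element at distance d from A[Low] must sit at least d / M positions away, so no position closer than that can hold X.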
  • Sensitive fields can be located through the above optimized binary search algorithm. Each sensitive field value needs to be desensitized (masked): the type corresponding to the sensitive field is read from the configuration file (for example, bind_card_no corresponds to 0-customer account type), and the mask processing for that type is performed on the log plaintext information corresponding to bind_card_no.
  • the mask processing method can use any replacement algorithm. For example, special characters (such as *) can replace part of the true value; given the characteristics of financial account data, only the first six digits and the last four digits of the card number are displayed, and the other digits are covered with "*".
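The card-number rule above (show the first six and last four digits, cover the rest with "*") can be sketched as follows. The function name `mask_card_number` and its parameters are hypothetical, and fully masking values that are too short to split is an assumption added here, since the text does not cover that case.

```python
def mask_card_number(card_no, keep_head=6, keep_tail=4, fill="*"):
    """Keep the first keep_head and last keep_tail characters; cover the rest."""
    if len(card_no) <= keep_head + keep_tail:
        # Assumption: a value too short to split is masked completely.
        return fill * len(card_no)
    hidden = len(card_no) - keep_head - keep_tail
    return card_no[:keep_head] + fill * hidden + card_no[-keep_tail:]
```

For a made-up 19-digit number such as `6222021234567890123`, the nine middle digits are replaced and the length of the value is preserved.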
  • the masked log information is spliced together and, after the loop processing is completed, output and printed uniformly. FIG. 6 shows the masked log file, in which the sensitive fields listed in the configuration file have been masked. From a performance point of view, with the search and desensitization algorithms described, the transaction speed is not affected while the log is printed.
  • the above-mentioned configuration files can be expanded horizontally and vertically, that is, adding sensitive fields and mask types.
  • mask types such as 2-ID card, 3-mobile phone number
  • the data processing device 70 includes: a first determining unit 701, an obtaining unit 702, a search unit 703, a second determining unit 704 and an executing unit 705, wherein:
  • the first determining unit 701 is configured to determine a first mask field and a first mask type in the configuration file; the first mask type is the mask type to which the first mask field belongs;
  • the obtaining unit 702 is configured to obtain a first array and a first value for the first mask field; the first array includes at least three second values sorted by magnitude; each second value points to one piece of log information; the first value points to the first mask field;
  • a search unit 703, configured to search for a target second value that is the same as the first value in the first array
  • the second determining unit 704 is configured to determine the log information pointed to by the target second value as the data to be masked
  • the executing unit 705 is configured to execute a masking operation corresponding to the first masking type on the data to be masked.
  • the search unit 703 is further configured to:
  • the first parameter is one; the second parameter is N, where N characterizes the length of the first array;
  • determine, in the first array, the magnitude relationship between the second value whose subscript is the third parameter and the first value;
  • the third parameter is the average of the first parameter and the second parameter;
  • if the second value whose subscript is the third parameter is smaller than the first value, modify the first parameter to the third parameter plus one, and again determine, in the first array, the magnitude relationship between the second value whose subscript is the third parameter and the first value;
  • if the second value whose subscript is the third parameter is greater than the first value, modify the second parameter to the third parameter minus one, and again determine, in the first array, the magnitude relationship between the second value whose subscript is the third parameter and the first value.
  • the search unit 703 is further configured to, before determining the magnitude relationship between the second value whose subscript is the third parameter and the first value in the first array:
  • filter out a number of second values equal to the first screening quantity.
  • the search unit 703 is further configured to, before determining the magnitude relationship between the second value whose subscript is the third parameter and the first value in the first array:
  • filter out a number of second values equal to the second screening quantity.
  • the search unit 703 is further configured to:
  • the target second value does not exist in the first array.
  • the data processing device 70 may further include a configuration unit configured to, when the configuration file does not include the first mask field, perform the following before the first mask field and the first mask type are determined in the configuration file:
  • the first determining unit 701 is further configured to:
  • a first mask field and a first mask type are determined.
  • each unit included in the data processing device can be implemented by a processor in an electronic device, or by a specific logic circuit. In implementation, the processor can be a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), and so on.
  • if the above-mentioned data processing method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disk.
  • embodiments of the present application are not limited to any specific combination of hardware and software.
  • an embodiment of the present application provides an electronic device, including a memory and a processor; the memory stores a computer program that can run on the processor, and when executing the program the processor implements the steps in the data processing method provided in the above embodiments.
  • the electronic device 80 may be the above-mentioned electronic device. As shown in FIG. 8 , the electronic device 80 includes: a processor 801 , at least one communication bus 802 , a user interface 803 , at least one external communication interface 804 and a memory 805 . Wherein, the communication bus 802 is configured to realize connection and communication between these components. Wherein, the user interface 803 may include a display screen, and the external communication interface 804 may include a standard wired interface and a wireless interface.
  • the memory 805 is configured to store instructions and applications executable by the processor 801, and can also cache data to be processed or already processed by the processor 801 and the modules in the electronic device (for example, image data, audio data, voice communication data and video communication data). It can be implemented by flash memory (FLASH) or random access memory (RAM).
  • the embodiments of the present application provide a storage medium, that is, a computer-readable storage medium, on which a computer program is stored.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the functional units in each embodiment of the present application can all be integrated into one processing unit, each unit can serve as a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • if the above-mentioned integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.

Abstract

The present application discloses a data processing method and apparatus, a device, and a storage medium. The method comprises: determining a first mask field and a first mask type in a configuration file, wherein the first mask type is a mask type to which the first mask field belongs; obtaining a first array and a first numerical value for the first mask field, wherein the first array comprises at least three second numerical values sorted according to the magnitude of the numerical value, one second numerical value is used for pointing to one piece of log information, and the first numerical value is used for pointing to the first mask field; searching for a target second numerical value which is the same as the first numerical value in the first array; determining, as data to be masked, the log information to which the target second numerical value points; and performing a mask operation corresponding to the first mask type on said data. According to the solution of the present application, when the log information is masked, the data processing amount is reduced, and the processing efficiency is improved.

Description

A data processing method, device, equipment, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111545063.1, filed on December 16, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
This application relates to the technical field of data processing, and relates to, but is not limited to, a data processing method, device, equipment, and storage medium.
Background
With the rapid development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies.
In the related art, masking log information usually requires knowing which transaction needs to be masked, then performing a full-text search over the multiple pieces of log information of that transaction to find the target log information (the data to be masked) that contains a sensitive field (mask field). The mask type corresponding to the mask field is then determined, and mask processing corresponding to that mask type is performed on the target log information.
When the data to be masked is determined, the usual approach is to traverse the multiple pieces of log information and search for the log information containing the first mask field, which is taken as the data to be masked. As a result, masking log information involves a large amount of data processing, and the processing efficiency is low.
Summary
The present application provides a data processing method, device, equipment, and storage medium that reduce the amount of data processing and improve processing efficiency when log information is masked.
The technical solution of the present application is implemented as follows:
The present application provides a data processing method, the method including: determining a first mask field and a first mask type in a configuration file, where the first mask type is the mask type to which the first mask field belongs;
obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field;
searching the first array for a target second value that is the same as the first value;
determining the log information pointed to by the target second value as the data to be masked;
performing, on the data to be masked, the mask operation corresponding to the first mask type.
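As a rough end-to-end illustration of the five steps above, the following Python sketch derives a numeric key for each log entry, looks the mask field's key up in the sorted key array, and masks the entry it points to. Everything concrete here is an assumption made for the example: the names `key_of`, `mask_account` and `mask_log`, the field names other than `bind_card_no`, the choice of the first letter's character code as the numeric value, and the use of the standard library's `bisect_left` in place of the application's own search routine.

```python
from bisect import bisect_left


def key_of(field_name):
    # Assumption: the numeric value is the character code of the first letter.
    return ord(field_name[0])


def mask_account(value):
    # Keep the first six and last four characters (assumes len(value) > 10).
    return value[:6] + "*" * (len(value) - 10) + value[-4:]


def mask_log(entries, config, maskers):
    """entries: (field_name, plaintext) pairs; config: mask field -> mask type."""
    ordered = sorted(entries, key=lambda e: key_of(e[0]))
    first_array = [key_of(name) for name, _ in ordered]  # the second values
    result = dict(ordered)
    for mask_field, mask_type in config.items():      # step 1: field and type
        first_value = key_of(mask_field)               # step 2: first value
        i = bisect_left(first_array, first_value)      # step 3: numeric search
        if i < len(first_array) and first_array[i] == first_value:
            name, plaintext = ordered[i]               # step 4: data to mask
            result[name] = maskers[mask_type](plaintext)  # step 5: mask it
    return result


CONFIG = {"bind_card_no": 0}   # 0 = customer account type
MASKERS = {0: mask_account}
```

The sketch relies on every entry having a distinct key; if two field names shared a first letter, the lookup could land on the wrong entry, so a real implementation would need a collision-free key.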
The present application provides a data processing device, the device including:
a first determining unit, configured to determine a first mask field and a first mask type in a configuration file, where the first mask type is the mask type to which the first mask field belongs;
an obtaining unit, configured to obtain a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field;
a search unit, configured to search the first array for a target second value that is the same as the first value;
a second determining unit, configured to determine the log information pointed to by the target second value as the data to be masked;
an executing unit, configured to perform, on the data to be masked, the mask operation corresponding to the first mask type.
The present application also provides an electronic device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the above data processing method when executing the program.
The present application also provides a storage medium on which a computer program is stored, where the computer program implements the above data processing method when executed by a processor.
The data processing method, device, equipment, and storage medium provided by the present application include: determining a first mask field and a first mask type in a configuration file, where the first mask type is the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, the mask operation corresponding to the first mask type. In this solution, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, and the process of searching the multiple pieces of log information for the data to be masked is converted into the process of searching for the first value in the first array composed of the multiple second values. Because searching for a value involves a smaller amount of data processing, the solution reduces the amount of data processing and improves processing efficiency when log information is masked.
Brief Description of the Drawings
FIG. 1 is an optional schematic structural diagram of a data processing system provided in an embodiment of the present application;
FIG. 2 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 3 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 4 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 5 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 6 is an optional schematic diagram of a masked log file provided in an embodiment of the present application;
FIG. 7 is an optional schematic structural diagram of a data processing device provided in an embodiment of the present application;
FIG. 8 is an optional schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the application are described in further detail below with reference to the drawings in the embodiments of the present application. The following embodiments illustrate the present application but do not limit its scope.
In the following description, "some embodiments" describes a subset of all possible embodiments. It should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they may be combined with each other where no conflict arises.
In the following description, the terms "first\second\third" are used only to distinguish different objects; they do not represent a specific ordering of the objects and do not impose any sequence. It should be understood that, where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments of the application described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
Embodiments of the present application may provide a data processing method, device, equipment, and storage medium. In practical applications, the data processing method can be implemented by a data processing device, and each functional entity in the data processing device can be implemented cooperatively by the hardware resources of an electronic device, such as computing resources (for example, a processor) and communication resources (for example, those supporting communication over optical cable, cellular, and other means).
The data processing method provided in the embodiments of the present application is applied to a data processing system, and the data processing system includes a data processing end.
The data processing end is configured to: determine a first mask field and a first mask type in a configuration file, where the first mask type is the mask type to which the first mask field belongs; obtain a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, the mask operation corresponding to the first mask type.
Optionally, the data processing system may also include a client. The client is configured to collect log information and send the collected log information to the data processing end for processing.
As an example, the structure of the data processing system may be as shown in FIG. 1, including a data processing end 10 and a client 20. The data processing end 10 and the client 20 can communicate through a network 30.
Here, the data processing end 10 is configured to: determine a first mask field and a first mask type in a configuration file, where the first mask type is the mask type to which the first mask field belongs; obtain a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, the mask operation corresponding to the first mask type.
The data processing end 10 may include a physical machine (such as a server) or a virtual machine (such as a cloud platform).
The client 20 is configured to collect log information and send the collected log information to the data processing end for processing.
The client 20 may include a mobile terminal device (such as a mobile phone or a tablet computer) or a non-mobile terminal device (such as a desktop computer or a server).
The network 30 is used for communication between the data processing end 10 and the client 20. The network 30 may be a wired network, a wireless network, and so on.
It should be noted that the data processing end 10 and the client 20 may be deployed on the same electronic device or on different electronic devices.
The embodiments of the data processing method, device, equipment, and storage medium provided by the embodiments of the present application are described below with reference to the schematic diagram of the data processing system shown in FIG. 1.
In a first aspect, an embodiment of the present application provides a data processing method applied to a data processing device, where the data processing device can be deployed at the data processing end 10 in FIG. 1. The data processing procedure provided by the embodiments of the present application is described below.
FIG. 2 is a schematic flowchart of an optional data processing method. The data processing method provided in the embodiments of the present application is used to mask log information.
The data processing method is the same for every mask field, so it is described here using the first mask field as an example. If multiple mask fields need to be masked, the following data processing method is executed for each mask field in turn, so that all of the mask fields are masked.
The data processing method may include, but is not limited to, S201 to S205 shown in FIG. 2.
S201. The data processing end determines a first mask field and a first mask type in a configuration file.
A mask field represents a field that needs mask processing. The embodiments of the present application do not limit the specific content of a mask field, which can be configured according to actual requirements. For example, a mask field may be a customer name, a customer card number, and so on.
A field type is used to classify mask fields. The embodiments of the present application do not limit the specific classification method, the specific field types, or the mask fields that a field type includes, all of which can be defined according to actual requirements.
In an example, the field types may include a card number field type and a name field type. The card number field type may include mask field 1 (the bound bank card number of another bank) and mask field 2 (the customer card number); the name field type may include mask field 3 (the name corresponding to the card number) and mask field 4 (the name corresponding to the bound bank card number of another bank).
The configuration file defines at least one mask field and the mask type to which each of the at least one mask field belongs. The first mask field is any one of the at least one mask field, and the first mask type is the mask type to which the first mask field belongs.
The embodiments of the present application do not specifically limit the form of the configuration file, which can be configured according to actual requirements. For example, the configuration file may be in dictionary form. A configuration file in dictionary form includes multiple dictionary items, and each dictionary item defines a specific mask field and the mask type to which that mask field belongs.
示例性的,配置文件可以包括:Exemplarily, configuration files may include:
<dict-group field="mask_fields" describe="mask field type: 0 - card number/account number, 1 - name">
  <dict-item value="bind_card_no" name="0"/>
  <dict-item value="card_no" name="0"/>
  <dict-item value="cust_name" name="1"/>
  <dict-item value="rcv_name" name="1"/>
</dict-group>
Here, <dict-group field="mask_fields" describe="mask field type: 0 - card number/account number, 1 - name"> means: create two mask types, card number/account number and name, where the mask field type of card number/account number is 0 and the mask field type of name is 1.
<dict-item value="bind_card_no" name="0"/> means: create a dictionary item; when the mask field type is 0, the field information of the sensitive field may include the bound card number of another bank.
<dict-item value="card_no" name="0"/> means: create a dictionary item; when the mask field type is 0, the field information of the sensitive field may include the customer card number.
<dict-item value="cust_name" name="1"/> means: create a dictionary item; when the mask field type is 1, the field information of the sensitive field may include the name corresponding to the card number.
<dict-item value="rcv_name" name="1"/> means: create a dictionary item; when the mask field type is 1, the field information of the sensitive field may include the name corresponding to the bound card number of another bank.
</dict-group> closes the dictionary group and indicates that the configuration file is of the dictionary type.
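The dictionary-form configuration above can be read into a simple lookup table. The following is a minimal sketch, assuming the configuration is available as an XML string; the function name load_mask_fields and the variable names are illustrative and do not appear in the application.

```python
import xml.etree.ElementTree as ET

# The example configuration from the text, with the describe attribute in English.
CONFIG_XML = """
<dict-group field="mask_fields" describe="mask field type: 0 - card number/account number, 1 - name">
  <dict-item value="bind_card_no" name="0"/>
  <dict-item value="card_no" name="0"/>
  <dict-item value="cust_name" name="1"/>
  <dict-item value="rcv_name" name="1"/>
</dict-group>
"""


def load_mask_fields(xml_text):
    """Return a dict mapping each mask field (value) to its mask type (name)."""
    group = ET.fromstring(xml_text.strip())
    return {item.get("value"): item.get("name") for item in group.iter("dict-item")}


mask_fields = load_mask_fields(CONFIG_XML)
# S201: pick any mask field as the first mask field and read its mask type.
first_mask_field = "rcv_name"
first_mask_type = mask_fields[first_mask_field]
```

With this mapping, determining the first mask type in S201 reduces to a single dictionary lookup.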
S201 may be implemented as follows: the data processing end obtains the at least one mask field included in the configuration file, takes any one of the at least one mask field as the first mask field, and determines from the configuration file that the mask type to which the first mask field belongs is the first mask type.
S202. The data processing end obtains a first array and a first value for the first mask field.
The first array includes at least three second values sorted by numerical magnitude, where one second value points to one piece of log information.
The embodiment of the present application does not specifically limit the manner of obtaining the second value, which may be configured according to actual requirements.
In a possible implementation, the first letter of the function field in the log information may be converted into its American Standard Code for Information Interchange (ASCII) code, and that ASCII code is used as the second value pointing to the log information.
In another possible implementation, the log information may be numbered according to its function field, and the number is used as the second value pointing to the log information. The same function field should always receive the same number, and different function fields should receive different numbers.
There are at least three pieces of log information and, correspondingly, at least three second values; the at least three second values form the first array.
Example 1: suppose the log information of transaction 1 needs to be masked, and transaction 1 includes four pieces of log information: log information 1 (customer card number: 111111111111), log information 2 (name corresponding to the card number: Wang Xiaoer), log information 3 (bound card number of another bank: 2222222222222), and log information 4 (name corresponding to the bound card number of another bank: Zhang Xiaosi). Then the second value corresponding to log information 1 may be 3, the second value corresponding to log information 2 may be 7, the second value corresponding to log information 3 may be 15, and the second value corresponding to log information 4 may be 18. The first array is then [3, 7, 15, 18].
The first value is used to point to the first mask field. The embodiment of the present application does not specifically limit the manner of obtaining the first value, which may be configured according to actual requirements. It should be noted that the manner of obtaining the first value should be consistent with the manner of obtaining the second value.
Example 2, based on Example 1: when the first mask field is the name corresponding to the bound card number of another bank, the first value may be 18.
In a possible implementation, S202 may be implemented as follows: the data processing end generates at least three second values from the at least three pieces of log information and arranges the at least three second values by magnitude to obtain the first array; the data processing end generates the first value from the first mask field.
In another possible implementation, the data processing end generates in advance, for all log information, the second values pointing to the log information. S202 may then be implemented as follows: the data processing end determines, among the multiple second values, the at least three second values corresponding to the at least three pieces of log information, and arranges them by magnitude to obtain the first array; the data processing end determines, among the multiple second values and according to the first mask field, the second value corresponding to the first mask field as the first value.
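One way S202 could look in practice is sketched below, using the numbering-based implementation and the values of Example 1 and Example 2. The table FIELD_NUMBERS, the pairing of field names such as rcv_name with the four pieces of log information, and the function names are assumptions made for illustration only.

```python
# Hypothetical numbering of function fields; the numbers follow Example 1,
# the field names are borrowed from the example configuration file.
FIELD_NUMBERS = {"card_no": 3, "cust_name": 7, "bind_card_no": 15, "rcv_name": 18}

# Transaction 1 from Example 1 as (function field, value) pairs.
transaction_1 = [
    ("card_no", "111111111111"),
    ("cust_name", "Wang Xiaoer"),
    ("bind_card_no", "2222222222222"),
    ("rcv_name", "Zhang Xiaosi"),
]


def build_first_array(log_entries):
    """Return the sorted first array and a table from second value to log entry."""
    by_second_value = {FIELD_NUMBERS[field]: (field, value) for field, value in log_entries}
    return sorted(by_second_value), by_second_value


def first_value_for(mask_field):
    """Obtain the first value with the same numbering used for the second values."""
    return FIELD_NUMBERS[mask_field]


first_array, by_second_value = build_first_array(transaction_1)
first_value = first_value_for("rcv_name")
```

Because the first value and the second values come from the same table, a match in the first array identifies the log information that belongs to the first mask field.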
S203. The data processing end searches the first array for a target second value that is the same as the first value.
The embodiment of the present application does not limit the specific search algorithm, which may be configured according to actual requirements. In one example, a binary search algorithm may be used to search the first array for the target second value that is the same as the first value. In another example, a traversal algorithm may be used for the same search.
It can be understood that other search algorithms may also be used to search the first array for the target second value that is the same as the first value; they are not listed one by one here.
S204. The data processing end determines the log information pointed to by the target second value as the data to be masked.
The embodiment of the present application does not limit the specific implementation of determining the log information from the second value, which may be configured according to actual requirements.
For example, the log information may be in the form of a table in which each row includes a function field, the value of the function field, and the corresponding second value. In this way, after the target second value is determined, it can be looked up in the second-value column, and the log information in the row where the target second value is located is determined as the data to be masked.
S205. The data processing end performs, on the data to be masked, the mask operation corresponding to the first mask type.
One mask type corresponds to one mask operation.
Exemplarily, when the mask type is the card number field type, the corresponding mask operation is: display the last four digits of the card number and mask the other digits. When the mask type is the name field type, the corresponding mask operation is: display the first character and mask the other characters.
S205 may be implemented as follows: the data processing end determines the mask operation corresponding to the first mask type and performs that mask operation on the data to be masked.
The embodiment of the present application does not limit the specific masking manner, which may be configured according to actual requirements. In a possible implementation, masking may be performed by any replacement algorithm. For example, the characters to be masked may be replaced with a special character.
The embodiment of the present application does not limit the specific content of the special character, which may be configured according to actual requirements. For example, the special character may include any one of the following: "*", "#", "&".
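The two exemplary mask operations can be sketched as simple replacement functions. This is only an illustration under stated assumptions: the function names, the MASK_OPERATIONS table keyed by the mask types "0" and "1" from the example configuration, and the default special character "*" are choices made here, not requirements of the application.

```python
def mask_card_number(card_no, mask_char="*"):
    """Card number field type: show the last four characters, mask the rest."""
    visible = card_no[-4:]
    return mask_char * (len(card_no) - len(visible)) + visible


def mask_name(name, mask_char="*"):
    """Name field type: show the first character, mask the rest."""
    return name[:1] + mask_char * (len(name) - 1)


# One mask type corresponds to one mask operation (keys follow the example configuration).
MASK_OPERATIONS = {"0": mask_card_number, "1": mask_name}


def apply_mask(data, mask_type, mask_char="*"):
    """S205: perform the mask operation that corresponds to the mask type."""
    return MASK_OPERATIONS[mask_type](data, mask_char)
```

For instance, apply_mask("111111111111", "0") keeps only the trailing 1111 visible, and passing mask_char="#" switches the special character without changing the operation.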
The data processing solution provided by the embodiment of the present application includes: determining, in a configuration file, a first mask field and a first mask type, the first mask type being the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, where the first array includes at least three second values sorted by numerical magnitude, one second value points to one piece of log information, and the first value points to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, the mask operation corresponding to the first mask type. In this solution, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, so that the process of finding the data to be masked among multiple pieces of log information is converted into the process of finding the first value in the first array formed by the second values. Because searching for a numerical value involves a small amount of data processing, the solution reduces the amount of data processing and improves processing efficiency when masking log information.
Next, the implementation of S203, in which the data processing end searches the first array for the target second value that is the same as the first value, is described. As shown in FIG. 3, the process may include, but is not limited to, the following S2031 to S2035.
S2031. The data processing end determines a first parameter and a second parameter.
The initial value of the first parameter is one; the initial value of the second parameter is N, where N represents the length of the first array, so N is greater than or equal to 3.
The data processing end assigns one to the first parameter and N to the second parameter.
Exemplarily, S2031 may be implemented as: Low: 1; High: N. Here, Low denotes the first parameter and Low: 1 assigns 1 to Low; High denotes the second parameter and High: N assigns N to High.
S2032. The data processing end determines the magnitude relationship between the first value and the second value in the first array whose subscript is a third parameter.
The third parameter is the average of the first parameter and the second parameter.
Example 3: if the first array is [1, 3, 7, 15, 18, 24, 25] (subscripts start from one) and the first value is 18, then the first parameter is 1, the second parameter is 7, and the third parameter is 4. The second value whose subscript is the third parameter is 15, which is less than the first value 18.
It should be noted that if the average of the first parameter and the second parameter is not an integer, the third parameter may be the average rounded up. For example, if the average of the first parameter and the second parameter is 3.5, the third parameter may be 4.
If the second value whose subscript is the third parameter is equal to the first value, the following S2033 is executed; if it is less than the first value, the following S2034 is executed; if it is greater than the first value, the following S2035 is executed.
S2033. The data processing end determines that the target second value is the second value whose subscript is the third parameter.
The data processing end determines the second value whose subscript is the third parameter as the target second value.
S2034. The data processing end sets the first parameter to the third parameter plus one.
The data processing end sets the first parameter to the third parameter plus one and executes S2032 again, that is, it again determines the magnitude relationship between the first value and the second value in the first array whose subscript is the third parameter.
Example 4, based on Example 3: the data processing end sets the first parameter to 5 while the second parameter remains 7, so the third parameter is 6. On re-evaluation, the second value whose subscript is the third parameter is 24, and 24 is greater than the first value 18.
S2035. The data processing end sets the second parameter to the third parameter minus one.
The data processing end sets the second parameter to the third parameter minus one and executes S2032 again, that is, it again determines the magnitude relationship between the first value and the second value in the first array whose subscript is the third parameter.
Example 5, based on Example 4: the data processing end sets the second parameter to 5 while the first parameter remains 5, so the third parameter is 5. The second value whose subscript is the third parameter is 18, which equals the first value, so 18 is determined as the target second value.
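Steps S2031 to S2035 can be sketched as a loop over one-based subscripts. The sketch below follows Examples 3 to 5; the function name find_target and the stopping rule (return None once the first parameter exceeds the second parameter) are assumptions, since the text does not spell out the termination condition.

```python
def find_target(first_array, first_value):
    """Return the one-based subscript of the target second value, or None."""
    low, high = 1, len(first_array)            # S2031: Low: 1; High: N
    while low <= high:                         # assumed stop: the range is empty
        mid = (low + high + 1) // 2            # third parameter: average, rounded up
        candidate = first_array[mid - 1]       # one-based subscript -> Python index
        if candidate == first_value:           # S2033: target second value found
            return mid
        if candidate < first_value:            # S2034: first parameter = third + 1
            low = mid + 1
        else:                                  # S2035: second parameter = third - 1
            high = mid - 1
    return None


first_array = [1, 3, 7, 15, 18, 24, 25]
# Visits subscripts 4 (15), 6 (24) and 5 (18), as in Examples 3 to 5.
target_subscript = find_target(first_array, 18)
```

Rounding the average up rather than down does not affect correctness, because each iteration still moves one of the two parameters past the third parameter.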
In the data processing method provided by the embodiment of the present application, before S2032 is executed (the data processing end determining the magnitude relationship between the first value and the second value in the first array whose subscript is the third parameter), the second values in the first array may first be filtered. The filtering process may include, but is not limited to, any one of the following Embodiments A to C.
Embodiment A: filter the second values in the first array in the direction of increasing subscript.
Embodiment B: filter the second values in the first array in the direction of decreasing subscript.
Embodiment C: filter the second values in the first array in both the direction of increasing subscript and the direction of decreasing subscript.
Next, the process of Embodiment A, filtering the second values in the first array in the direction of increasing subscript, is described. The process may include, but is not limited to, the following SA01 to SA04.
SA01. The data processing end determines a first reference value based on the first array.
The first reference value is used to assist in determining a first screening quantity. The embodiment of the present application does not limit the manner of determining the first reference value, which may be configured according to actual requirements.
SA02. The data processing end determines, as a second reference value, the result of subtracting the second value in the first array whose subscript is the first parameter from the first value.
Exemplarily, when the first parameter is one, the data processing end determines the second reference value as the result of subtracting the first second value in the first array (the second value whose subscript is one) from the first value.
SA03. The data processing end determines the first screening quantity as the result of dividing the second reference value by the first reference value.
The data processing end calculates the result of dividing the second reference value by the first reference value and takes that result as the first screening quantity.
SA04. In the direction of increasing subscript, the data processing end filters out the first screening quantity of second values from the first array, starting from the second value whose subscript is the first parameter.
Exemplarily, when the first parameter is one, the first screening quantity is 2, and the first array is [1, 3, 7, 15, 18, 24, 25], the data processing end filters out 1 and 3 from the first array.
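Embodiment A can be sketched as a function that moves the first parameter forward. The sketch assumes that the division in SA03 is an integer (floor) division, that the first reference value is supplied by the caller, and that a negative first screening quantity means the target is absent; the function name filter_from_low is illustrative. The first reference value 8 and the first value 18 used below are chosen so that the first screening quantity comes out as 2, as in the example above.

```python
def filter_from_low(first_array, first_value, low, first_reference):
    """Return the new first parameter after filtering, or None if the target is absent.

    Subscripts are one-based, as in the text.
    """
    second_reference = first_value - first_array[low - 1]   # SA02
    first_count = second_reference // first_reference       # SA03 (floor division assumed)
    if first_count < 0:
        # The first value is smaller than every remaining second value.
        return None
    return low + first_count                                 # SA04: skip first_count values


first_array = [1, 3, 7, 15, 18, 24, 25]
# (18 - 1) // 8 == 2, so the values 1 and 3 are filtered out and the search
# continues from subscript 3 (the value 7).
new_low = filter_from_low(first_array, 18, low=1, first_reference=8)
```

If adjacent second values differ by at most the first reference value, none of the skipped values can reach the first value, so the filtering never discards the target.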
Next, the process of Embodiment B, filtering the second values in the first array in the direction of decreasing subscript, is described. The process may include, but is not limited to, the following SB01 to SB04.
SB01. The data processing end determines the first reference value based on the first array.
The specific implementation of SB01 is the same as that of SA01; refer to SA01 for details, which are not repeated here.
SB02. The data processing end determines, as a third reference value, the result of subtracting the first value from the second value in the first array whose subscript is the second parameter.
Exemplarily, when the second parameter is N, the data processing end determines the third reference value as the result of subtracting the first value from the second value in the first array whose subscript is the second parameter (the last second value).
SB03. The data processing end determines a second screening quantity as the result of dividing the third reference value by the first reference value.
The data processing end calculates the result of dividing the third reference value by the first reference value and takes that result as the second screening quantity.
SB04. In the direction of decreasing subscript, the data processing end filters out the second screening quantity of second values from the first array, starting from the second value whose subscript is the second parameter.
Exemplarily, when the second parameter is 7, the second screening quantity is 2, and the first array is [1, 3, 7, 15, 18, 24, 25], the data processing end filters out 24 and 25 from the first array.
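Embodiment B mirrors Embodiment A from the other end of the array. The sketch below again assumes floor division in SB03 and a caller-supplied first reference value; the function name filter_from_high is illustrative, and the first value 7 is an assumed input chosen so that the second screening quantity equals 2 as in the example above.

```python
def filter_from_high(first_array, first_value, high, first_reference):
    """Return the new second parameter after filtering, or None if the target is absent.

    Subscripts are one-based, as in the text.
    """
    third_reference = first_array[high - 1] - first_value   # SB02
    second_count = third_reference // first_reference       # SB03 (floor division assumed)
    if second_count < 0:
        # The first value is larger than every remaining second value.
        return None
    return high - second_count                               # SB04: drop second_count values


first_array = [1, 3, 7, 15, 18, 24, 25]
# (25 - 7) // 8 == 2, so 24 and 25 are filtered out and the search
# continues up to subscript 5 (the value 18).
new_high = filter_from_high(first_array, 7, high=7, first_reference=8)
```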
For the specific implementation of Embodiment C, refer to the detailed descriptions of Embodiment A and Embodiment B, which are not repeated here.
Next, the process of SA01 and SB01, in which the data processing end determines the first reference value based on the first array, is described. The process may include, but is not limited to, the following SA011 and SA012.
SA011. The data processing end calculates the difference between each pair of adjacent second values among the at least three second values in the first array, obtaining at least two adjacent differences.
Example 6: if the first array is [1, 3, 7, 15, 18, 24, 25], the adjacent differences are 2, 4, 8, 3, 6, and 1.
SA012. The data processing end determines the first reference value as the maximum of the at least two adjacent differences.
Example 7, based on Example 6: the first reference value is 8.
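The reference value computed in SA011 and SA012 is the largest gap between neighbouring second values. A minimal sketch follows, assuming the first array is already sorted and has at least two elements; the function names are illustrative.

```python
def adjacent_differences(first_array):
    """SA011: differences between each pair of adjacent second values."""
    return [right - left for left, right in zip(first_array, first_array[1:])]


def max_adjacent_difference(first_array):
    """SA012: the reference value is the largest adjacent difference."""
    return max(adjacent_differences(first_array))


first_array = [1, 3, 7, 15, 18, 24, 25]
differences = adjacent_differences(first_array)      # Example 6
reference = max_adjacent_difference(first_array)     # Example 7
```

Using the maximum gap makes the filtering conservative: a second value k positions away from one end of the array differs from that end's value by at most k times the reference value.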
It should be noted that, for Embodiment A, when the first screening quantity is less than zero, it is determined that the target second value does not exist in the first array.
For Embodiment B, when the second screening quantity is less than zero, it is determined that the target second value does not exist in the first array.
For Embodiment C, when the first screening quantity is less than zero or the second screening quantity is less than zero, it is determined that the target second value does not exist in the first array.
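Putting the pieces together, Embodiment C followed by the binary search might look like the sketch below. It assumes floor division for both screening quantities, the maximum adjacent difference as the reference value, one-based subscripts, and a stop condition of the first parameter exceeding the second parameter; the function name optimized_search and the guard against a zero reference value are additions made for this illustration.

```python
def optimized_search(first_array, first_value):
    """Return the one-based subscript of the target second value, or None."""
    low, high = 1, len(first_array)
    # Reference value: largest difference between adjacent second values.
    reference = max(b - a for a, b in zip(first_array, first_array[1:]))
    if reference > 0:  # guard added here: skip filtering if all values are equal
        first_count = (first_value - first_array[low - 1]) // reference
        second_count = (first_array[high - 1] - first_value) // reference
        if first_count < 0 or second_count < 0:
            return None                      # target cannot be in the first array
        low += first_count                   # Embodiment A: filter from the front
        high -= second_count                 # Embodiment B: filter from the back
    while low <= high:                       # S2032 to S2035 on the remaining range
        mid = (low + high + 1) // 2          # average of the two parameters, rounded up
        candidate = first_array[mid - 1]
        if candidate == first_value:
            return mid
        if candidate < first_value:
            low = mid + 1
        else:
            high = mid - 1
    return None


first_array = [1, 3, 7, 15, 18, 24, 25]
# Filtering narrows the range to subscripts 3..7; subscript 5 is hit on the first probe.
found = optimized_search(first_array, 18)
```

In this example the filtering removes two candidates before the search starts, and the out-of-range values 0 and 30 are rejected without any probe of the array interior.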
The data processing method provided by the embodiment of the present application may also modify the configuration file.
In a possible implementation, mask fields may be added to or removed from the configuration file.
In another possible implementation, the log information to be processed may also be modified.
The method of modifying the configuration file is described below, taking the addition of the first mask field to the configuration file as an example. As shown in FIG. 4, the process may include, but is not limited to, the following S401 and S402.
S401. The data processing end adds a first dictionary item to the configuration file.
S402. The data processing end configures, in the first dictionary item, the first mask field and the mask type to which the first mask field belongs, so as to obtain a new configuration file.
Exemplarily, <dict-item value="telphone_no" name="2"/> means: create a dictionary item; when the mask field type is 2, the field information of the sensitive field may include the customer's mobile phone number.
Correspondingly, S201, in which the data processing end determines the first mask field and the first mask type in the configuration file, may be implemented as: determining the first mask field and the first mask type in the new configuration file.
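S401 and S402 amount to appending one dictionary item to the dictionary group. A minimal sketch, assuming the configuration file is held as an XML string; the function name add_mask_field is illustrative, while the item value telphone_no and type 2 are taken from the example above.

```python
import xml.etree.ElementTree as ET


def add_mask_field(xml_text, mask_field, mask_type):
    """Return a new configuration with one more dictionary item."""
    group = ET.fromstring(xml_text.strip())
    # S401: add a dictionary item; S402: configure its mask field and mask type.
    ET.SubElement(group, "dict-item", {"value": mask_field, "name": mask_type})
    return ET.tostring(group, encoding="unicode")


old_config = (
    '<dict-group field="mask_fields">'
    '<dict-item value="card_no" name="0"/>'
    '<dict-item value="cust_name" name="1"/>'
    "</dict-group>"
)
new_config = add_mask_field(old_config, "telphone_no", "2")
new_items = [
    (item.get("value"), item.get("name"))
    for item in ET.fromstring(new_config).iter("dict-item")
]
```

Because only the configuration changes, the same masking code can pick up the new mask field the next time S201 reads the new configuration file.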
Below, the data processing method provided by the embodiment of the present application is described taking a payment transaction process as an example.
For ease of understanding, some technical terms used in this embodiment are explained first.
Log: the transaction records of a software system, mainly used to assist in locating problems and in subsequent auditing.
In the related art, masking log information usually requires knowing which transaction needs to be desensitized (masked), and then performing a full-text search of the log of that transaction to check whether it contains sensitive fields (mask fields). After a sensitive field is found, the mask class corresponding to that mask field is determined, and masking is performed through that mask class, thereby masking a certain type of transaction log.
However, when a new mask class or a new transaction to be masked needs to be added, targeted secondary development is often required.
The related art has the following disadvantages:
(1) Poor development timeliness.
On the one hand, to keep up with the rapid changes of the Internet, adding mask types, transactions to be masked, mask fields, and the like requires targeted secondary development, which leads to long development times. On the other hand, during development the information that needs to be masked is not masked in time, so there is a risk of information leakage, which affects Internet security.
(2) Low efficiency of locating mask fields and of masking.
When the amount of log information is large, locating mask fields and masking them in the related art takes a long time, which may affect normal transactions.
The embodiments of the present application have the following features:
First, in view of the characteristics of financial data, an optimized binary search algorithm is used that discards invalid data during the search, so that the fields to be masked can be found in a short time, and masking is then performed by any replacement algorithm.
Second, by configuring a mask information file (equivalent to the configuration file), adding or removing mask types, adding or removing transactions to be masked, and adding or removing masked fields are handled automatically and dynamically. The overall implementation requires no secondary development of the program, which reduces development workload and ensures fast business response. In addition, the mask information configuration file may be exported to other systems in the form of a jar package so that it can be shared.
The data processing method provided by the embodiments of the present application is described in detail below.
The processing flow is shown in FIG. 5 and may include, but is not limited to, the following S501 to S507.
S501. Read the log and obtain the sensitive fields from the configuration file.
Here, a sensitive field is equivalent to a mask field.
S502. Determine whether the log contains sensitive information that needs to be masked.
If the log contains sensitive information that needs to be masked, perform S503; if it does not, perform S507.
S503. Copy the log information.
S504. Obtain the mask type corresponding to each sensitive field in the configuration file.
S505. Using the optimized binary search algorithm, perform mask processing based on the mask type.
S506. Output the masked log.
S507. End.
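The flow S501 to S507 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: log records are taken to be dictionaries, and `SENSITIVE_FIELDS`, `mask_value` and `process_log` are hypothetical names that do not come from the application. For brevity the sketch locates sensitive fields with a dictionary lookup rather than the optimized binary search of S505, and the masking rule is a placeholder.

```python
import copy

# Hypothetical configuration: sensitive field name -> mask type.
SENSITIVE_FIELDS = {
    "bind_card_no": "0",
    "card_no": "0",
    "cust_name": "1",
    "rcv_name": "1",
}


def mask_value(value: str, mask_type: str) -> str:
    """Placeholder masking rule (assumption): cover every character with '*'."""
    return "*" * len(value)


def process_log(log: dict) -> dict:
    """Walk one parsed log record through steps S501 to S507."""
    # S501/S502: check whether the record carries any configured sensitive field.
    hits = [name for name in log if name in SENSITIVE_FIELDS]
    if not hits:
        return log  # S507: nothing to mask, end
    # S503: work on a copy so the original record is left untouched.
    masked = copy.deepcopy(log)
    for name in hits:
        mask_type = SENSITIVE_FIELDS[name]  # S504: look up the mask type
        masked[name] = mask_value(str(masked[name]), mask_type)  # S505
    return masked  # S506: the masked record is what gets output
```

Working on a copy mirrors S503: the record used by the online transaction is never modified by the masking step.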
For example, a payment counter system involves fund payments, payer and payee account numbers and names, and private customer information, so the customer account numbers and customer names involved in transfer transactions need to be desensitized (masked). Customer account numbers and customer names belong to different mask types, so their desensitization methods (mask processing methods) differ, and each must be processed in its own way.
First, the system's common log (including the outbound transfer part) is read to determine whether it contains any field listed in the mask configuration file. If it does, the mask processing below continues; if it does not, the flow ends.
For example, the configuration file may include:
<dict-group field="mask_fields" describe="mask field type: 0 - card/account number, 1 - name">
    <dict-item value="bind_card_no" name="0"/>
    <dict-item value="card_no" name="0"/>
    <dict-item value="cust_name" name="1"/>
    <dict-item value="rcv_name" name="1"/>
</dict-group>
Here, <dict-group field="mask_fields" describe="mask field type: 0 - card/account number, 1 - name"> creates two mask types, card/account number and name; the mask field type of card/account number is 0 and that of name is 1.
<dict-item value="bind_card_no" name="0"/> creates a dictionary item: for mask field type 0, the sensitive fields may include the bound bank card number issued by another bank.
<dict-item value="card_no" name="0"/> creates a dictionary item: for mask field type 0, the sensitive fields may include the customer card number.
<dict-item value="cust_name" name="1"/> creates a dictionary item: for mask field type 1, the sensitive fields may include the name corresponding to the card number.
<dict-item value="rcv_name" name="1"/> creates a dictionary item: for mask field type 1, the sensitive fields may include the name corresponding to the bound bank card number issued by another bank.
</dict-group> closes the dictionary group and indicates that the configuration file is of dictionary type.
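To illustrate how such a configuration file could be consumed, the following sketch loads the dictionary items into a mapping from sensitive field name to mask type using Python's standard `xml.etree.ElementTree` module. The function name `load_mask_fields` and the inline `CONFIG_XML` string are assumptions made for the example; the application does not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# Inline copy of the example configuration (assumption: in practice it is read from a file).
CONFIG_XML = """
<dict-group field="mask_fields" describe="mask field type: 0 - card/account number, 1 - name">
    <dict-item value="bind_card_no" name="0"/>
    <dict-item value="card_no" name="0"/>
    <dict-item value="cust_name" name="1"/>
    <dict-item value="rcv_name" name="1"/>
</dict-group>
"""


def load_mask_fields(xml_text: str) -> dict:
    """Return {sensitive field name: mask type} from a dict-group document."""
    root = ET.fromstring(xml_text.strip())
    # Each dict-item stores the field name in "value" and the mask type in "name".
    return {item.get("value"): item.get("name") for item in root.iter("dict-item")}


mask_fields = load_mask_fields(CONFIG_XML)
```

With this mapping, adding a sensitive field means adding one `dict-item` line to the file; the loading code does not change.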
Next, to keep mask processing from affecting online transactions, the log information to be processed is first copied. All sensitive fields are treated as an array (for example, [bind_card_no, card_no, cust_name, rcv_name]), and each field of the log information is taken out in a loop to determine whether the log information contains a sensitive field from that array; the optimized binary search algorithm locates the sensitive information quickly.
Each item in the log information array is sorted in ascending order by the ASCII code of its first letter (equivalent to the second value). Since each item of log information corresponds to a unique first letter, it suffices to search the ordered array for the value corresponding to the target sensitive field in order to find the target sensitive information.
The principle of the optimized binary search algorithm is as follows: before each binary search step, a filtering pass removes elements that cannot match, which can greatly improve search speed.
The filtering may be implemented as follows. Suppose the value 18 (equivalent to the first value, also called the target value) is to be found in the array 1, 3, 7, 15, 18, 24 (equivalent to the first array). The differences between adjacent elements are 2, 4, 8, 3 and 6, so the maximum adjacent difference E (equivalent to the first reference value) is 8. The first difference, between the target value 18 and the smallest element 1 (equivalent to the second value whose subscript in the first array is the first parameter), is 17 (equivalent to the second reference value). Dividing the first difference 17 by the maximum adjacent difference 8 gives 2.125 (equivalent to the first filter count, also called the number of forward-filtered elements). Rounding this up to 3, the first three elements counted from the front of the array, namely 1, 3 and 7, are filtered out directly. The second difference, between the largest element 24 (equivalent to the second value whose subscript in the first array is the second parameter) and the target value 18, is 6 (equivalent to the third reference value). Dividing the second difference 6 by the maximum adjacent difference 8 gives 0.75 (equivalent to the second filter count, also called the number of backward-filtered elements). Rounding this up to 1, the last element counted from the back of the array, namely 24, is filtered out directly. The remaining data are 15 and 18, so an ordinary binary search only needs to look for the target value among 15 and 18. Filtering in this way can exclude a large number of elements that need not be examined, which greatly reduces the number of comparisons and speeds up the search.
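The arithmetic of this example can be reproduced with a few lines of Python. The variable names are hypothetical, and rounding the filter counts up is an assumption inferred from the figures in the text (2.125 leads to three filtered elements, 0.75 to one).

```python
import math

values = [1, 3, 7, 15, 18, 24]  # sorted first array
target = 18                     # first value (target value)

# First reference value: the largest difference between adjacent elements.
max_gap = max(b - a for a, b in zip(values, values[1:]))

# Second and third reference values divided by the first reference value,
# rounded up to whole elements (assumption, see the note above).
front_skip = math.ceil((target - values[0]) / max_gap)
back_skip = math.ceil((values[-1] - target) / max_gap)

# Elements left for the ordinary binary search.
remaining = values[front_skip:len(values) - back_skip]
```

Here `max_gap` is 8, `front_skip` is 3, `back_skip` is 1, and `remaining` is `[15, 18]`, matching the figures in the example.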
For example, the program corresponding to the optimized binary search algorithm may include:
Low (equivalent to the first parameter): 1 // assign 1 to the variable Low
High (equivalent to the second parameter): n // assign n to the variable High
Index: 0 // assign 0 to the variable Index
Low_span (equivalent to the first filter count): 0 // assign 0 to the variable Low_span
High_span (equivalent to the second filter count): 0 // assign 0 to the variable High_span
While Low<=High and Low_span>=0 and High_span>=0 // while Low is less than or equal to High, Low_span is greater than or equal to 0, and High_span is greater than or equal to 0, execute the following do block
do Low_span=(X-A[Low])/M // set Low_span to the difference between X and the value at index Low of array A, divided by M
Low=Low+[Low_span] // in each iteration, the new value of Low is its previous value plus [Low_span]
High_span=(A[High]-X)/M // set High_span to the difference between the value at index High of A and X, divided by M
High=High-[High_span] // in each iteration, the new value of High is its previous value minus [High_span]
Mid (equivalent to the third parameter): (Low+High)/2 // set Mid to the average of Low and High
If X=A[Mid] // if X equals the value at index Mid of A, execute the following Then statement; otherwise continue with the Else if branch
Then Index=Mid // assign Mid to Index
Break; // exit the While loop
Else if X<A[Mid] // if X is less than the value at index Mid of A, execute the following Then statement; otherwise execute the following Else statement
Then High=Mid-1 // assign Mid-1 to High
Else Low=Mid+1 // assign Mid+1 to Low
Return Index // return the value of Index
Here, Low is the starting index of the search range; High is the ending index of the search range, initially the array length n; Index is the returned index; Low_span is the number of forward-filtered elements; High_span is the number of backward-filtered elements; A is the array; X is the target value; and M is the maximum difference between adjacent elements of the array.
Since the difference between any two adjacent elements of array A is at most M, if X≥A[Low], the elements of A between A[Low] and A[Low+t] (where t=(X-A[Low])/M) must be smaller than X, so they can be skipped in the subsequent search. Likewise, if X≤A[High], the elements of A between A[High-t] and A[High] (where t=(A[High]-X)/M) must be greater than X and can also be skipped.
Compared with the ordinary binary search algorithm, two indicator variables are added: Low_span and High_span. Low_span is the number of elements filtered out going forward from A[Low], and High_span is the number of elements filtered out going backward from A[High]. If either Low_span or High_span is less than 0, the target X does not lie between A[Low] and A[High], that is, X does not exist in A, and the loop can be ended immediately.
In this way, the optimized binary search algorithm finds the information to be desensitized efficiently.
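A runnable Python sketch of the search described above is given below. It is one possible reading rather than the applicant's implementation: it uses 0-based indices, assumes an ascending list of integers, interprets the bracket [·] as rounding down (the most conservative choice, so the element at the new boundary is still examined), and returns -1 when the target is absent. The function name `filtered_binary_search` is hypothetical.

```python
def filtered_binary_search(a, x):
    """Return the 0-based index of x in the ascending integer list a, or -1."""
    if not a:
        return -1
    # M: largest difference between neighbours; fall back to 1 to avoid dividing by zero.
    m = max((b - c for c, b in zip(a, a[1:])), default=1) or 1
    low, high = 0, len(a) - 1
    while low <= high:
        # A negative Low_span or High_span means x lies outside a[low..high].
        if x < a[low] or x > a[high]:
            return -1
        low += (x - a[low]) // m    # skip leading elements that cannot reach x
        high -= (a[high] - x) // m  # skip trailing elements that must exceed x
        mid = (low + high) // 2
        if a[mid] == x:
            return mid
        if x < a[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return -1
```

For the array 1, 3, 7, 15, 18, 24 and target 18, the first pass moves `low` from index 0 to index 2 before any comparison with the middle element, and the target is found at index 4 on the second pass.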
The optimized binary search algorithm locates the sensitive fields, and the value of each sensitive field is then desensitized (masked). The type corresponding to the sensitive field is read from the configuration file (for example, bind_card_no corresponds to type 0, customer account number), and the plaintext log information for bind_card_no is masked according to that type.
Because the volume of information is large, fast processing is required, and the data need not be restored after processing, the mask processing here may use an arbitrary-replacement algorithm. For example, part of the true value may be replaced with a special character (such as *). Given the characteristics of financial account data, only the first six and the last four digits of a card number are shown and the remaining digits are covered with "*". The masked log information is assembled and, once the loop finishes, output and printed in one batch. As shown in FIG. 6, in the masked log file the sensitive fields named in the configuration file have been masked. In terms of performance, with the search and desensitization algorithms the log is printed without affecting transaction speed.
Finally, the configuration file can be extended both horizontally and vertically, that is, by adding mask types and sensitive fields. For example, mask types may be added (such as 2 for ID card and 3 for mobile phone number). Subsequent vertical extension of sensitive fields (which change relatively frequently) requires no code changes or secondary development; only the configuration file needs to be modified. This greatly improves development efficiency and allows a fast response to sudden customer information security incidents (such as the CITIC customer information leak), keeping development time short.
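The card-number rule described above (show the first six and last four digits, cover the rest with "*") can be sketched as follows. The function names and the `MASKERS` table are hypothetical. The rule for names is purely an assumption for illustration, since the text does not specify how names are masked, and so is the choice to hide very short values completely.

```python
def mask_card_number(card_no: str, head: int = 6, tail: int = 4) -> str:
    """Keep the first `head` and last `tail` characters; cover the rest with '*'."""
    if len(card_no) <= head + tail:
        # Too short for the rule to hide anything: hide everything (assumption).
        return "*" * len(card_no)
    hidden = len(card_no) - head - tail
    return card_no[:head] + "*" * hidden + card_no[-tail:]


def mask_name(name: str) -> str:
    """Assumed rule for names: keep the first character, cover the rest."""
    return name[:1] + "*" * (len(name) - 1)


# Mask type (as used in the configuration file) -> masking function.
MASKERS = {"0": mask_card_number, "1": mask_name}
```

Keying the table by mask type means a newly configured field of an existing type needs no new code; only a new mask type requires a new function.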
To implement the above data processing method, an embodiment of the present application provides a data processing apparatus, which is described below with reference to the schematic structural diagram in FIG. 7.
As shown in FIG. 7, the data processing apparatus 70 includes a first determining unit 701, an obtaining unit 702, a search unit 703, a second determining unit 704 and an execution unit 705, where:
the first determining unit 701 is configured to determine, in a configuration file, a first mask field and a first mask type, the first mask type being the mask type to which the first mask field belongs;
the obtaining unit 702 is configured to obtain a first array and a first value for the first mask field, where the first array includes at least three second values sorted by magnitude, each second value points to one item of log information, and the first value points to the first mask field;
the search unit 703 is configured to search the first array for a target second value that is equal to the first value;
the second determining unit 704 is configured to determine the log information pointed to by the target second value as the data to be masked;
the execution unit 705 is configured to perform, on the data to be masked, the mask operation corresponding to the first mask type.
In some embodiments, the search unit 703 is further configured to:
determine a first parameter and a second parameter, where the first parameter is one and the second parameter is N, N representing the length of the first array;
judge the magnitude relationship between the first value and the second value whose subscript in the first array is a third parameter, where the third parameter is the average of the first parameter and the second parameter;
if the second value whose subscript is the third parameter is equal to the first value, determine that second value to be the target second value;
if the second value whose subscript is the third parameter is smaller than the first value, set the first parameter to the third parameter plus one and perform the judging step again;
if the second value whose subscript is the third parameter is greater than the first value, set the second parameter to the third parameter minus one and perform the judging step again.
In some embodiments, the search unit 703 is further configured to perform the following before judging the magnitude relationship between the first value and the second value whose subscript in the first array is the third parameter:
determine a first reference value based on the first array;
determine, as a second reference value, the result of subtracting from the first value the second value whose subscript in the first array is the first parameter;
determine a first filter count as the result of dividing the second reference value by the first reference value;
in the direction of increasing subscript, starting from the subscript equal to the first parameter, filter out the first filter count of second values from the first array.
In some embodiments, the search unit 703 is further configured to perform the following before judging the magnitude relationship between the first value and the second value whose subscript in the first array is the third parameter:
determine a first reference value based on the first array;
determine, as a third reference value, the result of subtracting the first value from the second value whose subscript in the first array is the second parameter;
determine a second filter count as the result of dividing the third reference value by the first reference value;
in the direction of decreasing subscript, starting from the subscript equal to the second parameter, filter out the second filter count of second values from the first array.
In some embodiments, the search unit 703 is further configured to:
calculate the differences between adjacent second values among the at least three second values in the first array, to obtain at least two adjacent differences;
determine the first reference value to be the maximum of the at least two adjacent differences.
In some embodiments, the search unit 703 is further configured to:
when the first filter count is less than zero, determine that the target second value does not exist in the first array;
or, when the second filter count is less than zero, determine that the target second value does not exist in the first array.
In some embodiments, the data processing apparatus 70 may further include a configuration unit. Where the configuration file does not include the first mask field, the configuration unit is configured to perform the following before the first mask field and the first mask type are determined in the configuration file:
add a first dictionary item to the configuration file;
configure, in the first dictionary item, the first mask field and the mask type to which the first mask field belongs, to obtain a new configuration file.
Correspondingly, the first determining unit 701 is further configured to:
determine the first mask field and the first mask type in the new configuration file.
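The configuration step can be pictured with a short sketch that adds one dictionary item to an existing configuration and returns the new configuration. It uses Python's standard `xml.etree.ElementTree`; the function name `add_mask_field` and the field name `id_no` are hypothetical, and mask type 2 (ID card) follows the extension example given earlier in the description.

```python
import xml.etree.ElementTree as ET


def add_mask_field(config_xml: str, field_name: str, mask_type: str) -> str:
    """Return a new configuration with one extra dict-item, unless it already exists."""
    root = ET.fromstring(config_xml)
    existing = {item.get("value") for item in root.iter("dict-item")}
    if field_name not in existing:
        # The field goes in "value" and its mask type in "name", as in the example file.
        ET.SubElement(root, "dict-item", {"value": field_name, "name": mask_type})
    return ET.tostring(root, encoding="unicode")


old_config = (
    '<dict-group field="mask_fields">'
    '<dict-item value="card_no" name="0"/>'
    "</dict-group>"
)
new_config = add_mask_field(old_config, "id_no", "2")
```

The determining step then reads the mask field and mask type from `new_config` instead of the original file.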
It should be noted that the units included in the data processing apparatus provided by the embodiments of the present application may be implemented by a processor in an electronic device, or by specific logic circuits. In implementation, the processor may be a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
The description of the above apparatus embodiments is similar to that of the above method embodiments, and the apparatus embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above data processing method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
To implement the above data processing method, an embodiment of the present application provides an electronic device including a memory and a processor. The memory stores a computer program that can run on the processor, and the processor, when executing the program, implements the steps of the data processing method provided in the above embodiments.
The structure of the electronic device is described below with reference to the electronic device 80 shown in FIG. 8.
In an example, the electronic device 80 may be the electronic device described above. As shown in FIG. 8, the electronic device 80 includes a processor 801, at least one communication bus 802, a user interface 803, at least one external communication interface 804 and a memory 805. The communication bus 802 is configured to implement connection and communication between these components. The user interface 803 may include a display screen, and the external communication interface 804 may include standard wired and wireless interfaces.
The memory 805 is configured to store instructions and applications executable by the processor 801, and may also cache data that is to be processed or has been processed by the processor 801 and by the modules of the electronic device (for example, image data, audio data, voice communication data and video communication data). It may be implemented with flash memory (FLASH) or random access memory (RAM).
In a fourth aspect, an embodiment of the present application provides a storage medium, that is, a computer-readable storage medium, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the data processing method provided in the above embodiments.
It should be pointed out that the description of the above storage medium and device embodiments is similar to that of the above method embodiments, and these embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, occurrences of "in one embodiment" or "in some embodiments" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, in this document, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes that element.
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods, such as: multiple units or components can be combined, or May be integrated into another system, or some features may be ignored, or not implemented. In addition, the coupling, or direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical or other forms of.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The above are merely embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be determined by the protection scope of the claims.

Claims (10)

  1. A data processing method, the method comprising:
    determining, in a configuration file, a first mask field and a first mask type, the first mask type being the mask type to which the first mask field belongs;
    obtaining a first array and a first value for the first mask field, wherein the first array comprises at least three second values sorted by numerical value, each second value points to one piece of log information, and the first value points to the first mask field;
    searching the first array for a target second value that is equal to the first value;
    determining the log information pointed to by the target second value as data to be masked; and
    performing, on the data to be masked, a masking operation corresponding to the first mask type.
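For illustration only, the steps of claim 1 can be sketched in Python as follows. The configuration layout, the two mask types, and the `field_key` scheme that turns a field name into a number are all assumptions made for this sketch; the claim does not specify how the first value or the second values are derived.

```python
from bisect import bisect_left

# Hypothetical configuration: mask field name -> mask type.
CONFIG = {"id_card": "keep_last4", "phone": "full"}

# Hypothetical masking operations, one per mask type.
MASK_OPS = {
    "full": lambda text: "*" * len(text),
    "keep_last4": lambda text: "*" * max(len(text) - 4, 0) + text[-4:],
}


def field_key(name: str) -> int:
    """Map a field name to a number (assumed toy scheme: sum of code points)."""
    return sum(ord(ch) for ch in name)


def mask_log(log: dict, field: str) -> dict:
    """Return a copy of `log` with the value of `field` masked, if present."""
    mask_type = CONFIG[field]            # first mask type, read from the config
    first_value = field_key(field)       # first value, pointing at the mask field
    # Each second value points at one entry of the log.
    pointers = {field_key(name): name for name in log}
    first_array = sorted(pointers)       # first array, sorted by numerical value
    pos = bisect_left(first_array, first_value)
    if pos == len(first_array) or first_array[pos] != first_value:
        return dict(log)                 # no target second value: nothing to mask
    target_name = pointers[first_array[pos]]
    masked = dict(log)
    masked[target_name] = MASK_OPS[mask_type](log[target_name])
    return masked
```

With a log such as `{"user": "alice", "phone": "13800001234"}`, calling `mask_log(log, "phone")` replaces only the phone value. The toy `field_key` scheme can collide for field names that are anagrams of each other, so a real system would need a collision-free mapping.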
  2. The method according to claim 1, wherein searching the first array for a target second value that is equal to the first value comprises:
    determining a first parameter and a second parameter, wherein the first parameter is one, the second parameter is N, and N represents the length of the first array;
    comparing the first value with the second value whose index in the first array is a third parameter, the third parameter being the average of the first parameter and the second parameter;
    if the second value whose index is the third parameter is equal to the first value, determining that second value to be the target second value;
    if the second value whose index is the third parameter is less than the first value, setting the first parameter to the third parameter plus one, and performing the comparing step again;
    if the second value whose index is the third parameter is greater than the first value, setting the second parameter to the third parameter minus one, and performing the comparing step again.
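The search in claim 2 is a binary search. A minimal Python sketch is given below under two assumptions that the claim does not state: indices are 0-based (the claim counts from 1 to N), and the loop stops and reports "not found" once the first parameter exceeds the second.

```python
from typing import Optional, Sequence


def find_target(values: Sequence[int], target: int) -> Optional[int]:
    """Return the 0-based index of `target` in ascending `values`, or None."""
    low, high = 0, len(values) - 1   # first and second parameters (1 and N in the claim)
    while low <= high:               # assumed stop condition
        mid = (low + high) // 2      # third parameter: the average, rounded down
        if values[mid] == target:
            return mid               # target second value found
        if values[mid] < target:
            low = mid + 1            # continue in the upper half
        else:
            high = mid - 1           # continue in the lower half
    return None
```

For example, searching `[3, 8, 15, 21, 30]` for `21` inspects index 2 (value 15) and then index 3, where the match is found.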
  3. The method according to claim 2, wherein before comparing the first value with the second value whose index in the first array is the third parameter, the method further comprises:
    determining a first reference value based on the first array;
    determining, as a second reference value, the result of subtracting from the first value the second value whose index in the first array is the first parameter;
    determining a first screening quantity as the result of dividing the second reference value by the first reference value;
    filtering out, in the direction of increasing index and starting from the index equal to the first parameter, the first screening quantity of second values in the first array.
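The lower-end filtering of claim 3 might be sketched as below. The sketch assumes that the first reference value `ref` is positive and at least as large as every gap between adjacent values, and that the division is rounded down; neither the rounding rule nor the function and parameter names come from the claim.

```python
from typing import Sequence


def prune_low(values: Sequence[int], target: int, low: int, ref: int) -> int:
    """Return the new lower index after skipping values that must be below `target`.

    Assumes `ref` > 0 and that no two adjacent values differ by more than `ref`.
    """
    second_ref = target - values[low]   # second reference value
    count = second_ref // ref           # first screening quantity, rounded down
    # Each step to the right raises the value by at most `ref`, so the first
    # `count` values starting at `low` are all strictly smaller than `target`.
    return low + max(count, 0)          # a negative quantity skips nothing here
```

With `values = [3, 8, 15, 21, 30]`, `ref = 9`, and `target = 21`, the second reference value is 18 and the screening quantity is 2, so the search can start at index 2 instead of index 0. If the target exceeds the last value, the returned index can lie past the end of the array, which a caller would treat as an empty search range.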
  4. The method according to claim 2, wherein before comparing the first value with the second value whose index in the first array is the third parameter, the method further comprises:
    determining a first reference value based on the first array;
    determining, as a third reference value, the result of subtracting the first value from the second value whose index in the first array is the second parameter;
    determining a second screening quantity as the result of dividing the third reference value by the first reference value;
    filtering out, in the direction of decreasing index and starting from the index equal to the second parameter, the second screening quantity of second values in the first array.
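Claim 4 mirrors the same idea at the upper end of the array. The following sketch again assumes a positive `ref` that bounds every adjacent gap and a division that rounds down; the names are illustrative.

```python
from typing import Sequence


def prune_high(values: Sequence[int], target: int, high: int, ref: int) -> int:
    """Return the new upper index after skipping values that must be above `target`.

    Assumes `ref` > 0 and that no two adjacent values differ by more than `ref`.
    """
    third_ref = values[high] - target   # third reference value
    count = third_ref // ref            # second screening quantity, rounded down
    # Each step to the left lowers the value by at most `ref`, so the last
    # `count` values ending at `high` are all strictly larger than `target`.
    return high - max(count, 0)         # a negative quantity skips nothing here
```

With `values = [3, 8, 15, 21, 30]`, `ref = 9`, and `target = 8`, the third reference value is 22 and the screening quantity is 2, so indices 4 and 3 are dropped and the search ends at index 2.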
  5. The method according to claim 3 or 4, wherein determining a first reference value based on the first array comprises:
    calculating the difference between each two adjacent second values among the at least three second values in the first array, to obtain at least two adjacent differences;
    determining the fourth reference value to be the maximum of the at least two adjacent differences.
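The computation in claim 5 is a single pass over adjacent pairs. The sketch below returns the maximum adjacent difference; it assumes this is the quantity used as the divisor in claims 3 and 4, although the claim as worded refers to it as the "fourth reference value".

```python
from typing import Sequence


def max_adjacent_gap(values: Sequence[int]) -> int:
    """Return the largest difference between neighbouring values in ascending `values`."""
    if len(values) < 2:
        raise ValueError("need at least two values to form an adjacent difference")
    # Pair each value with its right-hand neighbour and keep the largest gap.
    return max(right - left for left, right in zip(values, values[1:]))
```

For `[3, 8, 15, 21, 30]` the adjacent differences are 5, 7, 6, and 9, so the function returns 9. If all values are equal the result is 0, which cannot be used as a divisor, so a caller would need to handle that case separately.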
  6. The method according to claim 3 or 4, the method further comprising:
    when the first screening quantity is less than zero, determining that the target second value does not exist in the first array;
    or, when the second screening quantity is less than zero, determining that the target second value does not exist in the first array.
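A negative screening quantity means the first value lies outside the range covered by the array: below the smallest remaining value or above the largest. One way to combine this early exit with the filtering of claims 3 and 4 is sketched here, assuming a positive `ref` that bounds every adjacent gap and division rounded down; the function name and return convention are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def prune_or_reject(
    values: Sequence[int], target: int, low: int, high: int, ref: int
) -> Optional[Tuple[int, int]]:
    """Return the narrowed (low, high) range, or None if `target` cannot be present.

    Assumes `ref` > 0 and that no two adjacent values differ by more than `ref`.
    """
    first_count = (target - values[low]) // ref     # first screening quantity
    second_count = (values[high] - target) // ref   # second screening quantity
    if first_count < 0 or second_count < 0:
        return None                                 # target is outside [values[low], values[high]]
    return low + first_count, high - second_count
```

With `values = [3, 8, 15, 21, 30]` and `ref = 9`, a target of 21 narrows the range from `(0, 4)` to `(2, 3)`, while targets of 1 or 31 are rejected without any comparison against the middle of the array.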
  7. The method according to claim 1, wherein the configuration file does not include the first mask field, and before determining, in the configuration file, the first mask field and the first mask type, the method further comprises:
    adding a first dictionary entry to the configuration file;
    configuring, in the first dictionary entry, the first mask field and the mask type to which the first mask field belongs, to obtain a new configuration file;
    correspondingly, determining, in the configuration file, the first mask field and the first mask type comprises:
    determining, in the new configuration file, the first mask field and the first mask type.
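As a rough illustration of claim 7, the configuration can be modelled as a dictionary holding a list of dictionary entries, each naming one mask field and its mask type. The `mask_rules` key, the entry layout, and both function names are hypothetical; the claim does not fix a file format.

```python
import copy
from typing import Optional


def add_mask_rule(config: dict, field: str, mask_type: str) -> dict:
    """Return a new configuration with one extra dictionary entry for `field`."""
    new_config = copy.deepcopy(config)   # leave the original configuration untouched
    new_config.setdefault("mask_rules", []).append({"field": field, "type": mask_type})
    return new_config


def lookup_mask_type(config: dict, field: str) -> Optional[str]:
    """Return the mask type configured for `field`, or None if it is not configured."""
    for rule in config.get("mask_rules", []):
        if rule["field"] == field:
            return rule["type"]
    return None
```

Supporting a new field then only requires a configuration change: after `new = add_mask_rule(old, "id_card", "keep_last4")`, the lookup on `new` finds the field while the lookup on `old` still returns `None`.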
  8. A data processing apparatus, the apparatus comprising:
    a first determining unit, configured to determine, in a configuration file, a first mask field and a first mask type, the first mask type being the mask type to which the first mask field belongs;
    an obtaining unit, configured to obtain a first array and a first value for the first mask field, wherein the first array comprises at least three second values sorted by numerical value, each second value points to one piece of log information, and the first value points to the first mask field;
    a search unit, configured to search the first array for a target second value that is equal to the first value;
    a second determining unit, configured to determine the log information pointed to by the target second value as data to be masked;
    an execution unit, configured to perform, on the data to be masked, a masking operation corresponding to the first mask type.
  9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the program, implements the data processing method according to any one of claims 1 to 7.
  10. A storage medium storing a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
PCT/CN2022/100534 2021-12-16 2022-06-22 Data processing method and apparatus, device, and storage medium WO2023109066A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111545063.1 2021-12-16
CN202111545063.1A CN114444114A (en) 2021-12-16 2021-12-16 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023109066A1 (en) 2023-06-22

Family

ID=81364002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100534 WO2023109066A1 (en) 2021-12-16 2022-06-22 Data processing method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114444114A (en)
WO (1) WO2023109066A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114444114A (en) * 2021-12-16 2022-05-06 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321363A (en) * 2019-04-19 2019-10-11 中国工商银行股份有限公司 Data retrieval method and device
CN111832070A (en) * 2020-06-12 2020-10-27 北京百度网讯科技有限公司 Data mask method and device, electronic equipment and storage medium
CN112307070A (en) * 2020-11-23 2021-02-02 深圳前海微众银行股份有限公司 Mask data query method, device and equipment
CN113569291A (en) * 2021-08-02 2021-10-29 京东科技控股股份有限公司 Log mask method and device
CN114444114A (en) * 2021-12-16 2022-05-06 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114444114A (en) 2022-05-06

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22905832; Country of ref document: EP; Kind code of ref document: A1)