WO2023109066A1 - Data processing method and apparatus, device, and storage medium

Info

Publication number: WO2023109066A1
Application number: PCT/CN2022/100534
Authority: WO
Inventors: 康志鹏
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2021-12-16
Filing date: 2022-06-22
Publication date: 2023-06-22
Also published as: CN114444114A

Abstract

The present application discloses a data processing method and apparatus, a device, and a storage medium. The method comprises: determining a first mask field and a first mask type in a configuration file, wherein the first mask type is a mask type to which the first mask field belongs; obtaining a first array and a first numerical value for the first mask field, wherein the first array comprises at least three second numerical values sorted according to the magnitude of the numerical value, one second numerical value is used for pointing to one piece of log information, and the first numerical value is used for pointing to the first mask field; searching for a target second numerical value which is the same as the first numerical value in the first array; determining, as data to be masked, the log information to which the target second numerical value points; and performing a mask operation corresponding to the first mask type on said data. According to the solution of the present application, when the log information is masked, the data processing amount is reduced, and the processing efficiency is improved.

Description

Data processing method, apparatus, device, and storage medium

Cross References to Related Applications

This application is based on a Chinese patent application with application number 202111545063.1 and a filing date of December 16, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated into this application by reference.

Technical Field

This application relates to the technical field of data processing, and in particular, but not exclusively, to a data processing method, apparatus, device, and storage medium.

Background

With the rapid development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies.

In the related art, masking log information usually requires first knowing which transaction needs to be masked, then performing a full-text search over the multiple pieces of log information of that transaction to find the target log information (the data to be masked) that contains a sensitive field (a mask field), and finally determining the mask type corresponding to the mask field and performing the mask processing corresponding to that mask type on the target log information.

The data to be masked is generally determined by traversal: the multiple pieces of log information are searched one by one for log information that includes the first mask field, and that log information is taken as the data to be masked. As a result, masking log information involves a relatively large amount of data processing, and the processing efficiency is low.

Summary

The present application provides a data processing method, apparatus, device, and storage medium that reduce the amount of data processing and improve processing efficiency when masking log information.

The technical solution of the present application is implemented as follows:

The present application provides a data processing method, the method comprising: determining a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs;

obtaining a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field;

searching the first array for a target second value that is the same as the first value;

determining the log information pointed to by the target second value as the data to be masked; and

performing, on the data to be masked, a masking operation corresponding to the first mask type.

The present application provides a data processing apparatus, the apparatus comprising:

a first determining unit configured to determine a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs;

an obtaining unit configured to obtain a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field;

a search unit configured to search the first array for a target second value that is the same as the first value;

a second determining unit configured to determine the log information pointed to by the target second value as the data to be masked; and

an executing unit configured to perform, on the data to be masked, a masking operation corresponding to the first mask type.

The present application also provides an electronic device, including: a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the above data processing method when executing the program.

The present application also provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above data processing method is realized.

The data processing method, apparatus, device, and storage medium provided by the present application include: determining a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, a masking operation corresponding to the first mask type. In the solution of this application, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, so that the process of finding the data to be masked among the multiple pieces of log information is converted into the process of searching for the first value in the first array composed of the second values. Because a numerical search involves a small amount of data processing, the solution of the present application reduces the amount of data processing and improves processing efficiency when masking log information.

Brief Description of the Drawings

FIG. 1 is an optional structural schematic diagram of a data processing system provided in an embodiment of the present application;

FIG. 2 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;

FIG. 3 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;

FIG. 4 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;

FIG. 5 is an optional schematic flowchart of a data processing method provided in an embodiment of the present application;

FIG. 6 is an optional schematic diagram of a masked log file provided in an embodiment of the present application;

FIG. 7 is an optional structural schematic diagram of a data processing device provided in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an optional electronic device provided in an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the specific technical solutions of the application will be further described in detail below in conjunction with the drawings in the embodiments of the present application. The following examples are used to illustrate the present application, but not to limit the scope of the present application.

In the following description, references to "some embodiments" describe a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they can be combined with each other where no conflict arises.

In the following description, the terms "first\second\third" are used only to distinguish different objects; they do not represent a specific order of the objects and do not limit their sequence. It can be understood that, where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments of the application described here can be implemented in an order other than that illustrated or described here.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.

Embodiments of the present application may provide a data processing method, apparatus, device, and storage medium. In practical applications, the data processing method can be implemented by a data processing device, and each functional entity in the data processing device can be implemented cooperatively by the hardware resources of an electronic device, such as computing resources (for example, processors) and communication resources (for example, resources supporting communication over optical cable, cellular, and other modes).

The data processing method provided in the embodiment of the present application is applied to a data processing system, and the data processing system includes a data processing terminal.

The data processing terminal is configured to: determine a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs; obtain a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, a masking operation corresponding to the first mask type.

Optionally, the data processing system may also include a client. The client is used to collect log information, and send the collected log information to the data processing end for processing.

As an example, the structure of the data processing system may be as shown in FIG. 1 , including: a data processing terminal 10 and a client terminal 20 . The data processing terminal 10 and the client terminal 20 can communicate through the network 30 .

Here, the data processing terminal 10 is configured to: determine a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs; obtain a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; search the first array for a target second value that is the same as the first value; determine the log information pointed to by the target second value as the data to be masked; and perform, on the data to be masked, a masking operation corresponding to the first mask type.

Wherein, the data processing terminal 10 may include a physical machine (such as a server, etc.), or a virtual machine (such as a cloud platform, etc.).

The client 20 is used to collect log information, and send the collected log information to the data processing end for processing.

Wherein, the client 20 may include a mobile terminal device (such as a mobile phone, a tablet computer, etc.), or a non-mobile terminal device (such as a desktop computer, a server, etc.).

The network 30 is used for communication between the data processing terminal 10 and the client 20 . Wherein, the network 30 may be a wired network, or a wireless network and so on.

It should be noted that the data processing terminal 10 and the client terminal 20 may be deployed on the same electronic device, or may be deployed on different electronic devices.

Below, with reference to the schematic diagram of the data processing system shown in FIG. 1, various embodiments of the data processing method, apparatus, device, and storage medium provided by the embodiments of the present application are described.

In the first aspect, the embodiment of the present application provides a data processing method, which is applied to a data processing device; wherein, the data processing device can be deployed at the data processing terminal 10 in FIG. 1 . Next, the data processing process provided by the embodiment of the present application will be described.

FIG. 2 shows a schematic flowchart of an optional data processing method. The data processing method provided in the embodiment of the present application is used to mask log information.

The data processing method is the same for every mask field, so it is described here by taking the first mask field as an example. If multiple mask fields need to be masked, the following data processing method is executed for each mask field in turn, so as to mask all of them.

The data processing method may include, but is not limited to, S201 to S205 shown in FIG. 2.

S201. The data processing end determines a first mask field and a first mask type in a configuration file.

The mask field is used to represent the fields that need to be masked. The embodiment of the present application does not limit the specific content of the mask field, which can be configured according to actual requirements. Exemplarily, the mask field can be customer name, customer card number and so on.

Field types, that is, mask types, are used to classify mask fields. The embodiment of the present application does not limit the specific classification method, the specific field types, or the mask fields included in each field type, all of which may be defined according to actual requirements.

In an example, the field types may include a card number field type and a name field type. The card number field type can include mask field 1 (the bank card number bound at another bank) and mask field 2 (the customer card number); the name field type can include mask field 3 (the name corresponding to the card number) and mask field 4 (the name corresponding to the bank card number bound at another bank).

The configuration file defines at least one mask field and the mask type to which each mask field in the at least one mask field belongs. Wherein, the first mask field is any one of the at least one mask field; the first mask type is the mask type to which the first mask field belongs.

The embodiment of the present application does not specifically limit the expression form of the configuration file, which may be configured according to actual requirements. Exemplarily, the configuration file may be in the form of a dictionary. Wherein, the configuration file in the form of a dictionary includes a plurality of dictionary items, and each dictionary item defines a specific mask field and a mask type to which the mask field belongs.

Exemplarily, configuration files may include:

<dict-group field="mask_fields" describe="mask field type 0-card number and account number, 1-name">

<dict-item value="bind_card_no" name="0"/>

<dict-item value="card_no" name="0"/>

<dict-item value="cust_name" name="1"/>

<dict-item value="rcv_name" name="1"/>

</dict-group>

Here, <dict-group field="mask_fields" describe="mask field type 0-card number and account number, 1-name"> means: create two mask types, card number and name, where the mask field type corresponding to the card number is 0 and the mask field type corresponding to the name is 1.

<dict-item value="bind_card_no" name="0"/> means: create a dictionary item; when the mask field type is 0, the field information of the sensitive field can include the bank card number bound at another bank.

<dict-item value="card_no" name="0"/> means: create a dictionary item; when the mask field type is 0, the field information of the sensitive field can include the customer card number.

<dict-item value="cust_name" name="1"/> means: create a dictionary item; when the mask field type is 1, the field information of the sensitive field can include the name corresponding to the card number.

<dict-item value="rcv_name" name="1"/> means: create a dictionary item; when the mask field type is 1, the field information of the sensitive field can include the name corresponding to the bank card number bound at another bank.

</dict-group> closes the dictionary group; the configuration file is of the dictionary type.

S201 can be implemented as: the data processing end obtains the at least one mask field defined in the configuration file, uses any one of the at least one mask field as the first mask field, and determines from the configuration file that the mask type to which the first mask field belongs is the first mask type.
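As a minimal sketch of S201, assuming the dictionary-style configuration shown above is available as XML text, Python's standard `xml.etree.ElementTree` module can read each `dict-item` into a mapping from mask field to mask type. The names `CONFIG_XML` and `load_mask_fields` are illustrative only and do not come from the application.

```python
import xml.etree.ElementTree as ET

# Configuration text copied from the example above.
CONFIG_XML = """
<dict-group field="mask_fields" describe="mask field type 0-card number and account number, 1-name">
  <dict-item value="bind_card_no" name="0"/>
  <dict-item value="card_no" name="0"/>
  <dict-item value="cust_name" name="1"/>
  <dict-item value="rcv_name" name="1"/>
</dict-group>
"""


def load_mask_fields(xml_text):
    """Return {mask field: mask type} for every dict-item in the group."""
    root = ET.fromstring(xml_text.strip())
    return {item.get("value"): item.get("name") for item in root.iter("dict-item")}


mask_fields = load_mask_fields(CONFIG_XML)

# Any entry can serve as the first mask field; its mapped value is the first mask type.
first_mask_field = "rcv_name"
first_mask_type = mask_fields[first_mask_field]
```

Keeping the result as a plain mapping means that the mask type of any configured field can be looked up directly once the file has been read.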

S202. The data processing end obtains a first array and a first value for the first mask field.

The first array includes at least three second values sorted by magnitude. Each second value points to one piece of log information.

The embodiment of the present application does not specifically limit the manner of obtaining the second value, which may be configured according to actual requirements.

In a possible implementation, the first letter of the function field in the log information can be converted into its American Standard Code for Information Interchange (ASCII) code, and the ASCII code can be used as the second value pointing to the log information.

In another possible implementation, the log information may be numbered on the basis of the function field in the log information, and the number may be used as the second value pointing to the log information. The numbers for the same function field should be the same, and the numbers for different function fields should be different.

There are at least three pieces of log information and, correspondingly, at least three second values; the at least three second values form the first array.

Example 1: assume that the log information of transaction 1 needs to be masked and that transaction 1 includes 4 pieces of log information, namely: log information 1 (customer card number: 111111111111), log information 2 (name corresponding to the card number: Wang Xiaoer), log information 3 (bank card number bound at another bank: 2222222222222), and log information 4 (name corresponding to the bank card number bound at another bank: Zhang Xiaosi). The second value corresponding to log information 1 may be 3, the second value corresponding to log information 2 may be 7, the second value corresponding to log information 3 may be 15, and the second value corresponding to log information 4 may be 18. The first array is then [3, 7, 15, 18].

The first value is used to point to the first mask field. The embodiment of the present application does not specifically limit the manner of obtaining the first value, which may be configured according to actual requirements. It should be noted that the manner of obtaining the first value should be consistent with the method of obtaining the second value.

Based on example 1, example 2: when the first mask field is the name corresponding to the bank card number bound to another bank, the first value can be 18.

In a possible implementation, S202 can be implemented as: the data processing end generates at least three second values from at least three pieces of log information and arranges the at least three second values by magnitude to obtain the first array; the data processing end generates the first value according to the first mask field.

In another possible implementation, the data processing end generates in advance, for all log information, second values pointing to the log information, and S202 may be implemented as: the data processing end determines, among the multiple second values, the at least three second values corresponding to the at least three pieces of log information and arranges them by magnitude to obtain the first array; the data processing end determines, according to the first mask field, that the second value corresponding to the first mask field among the multiple second values is the first value.
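The numbering-based variant of S202 can be sketched as follows. The table `FIELD_NUMBERS` is a hypothetical, pre-assigned numbering of function fields that reuses the numbers from Example 1 and the field identifiers from the example configuration; the application does not prescribe how the numbers are chosen, only that the first value and the second values are derived in the same way.

```python
# Hypothetical numbering of function fields; the numbers reuse Example 1.
FIELD_NUMBERS = {"card_no": 3, "cust_name": 7, "bind_card_no": 15, "rcv_name": 18}

# Log information of transaction 1 as (function field, value) pairs.
logs = [
    ("card_no", "111111111111"),
    ("cust_name", "Wang Xiaoer"),
    ("bind_card_no", "2222222222222"),
    ("rcv_name", "Zhang Xiaosi"),
]


def build_first_array(log_items):
    """Return the second values of the log items, sorted in ascending order."""
    return sorted(FIELD_NUMBERS[field] for field, _ in log_items)


def first_value_for(mask_field):
    """Derive the first value in the same way as the second values."""
    return FIELD_NUMBERS[mask_field]


first_array = build_first_array(logs)
first_value = first_value_for("rcv_name")
```

With the data of Example 1 this yields the first array [3, 7, 15, 18] and, for the mask field of Example 2, the first value 18.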

S203. The data processing end searches the first array for a target second value that is the same as the first value.

The embodiment of the present application does not limit the specific search algorithm, which can be configured according to actual requirements. In an example, a binary search algorithm may be used to search the first array for a target second value that is the same as the first value. In another example, a traversal algorithm may be used instead.

It can be understood that other search algorithms can also be used to search for the target second value that is the same as the first value in the first array, which will not be listed here.

S204. The data processing end determines the log information pointed to by the target second value as the data to be masked.

The embodiment of the present application does not limit the specific implementation of determining the log information through the second value, and it may be configured according to actual requirements.

For example, the log information may be in the form of a table whose rows include a function field, the value of the function field, and the corresponding second value. In this way, after the target second value is determined, it can be looked up in the second value column, and the log information in the row where the target second value is located is determined as the data to be masked.
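The table lookup described for S204 might look like the sketch below. The row layout (function field, value, second value) follows the description above; the name `rows_for_second_value` and the sample rows, which reuse Example 1, are illustrative assumptions.

```python
# Each row: (function field, value of the function field, second value).
log_table = [
    ("card_no", "111111111111", 3),
    ("cust_name", "Wang Xiaoer", 7),
    ("bind_card_no", "2222222222222", 15),
    ("rcv_name", "Zhang Xiaosi", 18),
]


def rows_for_second_value(table, target_second_value):
    """Return every row whose second-value column equals the target."""
    return [row for row in table if row[2] == target_second_value]


data_to_mask = rows_for_second_value(log_table, 18)
```

If the rows are kept in the same order as the first array, the position found by the search in S203 can index the table directly instead of scanning the column.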

S205. The data processing end performs a masking operation corresponding to the first masking type on the data to be masked.

Each mask type corresponds to one mask operation.

Exemplarily, when the mask type is the card number field type, the corresponding mask operation is: display the last four digits of the card number and mask the other digits; when the mask type is the name field type, the corresponding mask operation is: display the first character and mask the other characters.

S205 may be implemented as: the data processing end determines the mask operation corresponding to the first mask type, and executes the mask operation on the data to be masked.

The embodiment of the present application does not limit the specific masking method, which can be configured according to actual requirements. In a possible implementation manner, any replacement algorithm may be used for masking. For example, characters that require a mask can be replaced with special characters.

The embodiment of the present application does not limit the specific content of the special characters, which can be configured according to actual needs. For example, special characters can include any of the following: "*", "#", "&".
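The two example mask operations can be sketched as simple character replacement. The function names and the dispatch table `MASK_OPERATIONS` are illustrative; the type codes "0" and "1" are taken from the example configuration, and "*" is one of the special characters mentioned above.

```python
def mask_card_number(card_no, mask_char="*"):
    """Keep the last four digits and replace every other character."""
    if len(card_no) <= 4:
        return card_no
    return mask_char * (len(card_no) - 4) + card_no[-4:]


def mask_name(name, mask_char="*"):
    """Keep the first character and replace every other character."""
    if not name:
        return name
    return name[0] + mask_char * (len(name) - 1)


# Mask type -> mask operation, keyed by the type codes of the example configuration.
MASK_OPERATIONS = {"0": mask_card_number, "1": mask_name}


def apply_mask(value, mask_type):
    """Perform the mask operation that corresponds to the given mask type."""
    return MASK_OPERATIONS[mask_type](value)


masked_card = apply_mask("111111111111", "0")
masked_name = apply_mask("Zhang Xiaosi", "1")
```

Note that this sketch treats the space in a romanized name as an ordinary character and masks it as well; a real deployment might handle separators differently.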

The data processing solution provided by the embodiment of the present application includes: determining a first mask field and a first mask type in a configuration file, the first mask type being the mask type to which the first mask field belongs; obtaining a first array and a first value for the first mask field, wherein the first array includes at least three second values sorted by magnitude, each second value points to one piece of log information, and the first value points to the first mask field; searching the first array for a target second value that is the same as the first value; determining the log information pointed to by the target second value as the data to be masked; and performing, on the data to be masked, a masking operation corresponding to the first mask type. In the solution of this application, the first value corresponding to the first mask field and the second values corresponding to the multiple pieces of log information are obtained, so that the process of finding the data to be masked among the multiple pieces of log information is converted into the process of searching for the first value in the first array composed of the second values. Because a numerical search involves a small amount of data processing, the solution of the present application reduces the amount of data processing and improves processing efficiency when masking log information.

Next, one implementation of S203, in which the data processing end searches the first array for the target second value that is the same as the first value, is described. As shown in FIG. 3, the process may include, but is not limited to, the following S2031 to S2035.

S2031. The data processing end determines the first parameter and the second parameter.

Wherein, the initial value of the first parameter is one; the initial value of the second parameter is N, and N is used to represent the length of the first array, that is, N is greater than or equal to 3.

The data processing end assigns a value of one to the first parameter, and assigns a value of N to the second parameter.

Exemplarily, S2031 may be implemented as: Low: 1; High: N. Among them, Low indicates the first parameter, Low: 1 indicates assigning 1 to Low; High indicates the second parameter, and High: N indicates assigning N to High.

S2032. The data processing terminal judges the magnitude relationship between the second numerical value whose subscript is the third parameter in the first array and the first numerical value.

The third parameter is the average value of the first parameter and the second parameter.

Example 3: if the first array is [1, 3, 7, 15, 18, 24, 25] and the first value is 18, then the first parameter is 1, the second parameter is 7, and the third parameter is 4; the second value whose subscript is the third parameter is 15; that is, the second value 15 whose subscript is the third parameter is smaller than the first value 18.

It should be noted that if the average value of the first parameter and the second parameter is not an integer, the third parameter may be the value obtained by rounding the average value. For example, if the average value of the first parameter and the second parameter is 3.5, then the third parameter may be 4.

If the second value whose subscript is the third parameter is equal to the first value, the following S2033 is executed; if it is less than the first value, the following S2034 is executed; if it is greater than the first value, the following S2035 is executed.

S2033. The data processing end determines that the target second value is the second value whose subscript is the third parameter.

The data processing end determines the second value whose subscript is the third parameter as the target second value.

S2034. The data processing end modifies the first parameter to be the third parameter plus one.

The data processing end modifies the first parameter to be the third parameter plus one and re-executes S2032, in which the data processing end judges the magnitude relationship between the second value whose subscript is the third parameter in the first array and the first value.

Based on example 3, example 4: the data processing end modifies the first parameter to 5, while the second parameter remains 7; the third parameter becomes 6, and on re-judgment the second value whose subscript is the third parameter is 24, which is greater than the first value 18.

S2035. The data processing end modifies the second parameter to be the third parameter minus one.

The data processing end modifies the second parameter to be the third parameter minus one and re-executes S2032, in which the data processing end judges the magnitude relationship between the second value whose subscript is the third parameter in the first array and the first value.

Based on example 4, example 5: the data processing end modifies the second parameter to 5, while the first parameter remains 5; the third parameter becomes 5, the second value whose subscript is the third parameter is 18, which equals the first value, and 18 is determined as the target second value.
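Steps S2031 to S2035 can be sketched as a binary search with 1-based subscripts. Two points are assumptions of this sketch rather than statements from the application: a non-integer average is rounded up (consistent with the 3.5 to 4 example), and `None` is returned when the first value does not occur in the array, a case the text does not discuss. The function name `find_target_index` is illustrative.

```python
def find_target_index(first_array, first_value):
    """Binary search with 1-based subscripts, following S2031 to S2035.

    Returns the 1-based subscript of the target second value, or None
    when the first value is absent (an assumption of this sketch).
    """
    low, high = 1, len(first_array)          # S2031: Low = 1, High = N
    while low <= high:
        mid = (low + high + 1) // 2          # third parameter: average, rounded up
        candidate = first_array[mid - 1]     # Python lists are 0-based
        if candidate == first_value:         # S2033: target found
            return mid
        if candidate < first_value:          # S2034: first parameter = third + 1
            low = mid + 1
        else:                                # S2035: second parameter = third - 1
            high = mid - 1
    return None


index = find_target_index([1, 3, 7, 15, 18, 24, 25], 18)
```

On the array of Example 3 the third parameter takes the values 4, 6, and 5 in turn, matching Examples 3 to 5, and the search ends at subscript 5.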

In the data processing method provided by the embodiment of the present application, before executing S2032, in which the data processing end judges the magnitude relationship between the second value whose subscript is the third parameter in the first array and the first value, the data processing end may also first filter the second values in the first array. The filtering process may include, but is not limited to, any one of the following Embodiments A to C.

Embodiment A. Filter the second values in the first array in the direction of increasing subscript;

Embodiment B. Filter the second values in the first array in the direction of decreasing subscript;

Embodiment C. Filter the second values in the first array in the direction of increasing subscript and in the direction of decreasing subscript at the same time.

Next, the process of filtering the second value in the first array according to the increasing direction of the subscript in Embodiment A will be described. This process may include but not limited to SA01 to SA04 described below.

SA01. The data processing end determines a first reference value based on the first array.

The first reference value is used to assist in determining the first screening quantity. The embodiment of the present application does not limit the manner of determining the first reference value, which may be configured according to actual conditions.

SA02. The data processing end subtracts the first value, and the result of subscripting the second value of the first parameter in the first array is determined as the second reference value.

Exemplarily, when the first parameter is one, the data processing end determines that the second reference value is the first value minus the first second value in the first array (the second value whose subscript is one). result.

SA03. The data processing end determines the first screening quantity as a result of dividing the second reference value by the first reference value.

The data processing end calculates the result of dividing the second reference value by the first reference value, and uses the result as the first screening quantity.

SA04. In the direction of increasing subscript, the data processing end filters out the first screening quantity of second values from the first array, starting from the subscript equal to the first parameter.

Exemplarily, when the first parameter is one, the first screening quantity is 2, and the first array is [1, 3, 7, 15, 18, 24, 25], the data processing end filters out 1 and 3.
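SA02 to SA04 can be illustrated with a minimal Python sketch. The function names `first_screening_quantity` and `drop_forward` are hypothetical and do not appear in the application; subscripts are 1-based to match the text, and the first reference value is passed in as a parameter because SA01 leaves its determination open.

```python
from typing import List, Sequence


def first_screening_quantity(values: Sequence[int], first_value: int,
                             first_param: int, first_reference: int) -> float:
    """SA02 and SA03: (first value - values[first_param]) / first reference.

    `first_param` is a 1-based subscript, as in the text.
    """
    second_reference = first_value - values[first_param - 1]
    return second_reference / first_reference


def drop_forward(values: Sequence[int], first_param: int,
                 quantity: int) -> List[int]:
    """SA04: remove `quantity` entries starting at 1-based subscript `first_param`."""
    start = first_param - 1
    return list(values[:start]) + list(values[start + quantity:])


array = [1, 3, 7, 15, 18, 24, 25]
print(drop_forward(array, first_param=1, quantity=2))  # [7, 15, 18, 24, 25]
```

With a first parameter of one and a first screening quantity of 2, the sketch removes 1 and 3, which matches the example in the text.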

Next, the process of filtering the second values in the first array in the direction of decreasing subscript in Embodiment B is described. This process may include, but is not limited to, the following SB01 to SB04.

SB01. The data processing end determines a first reference value based on the first array.

The specific implementation manner of SB01 is the same as that of SA01, and reference may be made to SA01 for the specific implementation, which will not be repeated here.

SB02. The data processing end determines, as the third reference value, the result of subtracting the first value from the second value whose subscript is the second parameter in the first array.

Exemplarily, when the second parameter is N, the data processing end determines that the third reference value is the result of the second value whose subscript is the second parameter in the first array (the last second value) minus the first value.

SB03. The data processing end determines the second screening quantity as a result of dividing the third reference value by the first reference value.

The data processing end calculates the result of dividing the third reference value by the first reference value, and uses the result as the second screening quantity.

SB04. In the direction of decreasing subscript, the data processing end filters out the second screening quantity of second values from the first array, starting from the subscript equal to the second parameter.

Exemplarily, when the second parameter is 7, the second screening quantity is 2, and the first array is [1, 3, 7, 15, 18, 24, 25], the data processing end filters out 24 and 25.
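The backward direction of SB02 to SB04 can be sketched in the same way. The names `second_screening_quantity` and `drop_backward` are hypothetical, subscripts are 1-based as in the text, and the first reference value is again supplied by the caller.

```python
from typing import List, Sequence


def second_screening_quantity(values: Sequence[int], first_value: int,
                              second_param: int, first_reference: int) -> float:
    """SB02 and SB03: (values[second_param] - first value) / first reference."""
    third_reference = values[second_param - 1] - first_value
    return third_reference / first_reference


def drop_backward(values: Sequence[int], second_param: int,
                  quantity: int) -> List[int]:
    """SB04: remove `quantity` entries ending at 1-based subscript `second_param`."""
    end = second_param  # exclusive end when expressed as a 0-based slice bound
    return list(values[:end - quantity]) + list(values[end:])


array = [1, 3, 7, 15, 18, 24, 25]
print(drop_backward(array, second_param=7, quantity=2))  # [1, 3, 7, 15, 18]
```

With a second parameter of 7 and a second screening quantity of 2, the sketch removes 24 and 25, as in the example in the text.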

For the specific implementation process of Embodiment C, reference may be made to the detailed descriptions of Embodiments A and B, which will not be repeated here.

The process by which the data processing end determines the first reference value based on the first array, in SA01 and in SB01, is described below. Specifically, it may include, but is not limited to, the following SA011 and SA012.

SA011. The data processing end calculates the difference between two adjacent second values among the at least three second values in the first array to obtain at least two adjacent differences.

Example 6: if the first array is [1, 3, 7, 15, 18, 24, 25], then the adjacent differences are 2, 4, 8, 3, 6 and 1.

SA012. The data processing end determines that the fourth reference value is the maximum value of the at least two adjacent differences.

Based on example 6, example 7: the fourth reference value is 8.
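As a minimal sketch of SA011 and SA012, the following Python function computes the adjacent differences of a sorted array and returns the largest one. The function name `max_adjacent_difference` is an illustrative assumption and does not appear in the application.

```python
from typing import Sequence


def max_adjacent_difference(values: Sequence[int]) -> int:
    """Return the largest difference between neighbouring entries.

    Assumes `values` is sorted in ascending order and holds at least
    two entries (the application requires at least three).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a difference")
    # SA011: differences between each pair of adjacent second values.
    differences = [b - a for a, b in zip(values, values[1:])]
    # SA012: the maximum of those differences.
    return max(differences)


print(max_adjacent_difference([1, 3, 7, 15, 18, 24, 25]))  # prints 8
```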

It should be noted that, for Embodiment A, when the first screening quantity is less than zero, it is determined that the target second value does not exist in the first array.

For Embodiment B, when the second screening quantity is less than zero, it is determined that the target second value does not exist in the first array.

For Embodiment C, when the first screening quantity is less than zero, or the second screening quantity is less than zero, it is determined that the target second value does not exist in the first array.

The data processing method provided in the embodiment of the present application can also modify the configuration file.

In a possible implementation manner, the mask field in the configuration file can be increased or decreased;

In another possible implementation manner, the log information to be processed may also be modified.

Now take adding the first mask field in the configuration file as an example to describe how to modify the configuration file. As shown in Fig. 4, the process may include but not limited to the following S401 and S402.

S401. The data processing end adds a first dictionary item to the configuration file.

S402. The data processing end configures the first mask field and the mask type to which the first mask field belongs in the first dictionary item, so as to obtain a new configuration file.

Exemplarily, <dict-item value="telphone_no" name="2"/> means: a dictionary item is created in which the mask field telphone_no belongs to mask type 2, so that the field information of the sensitive field may include the customer's mobile phone number.
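S401 and S402 can be sketched with Python's standard `xml.etree.ElementTree` module. The helper name `add_mask_field` is hypothetical, and the sketch assumes that the dictionary items sit directly under a `<dict-group>` root element; the application does not show the full file layout.

```python
import xml.etree.ElementTree as ET


def add_mask_field(config_xml: str, field: str, mask_type: str) -> str:
    """Return a new configuration with one more dictionary item."""
    root = ET.fromstring(config_xml)
    # S401: add a new dictionary item to the configuration.
    item = ET.SubElement(root, "dict-item")
    # S402: configure the mask field and the mask type it belongs to.
    item.set("value", field)
    item.set("name", mask_type)
    return ET.tostring(root, encoding="unicode")


# Assumed minimal configuration with a <dict-group> root.
config = '<dict-group><dict-item value="card_no" name="0"/></dict-group>'
new_config = add_mask_field(config, "telphone_no", "2")
print(new_config)
```

Because the function returns a new string rather than editing a file in place, the caller decides when the new configuration replaces the old one.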

Correspondingly, the step in S201 in which the data processing end determines the first mask field and the first mask type in the configuration file may be implemented as: determining the first mask field and the first mask type in the new configuration file.

In the following, the data processing method provided by the embodiment of the present application will be described by taking the branch transaction process as an example.

For ease of understanding, some technical terms in this embodiment are explained first.

Log: Refers to the transaction records of the software system, which are mainly used to assist in problem solving and subsequent auditing.

In related technologies, to mask log information, it is usually necessary to know which transaction needs to be desensitized (masked), and then to perform a full-text search of the log of that transaction to see whether it contains sensitive fields (mask fields). Once the sensitive fields are found, the mask class corresponding to each mask field is determined, and mask processing is performed through that mask class, so as to realize the mask processing of a certain type of transaction log.

However, if it is necessary to add a new mask class or a transaction that needs to be masked, it is often necessary to carry out targeted secondary development.

The related technology has the following shortcomings:

(1) Poor development timeliness;

On the one hand, in response to the rapid changes of the Internet, whenever mask types, transactions that need to be masked, mask fields and the like are added, targeted secondary development is required, resulting in a long development time. On the other hand, during the development process, the information that needs to be masked is not masked in time, and there is a risk of information leakage, which affects the security performance of the Internet.

(2) The efficiency of locating the mask field and mask processing is low.

When the amount of log information is large, the process of locating the mask field and mask processing in related technologies takes a long time, which may affect normal transactions.

Embodiments of the application have the following characteristics:

The first point is that according to the characteristics of financial data, the optimized binary search algorithm is used to discard invalid data during the search process, so that the field to be masked can be quickly found in a short period of time, and the masking process can be performed through an arbitrary replacement algorithm.

The second point is that, by configuring the mask information file (equivalent to the configuration file), automatic processing is realized dynamically when mask types, transactions that need to be masked, or mask fields are added or removed. This eliminates the need for secondary development of the program, reduces the development workload and ensures rapid business response. At the same time, the mask information configuration file can be exported to other systems in the form of a jar package, so that the configuration can be shared.

Next, the data processing method provided by the embodiment of the present application will be described in detail.

The processing flow may refer to FIG. 5 , and may include but not limited to the following S501 to S507.

S501. Read the log to obtain sensitive fields of the configuration file.

Wherein, the sensitive field is equivalent to a mask field.

S502. Determine whether the log contains sensitive information that needs to be masked.

If the log contains sensitive information that needs to be masked, perform S503 below; if the log does not contain sensitive information that needs to be masked, perform S507 below.

S503. Copy the log information.

S504. Obtain the mask type corresponding to the sensitive field in the configuration file.

S505. Perform mask processing based on the mask type by using an optimized binary search algorithm.

S506. Output the masked log.

S507, end.
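The flow S501 to S507 can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: a log record is modelled as a dictionary of field names to values, `process_log` and the `maskers` table are hypothetical names, and fields are located with a plain dictionary lookup instead of the optimized binary search of S505.

```python
import copy
from typing import Callable, Dict, Optional


def process_log(record: Dict[str, str],
                mask_types: Dict[str, str],
                maskers: Dict[str, Callable[[str], str]]) -> Optional[Dict[str, str]]:
    """Return a masked copy of `record`, or None when nothing needs masking."""
    # S501: the sensitive fields come from the configuration (keys of mask_types).
    present = [field for field in mask_types if field in record]
    # S502: does the log contain anything that needs masking?
    if not present:
        return None  # S507: end
    # S503: work on a copy so the original log is left untouched.
    masked = copy.deepcopy(record)
    for field in present:
        mask_type = mask_types[field]                      # S504
        masked[field] = maskers[mask_type](masked[field])  # S505
    return masked                                          # S506


# Illustrative data only; the masking rule here simply hides every character.
mask_types = {"card_no": "0", "cust_name": "1"}
maskers = {"0": lambda v: "*" * len(v), "1": lambda v: "*" * len(v)}
record = {"card_no": "6222021234567890", "amount": "100.00"}
print(process_log(record, mask_types, maskers))
```

Copying the record in S503 keeps the masking work off the original log information, which is the reason the text gives for the copy step.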

Exemplarily, since the payment counter system involves fund payment, payment account number and name, and customer privacy information, it is necessary to perform desensitization (masking) on the customer account number and customer name involved in the transfer transaction. Wherein, the customer account number and the customer name belong to different mask types, so the desensitization methods (masking processing methods) are also different, and individualized processing needs to be performed on the customer account number and the customer name respectively.

First, read the system public log (including the external transfer part), and judge whether it contains the field information of the mask configuration file. If it contains the field information of the mask configuration file, continue the following mask processing; if it does not contain the field information of the mask configuration file, it will end.

Exemplarily, configuration files may include:

<dict-item value="bind_card_no" name="0"/>

<dict-item value="card_no" name="0"/>

<dict-item value="cust_name" name="1"/>

<dict-item value="rcv_name" name="1"/>

</dict-group>

</dict-group> means: the configuration file is a dictionary type.

Secondly, to prevent mask processing from affecting online transactions, the log information to be processed is first copied, all sensitive fields are taken as an array (such as [bind_card_no, card_no, cust_name, rcv_name]), and each field in the log information is looped over to determine whether the log information contains sensitive fields from the above array. The optimized binary search algorithm can quickly locate the sensitive information.

Each piece of information in the log information array is sorted in ascending order according to the ASCII code of its first letter (equivalent to the second value). Since each piece of log information corresponds to a unique initial letter, the target sensitive information can be found simply by using the algorithm to find, in the ordered array, the value corresponding to the target sensitive field.
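Reading the sensitive fields from the configuration and checking them against the fields of a log can be sketched as follows. The sketch makes several assumptions: the opening `<dict-group>` tag is supplied to make the sample well-formed, the names `load_mask_types` and `sensitive_fields_in` and the log field `txn_id` are hypothetical, and a dictionary lookup stands in for the optimized binary search.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, well-formed version of the configuration; the opening
# <dict-group> tag is an assumption, since only the closing tag is shown.
SAMPLE_CONFIG = """<dict-group>
  <dict-item value="bind_card_no" name="0"/>
  <dict-item value="card_no" name="0"/>
  <dict-item value="cust_name" name="1"/>
  <dict-item value="rcv_name" name="1"/>
</dict-group>"""


def load_mask_types(config_xml: str) -> Dict[str, str]:
    """Map each sensitive field (value attribute) to its mask type (name attribute)."""
    root = ET.fromstring(config_xml)
    return {item.get("value"): item.get("name") for item in root.iter("dict-item")}


def sensitive_fields_in(log_fields: List[str],
                        mask_types: Dict[str, str]) -> List[str]:
    """Return the log fields that are listed as sensitive, in log order."""
    return [field for field in log_fields if field in mask_types]


mask_types = load_mask_types(SAMPLE_CONFIG)
print(list(mask_types))  # ['bind_card_no', 'card_no', 'cust_name', 'rcv_name']
print(sensitive_fields_in(["txn_id", "card_no", "rcv_name"], mask_types))
```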

Here is an explanation of the principle of the optimized binary search algorithm: the optimized binary search algorithm actually performs a screening work before each binary search to filter out unnecessary elements, which can greatly improve the search speed.

The screening process may mainly include the following. For example, suppose 18 (equivalent to the first value, also called the target value) needs to be found in the array 1, 3, 7, 15, 18, 24 (equivalent to the first array). The optimized binary search algorithm then proceeds as follows. The differences between adjacent numbers in the array are calculated as 2, 4, 8, 3 and 6, so the maximum adjacent difference E (equivalent to the first reference value) is 8. The first difference between the target value 18 and the minimum number 1 (equivalent to the second value whose subscript is the first parameter in the first array) is 17 (equivalent to the second reference value). Dividing the first difference 17 by the maximum adjacent difference 8 gives a first quotient of 2.125 (equivalent to the first screening quantity, which can also be called the number of forward-screened elements), so the first 3 elements counted from the front of the array, namely 1, 3 and 7, are directly filtered out. The second difference between the maximum number 24 (equivalent to the second value whose subscript is the second parameter in the first array) and the target value 18 is 6 (equivalent to the third reference value). Dividing the second difference 6 by the maximum adjacent difference 8 gives a second quotient of 0.75 (equivalent to the second screening quantity, which can also be called the number of backward-screened elements), so 1 element counted from the back of the array, namely 24, is directly filtered out. The data remaining after filtering are therefore 15 and 18.
That is, the target value only needs to be found among 15 and 18 through the ordinary binary search algorithm. In this way, a large number of unnecessary search elements can be eliminated by filtering, thereby greatly reducing the number of search comparisons and improving the search speed.
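The arithmetic of this example can be reproduced with a short Python sketch. The function name `screen_once` is hypothetical. Rounding each quotient up to a whole number of elements is an assumption chosen to match the counts in the example (2.125 gives 3 elements, 0.75 gives 1 element); the sketch also assumes that the target lies within the range of the array and that the maximum adjacent difference is greater than zero.

```python
import math
from typing import List, Sequence, Tuple


def screen_once(array: Sequence[int], target: int) -> Tuple[float, float, List[int]]:
    """Return both quotients and the elements left after one screening pass."""
    # Maximum difference between adjacent elements of the sorted array.
    max_diff = max(b - a for a, b in zip(array, array[1:]))
    forward = (target - array[0]) / max_diff    # forward quotient
    backward = (array[-1] - target) / max_diff  # backward quotient
    skip_front = math.ceil(forward)             # elements dropped at the front
    skip_back = math.ceil(backward)             # elements dropped at the back
    remaining = list(array[skip_front:len(array) - skip_back])
    return forward, backward, remaining


print(screen_once([1, 3, 7, 15, 18, 24], 18))  # (2.125, 0.75, [15, 18])
```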

Exemplarily, the program corresponding to the optimized binary search algorithm may include:

Low (equivalent to the first parameter): 1 //Indicates that the variable Low is assigned 1//

High (equivalent to the second parameter): n //Indicates that the variable High is assigned n//

Index: 0 //Indicates that the variable Index is assigned 0//

Low_span (equivalent to the first screening quantity): 0 //Indicates that the variable Low_span is assigned 0//

High_span (equivalent to the second screening quantity): 0 //Indicates that the variable High_span is assigned 0//

While Low<=High and Low_span>=0 and High_span>=0 //Indicates that while Low is less than or equal to High, Low_span is greater than or equal to 0, and High_span is greater than or equal to 0, the following do statement is executed//

do Low_span=(X-A[Low])/M //Indicates that Low_span is assigned the difference between X and the value whose serial number is Low in array A, divided by M//

Low=Low+[Low_span] //Indicates that, in one loop iteration, Low is assigned the sum of its previous value and [Low_span]//

High_span=(A[High]-X)/M //Indicates that High_span is assigned the difference between the value whose serial number is High in A and X, divided by M//

High=High-[High_span] //Indicates that, in one loop iteration, High is assigned its previous value minus [High_span]//

Mid (equivalent to the third parameter): (Low+High)/2 //Indicates that Mid is assigned the average of Low and High//

If X=A[Mid] //Indicates that if X equals the value whose serial number is Mid in A, the following Then statement is executed; otherwise the following Else if statement is evaluated//

Then Index=Mid //Indicates that the Then statement assigns Mid to Index//

Break; //Indicates jumping out of the loop//

Else if X<A[Mid] //Indicates that if X is less than the value whose serial number is Mid in A, the following Then statement is executed; otherwise the following Else statement is executed//

Then High=Mid-1 //Indicates that the Then statement assigns Mid-1 to High//

Else Low=Mid+1 //Indicates that the Else statement assigns Mid+1 to Low//

Return Index //Indicates that the value of Index is returned//

Among them, Low indicates the starting serial number of the array; High indicates the length of the array; Index indicates the return serial number; Low_span indicates the number of elements filtered forward; High_span indicates the number of elements filtered backward; A is the array; X is the target value; M is the maximum value of the difference between two adjacent numbers in the array.

Since the difference between any two adjacent elements in the array A is not greater than M, if X≥A[Low], the elements of A from A[Low] up to, but not including, A[Low+t] (here t=(X-A[Low])/M) must be smaller than X; these elements can therefore be skipped directly in the next search process. Similarly, if X≤A[High], the elements of A after A[High-t] up to and including A[High] (here t=(A[High]-X)/M) must be greater than X, and these elements can also be skipped directly.

Compared with the binary search algorithm, two indicator variables are added: Low_span and High_span. Low_span indicates the number of elements filtered forward from A[Low], and High_span indicates the number of elements filtered backward from A[High]. If either Low_span or High_span is less than 0, the element X to be searched for is not between A[Low] and A[High]; that is to say, there is no such element in A, and the loop can be ended directly.

In this way, through the above-mentioned optimized binary search algorithm, the information to be desensitized can be found and processed with maximum efficiency.
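A runnable Python sketch of the optimized binary search is given below. It follows the structure of the pseudocode, with some choices that are assumptions of this sketch rather than statements of the application: the spans are rounded up (as in the worked example, where 2.125 gives 3 elements and 0.75 gives 1), integer arithmetic replaces the division, a negative span returns 0 immediately, and the case M = 0 (all elements equal) skips the screening step. The function name `optimized_binary_search` is hypothetical. As in the pseudocode, the return value is a 1-based serial number, with 0 meaning that X was not found.

```python
from typing import Sequence


def optimized_binary_search(a: Sequence[int], x: int) -> int:
    """Return the 1-based serial number of x in sorted `a`, or 0 if absent."""
    n = len(a)
    if n == 0:
        return 0
    # M: maximum difference between adjacent elements (0 if fewer than two).
    m = max((b - c for c, b in zip(a, a[1:])), default=0)
    low, high = 1, n
    while low <= high:
        low_diff = x - a[low - 1]    # X - A[Low]
        high_diff = a[high - 1] - x  # A[High] - X
        # A negative span means x lies outside A[Low..High].
        if low_diff < 0 or high_diff < 0:
            return 0
        if m > 0:
            # Screening step: skip elements that must be smaller or
            # greater than x. -(-p // q) is ceiling division for q > 0.
            low += -(-low_diff // m)
            high -= -(-high_diff // m)
            if low > high:
                return 0
        mid = (low + high) // 2
        if a[mid - 1] == x:
            return mid
        if x < a[mid - 1]:
            high = mid - 1
        else:
            low = mid + 1
    return 0


print(optimized_binary_search([1, 3, 7, 15, 18, 24], 18))  # prints 5
print(optimized_binary_search([1, 3, 7, 15, 18, 24], 16))  # prints 0
```

For the array 1, 3, 7, 15, 18, 24 and the target 18, the first pass narrows the range to serial numbers 4 and 5 (values 15 and 18), and the second pass finds 18 at serial number 5.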

Sensitive fields can be located through the above optimized binary search algorithm. The value of each sensitive field needs to be desensitized (masked): the type corresponding to the sensitive field is read from the configuration file (for example, bind_card_no corresponds to 0-customer account type), and the mask processing corresponding to that type is performed on the log plaintext information corresponding to bind_card_no.

Considering that the amount of information is large, that fast processing is required, and that the data does not need to be restored after processing, the mask processing here can use any replacement algorithm. For example, special characters (such as *) can be used to replace part of the true value. In line with the characteristics of financial account data, only the first six digits and the last four digits of the card number are displayed, and the other digits are covered with "*". The masked log information is spliced together and then output and printed uniformly after the loop processing is completed. As shown in FIG. 6, in the masked log file, the sensitive fields involved in the configuration file have been masked. In terms of performance, the search and desensitization algorithms allow the log to be printed quickly without affecting transactions.
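The card number rule described above (show the first six and the last four digits, cover the rest with "*") can be sketched as follows. The function name `mask_account_number` and its parameters are hypothetical, the sample number is made up, and fully masking values that are too short to have a hidden middle part is an assumption of this sketch.

```python
def mask_account_number(number: str, keep_head: int = 6,
                        keep_tail: int = 4, fill: str = "*") -> str:
    """Keep the leading and trailing digits and cover the rest with `fill`."""
    hidden = len(number) - keep_head - keep_tail
    if hidden <= 0:
        # Assumption: values too short to have a hidden middle are fully masked.
        return fill * len(number)
    # Slice by explicit index so that keep_tail == 0 also works.
    tail_start = len(number) - keep_tail
    return number[:keep_head] + fill * hidden + number[tail_start:]


print(mask_account_number("6222021234567890"))  # 622202******7890
```

The replacement is one-way: the covered digits cannot be recovered from the output, which fits the statement that the data does not need to be restored after processing.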

Finally, the above-mentioned configuration file can be expanded horizontally and vertically, that is, by adding sensitive fields and mask types. For example, mask types (such as 2-ID card, 3-mobile phone number) can be added. For the subsequent vertical expansion of sensitive fields (sensitive fields change relatively frequently), there is no need to change the code through secondary development; only the configuration file needs to be changed. This greatly improves development efficiency and allows a quick response to unexpected customer information security incidents (such as the CITIC customer information leakage incident), achieving a short development cycle.

In order to implement the above data processing method, a data processing device according to an embodiment of the present application will be described below in conjunction with the schematic structural diagram of the data processing device shown in FIG. 7 .

As shown in FIG. 7, the data processing device 70 includes: a first determining unit 701, an obtaining unit 702, a searching unit 703, a second determining unit 704 and an executing unit 705. Wherein:

The first determining unit 701 is configured to determine a first mask field and a first mask type in the configuration file; the first mask type is the mask type to which the first mask field belongs;

The obtaining unit 702 is configured to obtain a first array and a first numerical value for the first mask field; the first array includes at least three second numerical values sorted by numerical value; one of the second numerical values is used to point to one piece of log information; the first numerical value is used to point to the first mask field;

A search unit 703, configured to search for a target second value that is the same as the first value in the first array;

The second determining unit 704 is configured to determine the log information pointed to by the target second value as the data to be masked;

The executing unit 705 is configured to execute a masking operation corresponding to the first masking type on the data to be masked.

In some embodiments, the search unit 703 is further configured to:

Determining a first parameter and a second parameter; the first parameter is one; the second parameter is N; the N is used to characterize the length of the first array;

judging, in the first array, the magnitude relationship between the second value whose subscript is a third parameter and the first value; the third parameter is the average value of the first parameter and the second parameter;

if the second value whose subscript is the third parameter is equal to the first value, determining that the target second value is the second value whose subscript is the third parameter;

if the second value whose subscript is the third parameter is smaller than the first value, modifying the first parameter to be the third parameter plus one, and performing again the judging, in the first array, of the magnitude relationship between the second value whose subscript is the third parameter and the first value;

if the second value whose subscript is the third parameter is greater than the first value, modifying the second parameter to be the third parameter minus one, and performing again the judging, in the first array, of the magnitude relationship between the second value whose subscript is the third parameter and the first value.

In some embodiments, the search unit 703 is further configured to: before judging the magnitude relationship between the second value whose subscript is the third parameter and the first value in the first array:

determining a first reference value based on the first array;

determining, as the second reference value, the result of subtracting from the first numerical value the second numerical value whose subscript is the first parameter in the first array;

determining the first screening quantity as the result of dividing the second reference value by the first reference value;

in the direction of increasing subscript, filtering out the first screening quantity of second numerical values from the first array, starting from the subscript equal to the first parameter.

determining a first reference value based on the first array;

determining, as the third reference value, the result of subtracting the first numerical value from the second numerical value whose subscript is the second parameter in the first array;

determining a second screening quantity as the result of dividing the third reference value by the first reference value;

in the direction of decreasing subscript, filtering out the second screening quantity of second numerical values from the first array, starting from the subscript equal to the second parameter.

In some embodiments, the search unit 703 is further configured to:

calculating the difference between two adjacent second values among the at least three second values in the first array to obtain at least two adjacent differences;

Determining the fourth reference value as a maximum value among the at least two adjacent differences.

In some embodiments, the search unit 703 is further configured to:

In the case that the first screening quantity is less than zero, determining that the target second value does not exist in the first array;

Or, in the case that the second screening quantity is less than zero, it is determined that the target second value does not exist in the first array.

In some embodiments, the data processing device 70 may further include a configuration unit configured to, in the case that the configuration file does not include the first mask field, perform the following before the determining of the first mask field and the first mask type in the configuration file is executed:

Add the first dictionary entry in the configuration file;

Configuring the first mask field and the mask type to which the first mask field belongs in the first dictionary item to obtain a new configuration file;

Correspondingly, the first determining unit 701 is further configured to:

In the new configuration file, a first mask field and a first mask type are determined.

It should be noted that each unit included in the data processing device provided in the embodiment of the present application can be realized by a processor in an electronic device; of course, it can also be realized by a specific logic circuit. In the process of implementation, the processor can be a central processing unit (CPU, Central Processing Unit), a microprocessor (MPU, Micro Processor Unit), a digital signal processor (DSP, Digital Signal Processor), a field programmable gate array (FPGA, Field-Programmable Gate Array), or the like.

The description of the above device embodiment is similar to the description of the above method embodiment, and has similar beneficial effects as the method embodiment. For technical details not disclosed in the device embodiments of the present application, please refer to the description of the method embodiments of the present application for understanding.

It should be noted that, in the embodiment of the present application, if the above-mentioned data processing method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solutions of the embodiments of the present application, or the part that contributes to the related technologies, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program codes, such as a U disk, a mobile hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.

In order to implement the above data processing method, an embodiment of the present application provides an electronic device, including a memory and a processor. The memory stores a computer program that can run on the processor, and the processor implements the steps in the data processing method provided in the above embodiments when executing the program.

The structural diagram of the electronic device will be described below with reference to the electronic device 80 shown in FIG. 8 .

In an example, the electronic device 80 may be the above-mentioned electronic device. As shown in FIG. 8 , the electronic device 80 includes: a processor 801 , at least one communication bus 802 , a user interface 803 , at least one external communication interface 804 and a memory 805 . Wherein, the communication bus 802 is configured to realize connection and communication between these components. Wherein, the user interface 803 may include a display screen, and the external communication interface 804 may include a standard wired interface and a wireless interface.

The memory 805 is configured to store instructions and applications executable by the processor 801, and can also cache data to be processed or already processed by the processor 801 and by the various modules in the electronic device (for example, image data, audio data, voice communication data and video communication data). The memory can be realized by flash memory (FLASH) or random access memory (Random Access Memory, RAM).

In a fourth aspect, the embodiments of the present application provide a storage medium, that is, a computer-readable storage medium, on which a computer program is stored. When the computer program is executed by a processor, the steps in the data processing method provided in the above-mentioned embodiments are implemented.

It should be pointed out here that: the descriptions of the above storage medium and device embodiments are similar to the descriptions of the above method embodiments, and have similar beneficial effects to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present application, please refer to the description of the method embodiments of the present application for understanding.

It should be understood that reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Thus, appearances of "in one embodiment" or "in some embodiments" throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application. The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.

It should be noted that, in this document, the terms "comprising", "including" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a set of elements includes not only those elements, but also other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application can all be integrated into one processing unit, or each unit can serve as a single unit on its own, or two or more units can be integrated into one unit. The above-mentioned integrated unit can be realized in the form of hardware, or in the form of hardware plus software functional units.

Those of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium, and when the program is executed, it performs the steps of the foregoing method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk.

Alternatively, if the above-mentioned integrated units of the present application are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solutions of the embodiments of the present application, or the part that contributes to the related technologies, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The above are only embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person familiar with the technical field could readily conceive within the technical scope disclosed in the present application should be covered by the scope of protection of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Claims

A data processing method, the method comprising:

determining a first mask field and a first mask type in a configuration file, wherein the first mask type is the mask type to which the first mask field belongs;

obtaining a first array and a first numerical value for the first mask field, wherein the first array includes at least three second numerical values sorted by magnitude, one second numerical value is used to point to one piece of log information, and the first numerical value is used to point to the first mask field;

searching the first array for a target second numerical value that is the same as the first numerical value;

determining the log information pointed to by the target second numerical value as the data to be masked; and

performing, on the data to be masked, a masking operation corresponding to the first mask type.
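As an illustration only, the steps of this claim can be sketched in Python. Everything concrete here is an assumption rather than something the claim specifies: the hypothetical `key_of` function uses CRC32 to turn a field name into a numerical value, the mask types `full` and `keep_last4` are invented for the example, and the standard-library `bisect` module stands in for the search step.

```python
import bisect
import zlib


def key_of(name: str) -> int:
    """Hypothetical mapping from a field name to a numerical value."""
    return zlib.crc32(name.encode("utf-8"))


# Hypothetical mask types; the claim does not define concrete operations.
MASKERS = {
    "full": lambda s: "*" * len(s),
    "keep_last4": lambda s: "*" * max(len(s) - 4, 0) + s[-4:],
}


def mask_log(config: dict, log: dict) -> dict:
    """Mask the log items whose numerical value matches a configured field."""
    # Each numerical value points to one piece of log information.
    index = {key_of(name): name for name in log}
    # First array: the numerical values sorted by magnitude.
    first_array = sorted(index)
    masked = dict(log)
    for field, mask_type in config.items():
        first_value = key_of(field)  # points to the mask field
        pos = bisect.bisect_left(first_array, first_value)
        if pos < len(first_array) and first_array[pos] == first_value:
            name = index[first_array[pos]]  # data to be masked
            masked[name] = MASKERS[mask_type](log[name])
    return masked
```

Calling `mask_log({"password": "full"}, {"user": "alice", "password": "hunter2", "city": "SZ"})` would leave `user` and `city` untouched and replace every character of `password` with an asterisk.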
The method according to claim 1, wherein searching the first array for a target second numerical value that is the same as the first numerical value comprises:

determining a first parameter and a second parameter, wherein the first parameter is one, the second parameter is N, and N characterizes the length of the first array;

judging the magnitude relationship between the first numerical value and the second numerical value whose subscript in the first array is a third parameter, wherein the third parameter is the average of the first parameter and the second parameter;

if the second numerical value whose subscript is the third parameter is equal to the first numerical value, determining that second numerical value as the target second numerical value;

if the second numerical value whose subscript is the third parameter is smaller than the first numerical value, modifying the first parameter to be the third parameter plus one, and performing again the judging of the magnitude relationship between the first numerical value and the second numerical value whose subscript in the first array is the third parameter;

if the second numerical value whose subscript is the third parameter is greater than the first numerical value, modifying the second parameter to be the third parameter minus one, and performing again the judging of the magnitude relationship between the first numerical value and the second numerical value whose subscript in the first array is the third parameter.
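The search described in this claim follows the pattern of a binary search over 1-based subscripts. A minimal Python sketch is given below; the function name `find_target`, the use of integer division for the average, and the stopping condition (the first parameter exceeding the second) are assumptions added for illustration, since the claim does not state them.

```python
def find_target(first_array, first_value):
    """Return the 1-based subscript of first_value, or None if it is absent."""
    low, high = 1, len(first_array)  # first parameter, second parameter (N)
    while low <= high:  # assumed stopping condition
        mid = (low + high) // 2  # third parameter: integer average (assumed rounding)
        candidate = first_array[mid - 1]  # convert 1-based subscript to Python index
        if candidate == first_value:
            return mid  # target second numerical value found
        if candidate < first_value:
            low = mid + 1  # first parameter becomes third parameter plus one
        else:
            high = mid - 1  # second parameter becomes third parameter minus one
    return None
```

For the sorted array `[3, 8, 15, 21, 40]`, searching for 21 compares against subscript 3 (value 15) and then subscript 4 (value 21), so two comparisons suffice.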
The method according to claim 2, wherein before the judging of the magnitude relationship between the first numerical value and the second numerical value whose subscript in the first array is the third parameter, the method further comprises:

determining a first reference value based on the first array;

determining, as a second reference value, the result of subtracting from the first numerical value the second numerical value whose subscript in the first array is the first parameter;

determining, as a first screening quantity, the result of dividing the second reference value by the first reference value;

filtering out, in the direction of increasing subscript and starting from the second numerical value whose subscript in the first array is the first parameter, a number of second numerical values equal to the first screening quantity.
The method according to claim 2, wherein before the judging of the magnitude relationship between the first numerical value and the second numerical value whose subscript in the first array is the third parameter, the method further comprises:

determining a first reference value based on the first array;

determining, as a third reference value, the result of subtracting the first numerical value from the second numerical value whose subscript in the first array is the second parameter;

determining, as a second screening quantity, the result of dividing the third reference value by the first reference value;

filtering out, in the direction of decreasing subscript and starting from the second numerical value whose subscript in the first array is the second parameter, a number of second numerical values equal to the second screening quantity.
The method according to claim 3 or 4, wherein the determining of a first reference value based on the first array comprises:

calculating the difference between each two adjacent second numerical values among the at least three second numerical values in the first array to obtain at least two adjacent differences;

determining the fourth reference value as the maximum value among the at least two adjacent differences.
The method according to claim 3 or 4, the method further comprising:

in the case that the first screening quantity is less than zero, determining that the target second numerical value does not exist in the first array;

or, in the case that the second screening quantity is less than zero, determining that the target second numerical value does not exist in the first array.
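The screening steps of the preceding claims can be sketched together in Python. This is one reading of the claims, not a definitive implementation: it assumes that the maximum adjacent difference serves as the first reference value (the claim text calls that maximum the "fourth reference value"), that the division is rounded down, and that an array whose values are all equal is left unscreened. The name `screen_bounds` is hypothetical. The reasoning is that adjacent values differ by at most the maximum gap, so a value lying `d` above the lowest element cannot appear within the first `d // max_gap` positions, and symmetrically at the upper end.

```python
def screen_bounds(first_array, first_value):
    """Narrow the 1-based search range before the binary search.

    Returns (low, high), or None when the target cannot be in the array.
    """
    low, high = 1, len(first_array)  # first and second parameters
    # Assumed first reference value: the largest gap between adjacent values.
    max_gap = max(b - a for a, b in zip(first_array, first_array[1:]))
    second_ref = first_value - first_array[low - 1]
    third_ref = first_array[high - 1] - first_value
    # A negative reference value yields a negative screening quantity,
    # which means the target lies outside the range of the array.
    if second_ref < 0 or third_ref < 0:
        return None
    if max_gap == 0:
        return low, high  # all values equal; this guard is an assumption
    first_screen = second_ref // max_gap  # values skipped from the low end
    second_screen = third_ref // max_gap  # values skipped from the high end
    return low + first_screen, high - second_screen
```

For the evenly spaced array `[0, 10, 20, 30, 40]` and the value 30, the range shrinks from subscripts 1 to 5 down to subscript 4 alone, so the subsequent search needs a single comparison. With unevenly spaced values the reduction is smaller, because a single large gap limits how many elements can be skipped safely.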
The method according to claim 1, wherein the configuration file does not include the first mask field, and before the determining of the first mask field and the first mask type in the configuration file, the method further comprises:

adding a first dictionary entry to the configuration file;

configuring, in the first dictionary entry, the first mask field and the mask type to which the first mask field belongs, to obtain a new configuration file;

correspondingly, the determining of the first mask field and the first mask type in the configuration file comprises:

determining the first mask field and the first mask type in the new configuration file.
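A minimal sketch of adding a dictionary entry for a new mask field is shown below. The claim does not fix a file format, so the JSON layout, the key names `mask_fields`, `field`, and `type`, and the function name `add_mask_entry` are all assumptions made for illustration.

```python
import json


def add_mask_entry(config_text: str, field: str, mask_type: str) -> str:
    """Return new configuration text that contains an entry for the field."""
    config = json.loads(config_text)
    entries = config.setdefault("mask_fields", [])  # hypothetical layout
    # Only add the dictionary entry when the field is not configured yet.
    if not any(entry["field"] == field for entry in entries):
        entries.append({"field": field, "type": mask_type})
    return json.dumps(config, indent=2)
```

The point of this design is that supporting a new mask field only requires a configuration change: the masking logic reads the new configuration file and needs no modification.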
A data processing device, the device comprising:

a first determining unit configured to determine a first mask field and a first mask type in a configuration file, wherein the first mask type is the mask type to which the first mask field belongs;

an obtaining unit configured to obtain a first array and a first numerical value for the first mask field, wherein the first array includes at least three second numerical values sorted by magnitude, one second numerical value is used to point to one piece of log information, and the first numerical value is used to point to the first mask field;

a search unit configured to search the first array for a target second numerical value that is the same as the first numerical value;

a second determining unit configured to determine the log information pointed to by the target second numerical value as the data to be masked;

an executing unit configured to perform, on the data to be masked, a masking operation corresponding to the first mask type.
An electronic device, comprising a memory and a processor, wherein the memory stores a computer program that can run on the processor, and the processor implements the data processing method according to any one of claims 1 to 7 when executing the program.
A storage medium on which a computer program is stored, and when the computer program is executed by a processor, the data processing method according to any one of claims 1 to 7 is realized.