CN116719907A - Data processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116719907A
Authority
CN
China
Prior art keywords
target
request
data
preset
desensitization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310764425.9A
Other languages
Chinese (zh)
Inventor
陈君豪
李志亮
任龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Apollo Zhixing Technology Guangzhou Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Apollo Zhixing Technology Guangzhou Co Ltd
Application filed by Apollo Zhilian Beijing Technology Co Ltd, Apollo Zhixing Technology Guangzhou Co Ltd
Priority to CN202310764425.9A
Publication of CN116719907A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a data processing method, a device, equipment and a storage medium, relates to the field of artificial intelligence, and particularly relates to the technical fields of automatic driving, intelligent transportation and the like. The specific implementation scheme is as follows: user entity conversion processing is carried out on original log data of a target service system to obtain entity data and non-entity data, desensitization processing is carried out on corresponding preset sensitive fields in the entity data by utilizing preset field desensitization rules, word segmentation processing is carried out on the non-entity data, category labels corresponding to the obtained segmented words are determined by utilizing a preset classification model, the category labels comprise non-sensitive labels and a plurality of different sensitive labels, desensitization processing is carried out on target segmented words corresponding to the sensitive labels by utilizing preset word segmentation desensitization rules corresponding to the sensitive labels, and preset word segmentation desensitization rules corresponding to the different sensitive labels are different. By adopting the technical scheme, the efficiency, accuracy and comprehensiveness of desensitizing the log data can be ensured.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the technical fields of autonomous driving, intelligent transportation, and the like.
Background
Currently, during the operation of various service systems, a large amount of sensitive data is usually involved in log data and database data of the system, for example, in a vehicle traffic service system, related sensitive data of a vehicle owner and a passenger may be included, and the data needs to be protected to avoid leakage.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a data processing method including:
user entity conversion processing is carried out on the original log data of the target service system, and entity data and non-entity data are obtained;
performing desensitization processing on a corresponding preset sensitive field in the entity data by using a preset field desensitization rule;
performing word segmentation on the non-entity data, and determining a category label corresponding to the obtained word segmentation by using a preset classification model, wherein the category label comprises a non-sensitive label and a plurality of different sensitive labels;
for a target segmented word corresponding to a sensitive label, performing desensitization processing by using the preset word segmentation desensitization rule corresponding to that sensitive label, wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
According to another aspect of the present disclosure, there is provided a data processing apparatus including:
the entity conversion module is used for carrying out user entity conversion processing on the original log data of the target service system to obtain entity data and non-entity data;
the first desensitization module is used for carrying out desensitization treatment on the corresponding preset sensitive fields in the entity data by utilizing preset field desensitization rules;
the word segmentation processing module is used for carrying out word segmentation processing on the non-entity data;
the classification label determining module is used for determining a classification label corresponding to the obtained segmentation by utilizing a preset classification model, wherein the classification label comprises a non-sensitive label and a plurality of different sensitive labels;
the second desensitization module is used for carrying out desensitization processing on target word segmentation corresponding to the sensitive label by adopting a preset word segmentation desensitization rule corresponding to the sensitive label, wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described by the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any embodiment of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a data processing framework provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a data processing apparatus provided according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure, where the embodiment of the present disclosure may be applicable to a case of processing sensitive data in a service system. The method may be performed by a data processing apparatus, which may be implemented in hardware and/or software, and may be configured in an electronic device. Referring to fig. 1, the method specifically includes the following:
S101, performing user entity conversion processing on original log data of a target service system to obtain entity data and non-entity data;
S102, performing desensitization processing on a corresponding preset sensitive field in the entity data by using a preset field desensitization rule;
S103, performing word segmentation on the non-entity data, and determining a category label corresponding to each obtained segmented word by using a preset classification model, wherein the category label comprises a non-sensitive label and a plurality of different sensitive labels;
S104, for a target segmented word corresponding to a sensitive label, performing desensitization processing by using the preset word segmentation desensitization rule corresponding to that sensitive label, wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
The specific type of the target service system is not limited; it may be, for example, a vehicle traffic service system, a media content push service system, a map service system, or a search engine service system. During operation, the target service system generates a large amount of log data that contains much user-related sensitive data, such as users' personal information, and possibly the content of interactions and dialogues between users and the system. Taking a vehicle traffic service system as an example, it may be a safety supervision platform for operating vehicles, and the sensitive data concerning vehicle owners and passengers in its log data may involve personal information, vehicle information, journeys, locations, network addresses, interface parameters, user request parameters, user dialogue content, and the like. The sensitive data contained in log data is thus highly varied and the log volume is large, which makes the data difficult to analyze and fully desensitize.
In the embodiment of the present disclosure, the original log data may be understood as log data that has not been desensitized. The log data of the target service system can be analyzed in advance to determine the user-related data it contains, a user entity structure can be preset, and the fields contained in the user entity structure can be defined. The data of these fields is extracted from the original log data and filled into the user entity structure as instance data, which turns the user object into an entity, that is, performs user entity conversion processing on the original log data. Alternatively, structured data may be extracted from the original log data as entity data, where the structured data includes, for example, database structure data or entity objects encapsulated for a front end.
For example, the user entity structure may include a driver license number, a mobile phone number, an identity card number, a name, a number, and the like, and the entity data obtained after the user entity conversion processing may be expressed as:
drivecode=12345
phone="12345678900"
idcode="123456789098765432"
name="Zhang San"
id=111
For the entity data, the sensitive fields to be desensitized (i.e., the preset sensitive fields) may be specified in advance, for example the driver license number, mobile phone number, identity card number, and name in the above example. For each preset sensitive field, a corresponding field desensitization rule (i.e., a preset field desensitization rule) can be set in advance, so that the preset sensitive fields can be desensitized quickly by applying these rules. Taking the name as an example, the corresponding preset field desensitization rule may be to replace the second character with a preset character, for example "*", so that desensitizing the above example yields name="Zhang*".
For example, user entity conversion processing may be performed on the original log data before serialization, desensitization processing is performed on the entity data, then serialization operation is performed, and object data in a preset format, such as object data in JSON format, is obtained after serialization operation.
As exemplified above, the serialization process may result in:
{
drivecode=1***5
phone="123****8900"
idcode="123456*********432"
name="Zhang*"
id=111
}
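The field-level rules and the serialization step can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names (`keep_ends`, `mask_second_char`, `FIELD_RULES`, `desensitize_entity`) and the number of characters kept by each rule are assumptions inferred from the masked values shown above, and the Chinese name "张三" is used so that the "replace the second character" rule applies to a two-character name.

```python
import json

MASK = "*"  # assumed preset desensitization character

def keep_ends(value, head, tail):
    """Keep `head` leading and `tail` trailing characters and mask the rest."""
    text = str(value)
    hidden = len(text) - head - tail
    if hidden <= 0:
        return text
    return text[:head] + MASK * hidden + text[len(text) - tail:]

def mask_second_char(value):
    """Replace the second character with the mask (the name rule)."""
    text = str(value)
    return text if len(text) < 2 else text[0] + MASK + text[2:]

# Hypothetical mapping: preset sensitive field -> preset field desensitization rule.
FIELD_RULES = {
    "drivecode": lambda v: keep_ends(v, 1, 1),
    "phone": lambda v: keep_ends(v, 3, 4),
    "idcode": lambda v: keep_ends(v, 6, 3),
    "name": mask_second_char,
}

def desensitize_entity(entity):
    """Return a copy of the entity with every preset sensitive field masked."""
    return {
        key: FIELD_RULES[key](value) if key in FIELD_RULES else value
        for key, value in entity.items()
    }

entity = {
    "drivecode": 12345,
    "phone": "12345678900",
    "idcode": "123456789098765432",
    "name": "张三",
    "id": 111,
}
masked = desensitize_entity(entity)                     # desensitize first
serialized = json.dumps(masked, ensure_ascii=False)     # then serialize to JSON
```

Masking before serialization means the sensitive values never reach the serialized log line, which matches the order of operations described above.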
Optionally, a preset desensitization annotation can be added to a preset field to be desensitized in the user entity structure, the annotation can play a role in identifying the field to be desensitized, the preset desensitization annotation is associated with a preset field desensitization rule, for example, the preset desensitization annotation contains the preset field desensitization rule, or a mapping relation exists between the preset desensitization annotation and the preset field desensitization rule, and the preset field desensitization rule corresponding to the current preset desensitization annotation is searched according to the mapping relation.
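One way to picture the annotation mechanism is with Python dataclass field metadata standing in for the preset desensitization annotation. This is a hedged sketch only: the metadata key `"desensitize"`, the `RULES` registry, and the `User` class are hypothetical, and the source does not say which language or annotation facility is actually used.

```python
from dataclasses import dataclass, field, fields, replace

def mask_middle(text, head, tail):
    """Keep `head` leading and `tail` trailing characters and mask the rest."""
    hidden = len(text) - head - tail
    if hidden <= 0:
        return text
    return text[:head] + "*" * hidden + text[len(text) - tail:]

# Hypothetical mapping: annotation value -> preset field desensitization rule.
RULES = {
    "phone": lambda s: mask_middle(s, 3, 4),
    "idcode": lambda s: mask_middle(s, 6, 3),
}

@dataclass
class User:
    id: int
    # The metadata plays the role of the preset desensitization annotation.
    phone: str = field(metadata={"desensitize": "phone"})
    idcode: str = field(metadata={"desensitize": "idcode"})

def desensitize(obj):
    """Return a copy of `obj` with every annotated field masked by its rule."""
    changes = {}
    for f in fields(obj):
        rule_name = f.metadata.get("desensitize")
        if rule_name is not None:
            changes[f.name] = RULES[rule_name](getattr(obj, f.name))
    return replace(obj, **changes)

user = desensitize(User(id=111, phone="12345678900", idcode="123456789098765432"))
```

Here the annotation holds only a rule name and the rule is found through the `RULES` mapping, which corresponds to the "mapping relation" variant described above.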
For example, data in the original log data that cannot be converted into entity data may be recorded as non-entity data, which may include unstructured data such as context data. Non-entity data cannot be desensitized according to the preset field desensitization rules; to ensure data security and achieve more comprehensive desensitization, the embodiment of the disclosure also desensitizes the non-entity data. The various pieces of information in non-entity data typically appear in the log with relatively short character lengths, and each type of information has a corresponding text format that may contain character types such as Chinese characters, English letters, special characters, and digits. In the related art, sensitive fields are matched by regular expressions. However, different types of sensitive fields correspond to different regular expressions, and the regular expressions are applied in priority order: once a regular expression of a given priority matches a piece of text, the desensitization rule corresponding to that regular expression is applied. This easily leads to the wrong desensitization rule being used, so the expected desensitization effect cannot be achieved.
For example, assume that a mobile phone number consists of 11 digits. If a text contains another 11-digit structure, such as longitude and latitude data, the regular expression for mobile phone numbers also matches the longitude and latitude data. That data is then desensitized with the rule for mobile phone numbers rather than the rule for longitude and latitude data, so the wrong desensitization rule is used and the expected desensitization effect cannot be achieved.
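The failure mode can be reproduced with a small priority-ordered rule list. The patterns, the masking functions, and the 11-digit coordinate encoding below are all hypothetical and chosen only to illustrate how the first matching expression decides which rule is applied.

```python
import re

# Hypothetical priority-ordered rules: (name, pattern, masking function).
# Both patterns accept any 11-digit string, so priority alone decides.
RULES = [
    ("phone", re.compile(r"^\d{11}$"), lambda s: s[:3] + "****" + s[7:]),
    ("coordinate", re.compile(r"^\d{3}\d{8}$"), lambda s: "*" * len(s)),
]

def first_match_desensitize(token):
    """Apply the rule of the first (highest-priority) matching expression."""
    for name, pattern, mask in RULES:
        if pattern.match(token):
            return name, mask(token)
    return None, token

# An 11-digit coordinate value (assumed encoding) is caught by the phone rule,
# so its leading and trailing digits stay visible.
rule_used, masked = first_match_desensitize("11640739600")
```

With these assumptions `rule_used` is `"phone"`, so the coordinate is only partially masked even though a dedicated coordinate rule exists further down the list.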
In the embodiment of the disclosure, word segmentation processing is first performed on the non-entity data, the resulting segmented words are then classified by the preset classification model, and each segmented word that belongs to a sensitive type is desensitized using the preset word segmentation desensitization rule of the corresponding type. Different category labels can be set for the segmented word categories: a segmented word with the non-sensitive label requires no desensitization, whereas a segmented word with a sensitive label is recorded as a target segmented word that requires desensitization. Different preset word segmentation desensitization rules are set for different sensitive labels, so that different types of sensitive segmented words are desensitized in a targeted manner.
In the word segmentation process for non-entity data, Chinese characters except for sentence breaking symbols (such as "," | "and". Taking an email address as an example, the digits, the special character (@) and the English letters appear consecutively and can therefore be grouped into a single segmented word.
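A rough sketch of the grouping behaviour for the email example is given below. It assumes that consecutive ASCII letters, digits and a few special characters form one segmented word, that sentence-breaking symbols act as separators, and that a run of Chinese characters is kept together; a real system would further split Chinese runs with a dictionary-based segmenter, which is not shown. The pattern and function name are illustrative, not taken from the source.

```python
import re

# Runs of ASCII letters/digits/selected specials, or runs of CJK characters
# (U+4E00 to U+9FFF). Anything else, such as "，", acts as a separator.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9@._\-]+|[\u4e00-\u9fff]+")

def rough_segment(text):
    """Split text into coarse segmented words by character class."""
    return TOKEN_PATTERN.findall(text)

tokens = rough_segment("邮箱abc123@example.com，电话12345678910")
```

The email address survives as a single segmented word, so it can later be classified and masked as a whole.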
The specific type of the preset classification model is not limited, and may be a machine learning model, for example, a K-nearest neighbor (K-NearestNeighbor, KNN) model, a decision tree model, or a deep learning model.
For the above example, after classification the mobile phone number and the longitude and latitude data are distinguished: data of the mobile phone number type is desensitized with the preset word segmentation desensitization rule for mobile phone numbers, and data of the longitude and latitude type is desensitized with the preset word segmentation desensitization rule for longitude and latitude. The desensitization rules are therefore not misused, and the expected desensitization effect can be achieved.
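The classify-then-dispatch flow can be sketched as a lookup from category label to rule. The label names, the masking functions, and the `stub_classify` function (a stand-in for the preset classification model) are assumptions made for illustration.

```python
def mask_phone(token):
    """Keep the first 3 and last 4 digits of an 11-digit phone number."""
    return token[:3] + "****" + token[7:]

def mask_coordinate(token):
    """Mask a coordinate value completely."""
    return "*" * len(token)

# Hypothetical mapping: sensitive label -> preset word segmentation rule.
RULES_BY_LABEL = {"phone": mask_phone, "coordinate": mask_coordinate}

def desensitize_tokens(tokens, classify):
    """Mask each token with the rule of its label; unlabeled tokens pass through."""
    result = []
    for token in tokens:
        rule = RULES_BY_LABEL.get(classify(token))
        result.append(rule(token) if rule else token)
    return result

def stub_classify(token):
    """Stand-in for the preset classification model (fixed answers only)."""
    labels = {"12345678910": "phone", "11640739600": "coordinate"}
    return labels.get(token, "non_sensitive")

output = desensitize_tokens(["phone", "12345678910", "at", "11640739600"], stub_classify)
```

Because the rule is chosen by label rather than by pattern priority, two tokens with the same surface format can still receive different treatment.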
According to the data processing scheme provided by the embodiment of the disclosure, user entity conversion processing is performed on original log data of a target service system to obtain entity data and non-entity data, desensitization processing is performed on corresponding preset sensitive fields in the entity data by using preset field desensitization rules, word segmentation processing is performed on the non-entity data, a class label corresponding to the obtained word segmentation is determined by using a preset classification model, wherein the class label comprises a non-sensitive label and a plurality of different sensitive labels, the target word segmentation corresponding to the sensitive labels is subjected to desensitization processing by using preset word segmentation desensitization rules corresponding to the sensitive labels, and preset word segmentation desensitization rules corresponding to the different sensitive labels are different. By adopting the technical scheme, log data are converted into entity data and non-entity data, desensitization treatment is carried out respectively, the high efficiency and the comprehensiveness of desensitization are considered, the data security is ensured, the high efficiency desensitization is carried out on the preset sensitive fields by adopting the preset field desensitization rules for the entity data, the word segmentation is carried out before the reclassification for the non-entity data, the targeted desensitization is carried out by adopting the corresponding word segmentation desensitization rules for the word segmentation of the sensitive category, and the desensitization accuracy is ensured.
In an optional implementation manner, the determining, by using a preset classification model, a class label corresponding to the obtained segmentation includes: determining feature information corresponding to the obtained segmentation words, wherein the feature information comprises text length and character features; inputting the characteristic information into a preset classification model, and determining a class label corresponding to the segmentation according to the output of the preset classification model, wherein the preset classification model comprises a K neighbor model obtained through pre-training. The advantage of this arrangement is that the category of each word can be determined efficiently and accurately.
Illustratively, data of the type that may occur in the log record is first collected and stored in a database. Each piece of data is added with two characteristics of text length and character characteristics, category labels are respectively set for the data, and table 1 shows a data set taking a mobile phone number, an identity card, a mailbox and a dialogue as types.
TABLE 1 features and tag settings for datasets
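A possible way to compute the two features for a segmented word is sketched below. The text length is taken directly; the encoding of the "character feature" as a bitmask of the character types present is an assumption, since the source does not specify how that feature is represented numerically.

```python
DIGIT, LETTER, CHINESE, SPECIAL = 1, 2, 4, 8  # assumed bit flags

def char_type_mask(token):
    """Combine flags for every character type that occurs in the token."""
    mask = 0
    for ch in token:
        if ch.isascii() and ch.isdigit():
            mask |= DIGIT
        elif ch.isascii() and ch.isalpha():
            mask |= LETTER
        elif "\u4e00" <= ch <= "\u9fff":
            mask |= CHINESE
        else:
            mask |= SPECIAL
    return mask

def features(token):
    """Return the two-dimensional feature vector (text length, character feature)."""
    return (len(token), char_type_mask(token))

phone_features = features("12345678910")
email_features = features("abc123@example.com")
name_features = features("张三")
```

Under this encoding a phone number and an identity card number differ in length, while an email address differs from both in its character feature.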
In the KNN algorithm, k is a hyperparameter representing the number of nearest-neighbor samples selected. Since the core idea of the KNN algorithm is to perform classification or regression prediction according to the distances between samples in feature space and the labels of the nearest samples, the choice of the k value has a direct effect on the final prediction result.
To ensure accuracy of the results, the k value may be determined using 6-fold cross-validation. The data set is divided into a training group, a validation group and a test group in a ratio of 5:1:2. With the test group held fixed, the training and validation data are re-partitioned in a ratio of 5:1 in different orders, yielding 6 different training/validation sets.
A set of training and validation data is first input:
$$T_p = \{(x_1, y_1), (x_1, y_1), \ldots, (x_2, y_n)\}$$
where $x_i$ is the feature vector of the data set; because the features fall into only two types in the embodiment of the present disclosure, the maximum value of $i$ is 2. $y_i$ is the label class, where, for example, $i$ takes the values $i = 1, 2$.
For the distance metric, the Minkowski distance is used, i.e.
$$d(a, b) = \left( \sum_{i=1}^{m} \left| x_{ia} - x_{ib} \right|^{p} \right)^{1/p}$$
where $x_{ia}$ and $x_{ib}$ respectively denote the coordinates of point a and point b in the $i$-th dimension, $m$ denotes the dimension of the data, and $p$ is the parameter that controls how the distance is measured. For example, if $p$ is set to $p = 2$, the above formula becomes the two-dimensional Euclidean distance, i.e.
$$d(a, b) = \sqrt{(x_{1a} - x_{1b})^2 + (x_{2a} - x_{2b})^2}$$
Then, the set of candidate values $K = \{1, 2, 3, \ldots, 10\}$ is defined for the parameter $k_i$ in the range of 1 to 10, and each $k_i$ is substituted into the cross-validation. For each $k_i$, 6 different accuracies $k_r$ are obtained, and averaging these 6 accuracies gives $A_k$. This procedure is repeated 10 times, once for each candidate value. Finally, the $k$ value whose average accuracy $A_k$ is the highest among the 10 tests is used as the hyperparameter of the trained model. After the $k$ value is determined, the trained model is tested using the test group, and the preset classification model is obtained once the test is passed.
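The selection of k can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the toy samples, the fold assignment by index, and all function names are hypothetical, and the separately held-out test group is omitted for brevity.

```python
from collections import Counter

def distance(a, b, p=2):
    """Minkowski distance of order p (Euclidean when p=2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, held_out, k):
    """Fraction of held-out samples whose predicted label is correct."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def select_k(samples, candidates=range(1, 11), folds=6):
    """Return (best k, mean accuracy per k) from `folds`-fold cross-validation."""
    mean_acc = {}
    for k in candidates:
        scores = []
        for i in range(folds):
            # Fold i is the validation part; the other 5 parts are training.
            validation = [s for j, s in enumerate(samples) if j % folds == i]
            training = [s for j, s in enumerate(samples) if j % folds != i]
            scores.append(accuracy(training, validation, k))
        mean_acc[k] = sum(scores) / folds
    best_k = max(mean_acc, key=mean_acc.get)
    return best_k, mean_acc

# Toy samples: ((text length, character feature), label). Values are made up.
samples = []
for n in range(6):
    samples.append(((11, 1), "phone"))
    samples.append(((18, 1), "idcode"))
    samples.append(((15 + n, 11), "email"))
    samples.append(((2 + n, 4), "dialogue"))

best_k, mean_acc = select_k(samples)
```

Each candidate k yields 6 fold accuracies whose mean plays the role of the average accuracy, and the candidate with the highest mean is kept; ties are resolved in favour of the smaller k in this sketch.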
In an optional implementation manner, the sensitive label comprises a sensitive keyword label, and the preset word segmentation desensitization rule corresponding to the sensitive keyword label comprises a first character number and a second character number; the desensitizing processing for the target word segmentation corresponding to the sensitive label by adopting a preset word segmentation desensitizing rule corresponding to the sensitive label comprises the following steps: aiming at a first target word segmentation corresponding to the sensitive keyword label, acquiring the first character quantity and the second character quantity in a preset word segmentation desensitization rule corresponding to the sensitive keyword label; and replacing characters of the first character number before the first target word, the first target word and characters of the second character number after the first target word in the non-entity data with preset desensitization characters. The advantage of this arrangement is that the surrounding information of sensitive words appearing in the text, such as conversations, can be desensitized, further ensuring data security.
By way of example, a sensitive keyword may be understood as a keyword whose context has a high probability of containing sensitive information. For example, the keyword "account number" may be followed by specific account number information. If the preset classification model fails to output a sensitive label for an account number in a special format, identifying the sensitive keyword makes it possible to desensitize some characters before and after "account number", so that the account number information is not leaked. The first character number and the second character number may be preset, for example to 3, or may be determined dynamically from a first word segmentation number and a second word segmentation number, where the first character number equals the total number of characters in the first word segmentation number of segmented words before the first target segmented word, and the second character number equals the total number of characters in the second word segmentation number of segmented words after it. For example, if the first word segmentation number is 2, the total number of characters of the 2 segmented words before the first target segmented word is the first character number; if the second word segmentation number is 3, the total number of characters of the 3 segmented words after the first target segmented word is the second character number.
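The keyword-context rule with fixed character counts can be sketched as below. The function name, the sample sentence, and the chosen counts are assumptions; the dynamic variant based on neighbouring segmented words is not shown.

```python
def mask_keyword_context(text, keyword, before, after, mask="*"):
    """Mask `before` characters ahead of each keyword occurrence, the keyword
    itself, and `after` characters following it."""
    out = list(text)
    start = text.find(keyword)
    while start != -1:
        low = max(0, start - before)
        high = min(len(text), start + len(keyword) + after)
        for i in range(low, high):
            out[i] = mask
        start = text.find(keyword, start + len(keyword))
    return "".join(out)

# first character number = 3, second character number = 5 (illustrative values)
masked_text = mask_keyword_context("transfer to account 6222 today", "account", 3, 5)
```

Searching in the original text while writing into a separate character list keeps earlier masks from interfering with later keyword matches, and the output length always equals the input length.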
FIG. 2 is a flow chart of another data processing method provided in accordance with an embodiment of the present disclosure, optimized based on the above-described alternative embodiments, as shown in FIG. 2, the method comprising:
S201, user entity conversion processing is carried out on the original log data of the target service system, and entity data and non-entity data are obtained.
S202, traversing father entity class and son entity class in entity data, determining a preset field carrying a preset desensitization annotation as a preset sensitive field, and carrying out desensitization treatment on the preset sensitive field by utilizing a preset field desensitization rule associated with the preset desensitization annotation.
The data structure of the entity data includes a class-in-class structure, which comprises a parent entity class and a sub-entity class, and the original log data includes association relationship information between the user corresponding to the parent entity class and the user corresponding to the sub-entity class. The association relationship information may include, for example, information from which a relationship between users can be established, such as a superior-subordinate relationship. To convert more of the original log data into structured entity data and reduce the amount of non-entity data, the embodiment of the disclosure provides nested object entities, namely the class-in-class structure comprising a parent entity class and a sub-entity class, so that more log data can be converted into entity data and more comprehensive desensitization is achieved.
For example, for entity data with a class structure in the class, the father entity class and the son entity class can contain the same preset sensitive field, and by adding preset desensitization notes to the preset sensitive field, the preset sensitive fields in the father entity class and the son entity class can be quickly found, so that corresponding desensitization treatment is performed, and the desensitization efficiency is improved. The association between the preset desensitization annotation and the preset field desensitization rule may specifically refer to that the preset desensitization annotation contains the preset field desensitization rule, or that a mapping relationship exists between the preset desensitization annotation and the preset field desensitization rule, and the preset field desensitization rule corresponding to the current preset desensitization annotation is searched according to the mapping relationship. For example, the mobile phone number fields in the father entity class and the son entity class carry the same preset desensitization annotation, so that unified desensitization treatment can be performed by adopting the same preset field desensitization rule, and the desensitization efficiency is further improved.
Optionally, when traversing the parent entity class and the child entity class in the entity data, desensitizing can be performed sequentially according to the traversing order, if a preset sensitive field is encountered, then desensitizing the corresponding field data; each entity class (possibly having multi-layer nested relation, that is, one sub-entity class may be the parent entity class of another entity class) can be recursively traversed, and after all preset sensitive fields are found, the desensitization processing is performed uniformly.
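A recursive traversal over nested entity classes might look like the sketch below, which desensitizes fields in traversal order. The `Driver` and `Passenger` classes, the `"desensitize"` metadata key, and the single phone rule are hypothetical; dataclass field metadata again stands in for the preset desensitization annotation.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List

def mask_phone(text):
    """Keep the first 3 and last 4 digits of an 11-digit phone number."""
    return text[:3] + "****" + text[7:]

RULES = {"phone": mask_phone}  # annotation value -> preset field rule

@dataclass
class Passenger:  # sub-entity class
    phone: str = field(metadata={"desensitize": "phone"})

@dataclass
class Driver:  # parent entity class
    phone: str = field(metadata={"desensitize": "phone"})
    passengers: List[Passenger] = field(default_factory=list)

def desensitize_in_place(obj):
    """Walk lists and nested entities, masking every annotated field."""
    if isinstance(obj, list):
        for item in obj:
            desensitize_in_place(item)
    elif is_dataclass(obj) and not isinstance(obj, type):
        for f in fields(obj):
            value = getattr(obj, f.name)
            rule_name = f.metadata.get("desensitize")
            if rule_name is not None:
                setattr(obj, f.name, RULES[rule_name](value))
            else:
                desensitize_in_place(value)  # descend into sub-entities

driver = Driver(phone="12345678900", passengers=[Passenger(phone="12345678910")])
desensitize_in_place(driver)
```

Because both classes carry the same annotation on their phone field, a single rule handles the parent and the sub-entity, however deep the nesting goes.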
S203, word segmentation processing is carried out on the non-entity data.
By way of example, assume that the non-entity data contains a user's dialogue, such as "I am taking a taxi to the road now and will arrive in 5 minutes; if you arrive first, you can contact Zhang San, his phone is 12345678910". After the word segmentation processing, the following word segments can be obtained: "I", "now", "take a taxi", "to", "the road", "still", "5", "minutes", "arrive", "if", "you", "arrive", "can", "first", "contact", "Zhang San", "he", "phone", "is", "12345678910".
S204, determining the feature information corresponding to the obtained segmentation.
Wherein the feature information includes text length and character features.
S205, inputting the characteristic information into a preset classification model, and determining a class label corresponding to the segmentation according to the output of the preset classification model.
The preset classification model comprises a K neighbor model which is obtained through pre-training.
S206, aiming at target word segmentation corresponding to the sensitive label, adopting a preset word segmentation desensitization rule corresponding to the sensitive label to carry out desensitization treatment.
Wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
Continuing the example above, the target word segments may include "take a taxi" carrying the sensitive keyword label, "Zhang San" carrying the name label, and "12345678910" carrying the mobile phone number label. Each of the three types of target word segments is desensitized with its corresponding preset word segmentation desensitization rule, yielding desensitized log data in which the keyword and its context, the name, and the middle digits of the phone number (for example, "123****8910") are replaced with preset desensitization characters. For the sensitive keyword "take a taxi", the 2 characters before it, the keyword itself, and the 3 characters after it are replaced, which ensures that the user's destination information is not leaked.
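The label-specific rules in this example can be sketched in Python as follows. This is a hedged illustration only: the function names, the label strings `"name"`, `"phone"`, and `"keyword"`, the `*` mask character, and the exact masking formats are assumptions, while the context widths of 2 and 3 characters come from the example above.

```python
def mask_phone(token: str) -> str:
    """Keep the first 3 and last 4 characters (hypothetical format)."""
    if len(token) <= 7:
        return "*" * len(token)
    return token[:3] + "*" * (len(token) - 7) + token[-4:]


def mask_name(token: str) -> str:
    """Keep only the first character (hypothetical format)."""
    return token[:1] + "*" * (len(token) - 1)


def mask_keyword_context(text: str, keyword: str, before: int, after: int,
                         mask_char: str = "*") -> str:
    """Replace `before` characters ahead of the keyword, the keyword itself,
    and `after` characters behind it with the mask character."""
    out = list(text)
    start = text.find(keyword)
    while start != -1:
        lo = max(0, start - before)
        hi = min(len(text), start + len(keyword) + after)
        for i in range(lo, hi):
            out[i] = mask_char
        start = text.find(keyword, start + len(keyword))
    return "".join(out)


def desensitize_text(text: str, labelled_tokens, before: int = 2, after: int = 3) -> str:
    """Apply a different rule to each sensitive label; other labels are skipped."""
    for token, label in labelled_tokens:
        if label == "phone":
            text = text.replace(token, mask_phone(token))
        elif label == "name":
            text = text.replace(token, mask_name(token))
        elif label == "keyword":
            text = mask_keyword_context(text, token, before, after)
    return text
```

Masking by position rather than by token is what allows the keyword rule to hide neighbouring text, such as a destination, that the classifier itself did not flag.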
In order to better embody the effects of the technical solutions of the embodiments of the present disclosure, the processing effects of the conventional desensitization method and the data processing method of the present disclosure are compared below through table 2. The traditional desensitization method adopts a regular matching method.
Table 2. Comparison of the effects of conventional desensitization and the non-entity data desensitization of the present disclosure
As shown in Table 2, when desensitizing an email address that contains 11 digits, the conventional method easily matches the regular expression for mobile phone numbers by mistake and applies the mobile phone number desensitization rule, so the intended desensitization effect is not achieved. For longitude and latitude values, which are assumed here not to require desensitization, the conventional method likewise tends to misidentify them as mobile phone numbers and desensitize them. For sensitive dialogue, the conventional method cannot desensitize specific account information, whereas the scheme of the present disclosure can identify word segments of the sensitive keyword category, such as "account", and desensitize their context, ensuring the security of the account information.
According to the data processing method provided by the embodiment of the disclosure, log data is converted into entity data and non-entity data, which are desensitized separately, so that both the efficiency and the comprehensiveness of desensitization are taken into account and data security is ensured. For entity data with a class-in-class structure, the parent entity class and the sub-entity class are traversed, and the preset sensitive fields are desensitized efficiently using the preset field desensitization rules associated with the desensitization annotations. For non-entity data, the word segments obtained by word segmentation are classified efficiently and accurately using the KNN model, word segments of a sensitive category are desensitized in a targeted manner using the corresponding word segmentation desensitization rule, and the context of a sensitive word can also be desensitized, which further ensures the accuracy and comprehensiveness of desensitization.
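The classification step summarized above can be illustrated with a small K-nearest-neighbor sketch in plain Python. Everything specific here is an assumption made for illustration: the feature set (scaled text length, digit ratio, presence of "@" and "."), the label names, the tiny `TRAINING` list, and k = 3. The disclosure states only that the feature information includes text length and character features and that a pre-trained K-nearest-neighbor model is used.

```python
import math
from collections import Counter


def features(token: str):
    """Text length plus simple character features, scaled to [0, 1]."""
    n = len(token)
    digits = sum(ch.isdigit() for ch in token)
    return (
        min(n, 30) / 30.0,               # text length
        digits / n if n else 0.0,        # share of digit characters
        1.0 if "@" in token else 0.0,    # contains "@"
        1.0 if "." in token else 0.0,    # contains "."
    )


# Hypothetical labelled samples standing in for the training data.
TRAINING = [
    ("13800001111", "phone"), ("15912345678", "phone"), ("18600002222", "phone"),
    ("alice@example.com", "email"), ("13800001111@example.com", "email"),
    ("bob.li@example.org", "email"),
    ("116.397128", "coordinate"), ("39.916527", "coordinate"),
    ("113.264385", "coordinate"),
    ("hello", "non_sensitive"), ("minutes", "non_sensitive"),
    ("contact", "non_sensitive"),
]

# "Training" a KNN model amounts to storing the feature vectors.
MODEL = [(features(token), label) for token, label in TRAINING]


def classify(token: str, k: int = 3) -> str:
    """Return the majority label among the k nearest stored samples."""
    query = features(token)
    nearest = sorted((math.dist(query, vec), label) for vec, label in MODEL)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch an email address that embeds 11 digits lands near the other email samples because of its "@" and "." features, whereas a digits-only pattern match would see only the 11 digits.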
In a business system, besides log data, a large amount of sensitive data also exists in the database, and sensitive fields generally need to be encrypted to ensure the security of that data. Encrypting sensitive fields, however, often requires rectifying the database of the service system, that is, data originally stored in plaintext may need to be stored as ciphertext. During the rectification, the historical data must be encrypted and cleaned and the incremental data must be encrypted, so migrating the service seamlessly and transparently between the old and new data systems is a major difficulty, and normal use of the online service is hard to guarantee. In the embodiments of the present disclosure, a ciphertext field is added and, while the target service system is in normal use, original requests of different types from the request end are rewritten in a targeted manner, so that normal use of the online service during the encryption rectification is ensured without modifying the service code.
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure, the method comprising:
S301, user entity conversion processing is carried out on the original log data of the target service system, and entity data and non-entity data are obtained.
S302, desensitizing the corresponding preset sensitive fields in the entity data by using a preset field desensitizing rule.
S303, performing word segmentation on the non-entity data, and determining a category label corresponding to the obtained word segmentation by using a preset classification model, wherein the category label comprises a non-sensitive label and a plurality of different sensitive labels.
S304, aiming at target word segmentation corresponding to the sensitive tag, performing desensitization processing by adopting a preset word segmentation desensitization rule corresponding to the sensitive tag, wherein the preset word segmentation desensitization rules corresponding to different sensitive tags are different.
S305, responding to the triggering of the encryption event of the target plaintext field in the target database of the target service system, and creating a target ciphertext field in the target database, wherein the target ciphertext field is used for storing ciphertext data obtained by encrypting plaintext data in the target plaintext field by adopting a preset encryption rule.
Illustratively, the target plaintext field may be understood as a field that needs to be encrypted due to system modification. It is assumed that the latitude and longitude field does not need to be encrypted before modification, and the latitude and longitude field needs to be encrypted after modification, so that the latitude and longitude field can be a target plaintext field.
Optionally, a preset component may be introduced into the target service system, and S305 to S308 are performed by the preset component. Illustratively, the preset component is added to the dependencies of the target service system, and the data sources that require sensitive field encryption are configured in a system configuration file. The configuration may include connection information for the data source, such as the database uniform resource locator (Uniform Resource Locator, URL), user name, and password. If the target service system includes multiple data sources, a multi-data-source configuration switch is turned on and each data source is configured separately, for example with its desensitization rule, encryption algorithm, and name, so that data sources can be switched conveniently during system management. If the target service system includes a single data source, that data source may be set as the default data source, in which case the object automatically injected by the system is the data source configured by the preset component. The encryption algorithm is not specifically limited and may be, for example, the Advanced Encryption Standard (AES) algorithm or Message-Digest Algorithm 5 (MD5). For example, the encryption event may be detected by the preset component: when it is detected that an encryption algorithm has been configured for the target plaintext field, the encryption event is considered triggered.
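To make the configuration and the event trigger concrete, the following Python sketch models the configuration as a dictionary and derives the encryption events from it. The key names, the `jdbc:` URLs, the `_cipher` suffix for the target ciphertext field, and the function name are all hypothetical; the disclosure does not specify a configuration format.

```python
# Hypothetical configuration for the preset component (all keys are assumptions).
CONFIG = {
    "multi_data_source": True,
    "data_sources": {
        "orders": {
            "url": "jdbc:mysql://db.example.internal:3306/orders",
            "username": "app",
            "password": "<secret>",
            # "table.column" -> encryption algorithm configured for that field
            "encrypt_fields": {"users.phone": "AES", "users.location": "AES"},
        },
        "logs": {
            "url": "jdbc:mysql://db.example.internal:3306/logs",
            "username": "app",
            "password": "<secret>",
            "encrypt_fields": {},
        },
    },
}


def triggered_encryption_events(config):
    """Treat every field with a configured algorithm as a triggered event.

    Returns tuples of (data source, table, plaintext field,
    ciphertext field to create, algorithm).
    """
    events = []
    for source, settings in config["data_sources"].items():
        for qualified, algorithm in settings.get("encrypt_fields", {}).items():
            table, column = qualified.split(".", 1)
            events.append((source, table, column, column + "_cipher", algorithm))
    return events
```

Each returned tuple names the ciphertext field that would be created next to the plaintext field, which corresponds to S305.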
The created target ciphertext field is used for storing ciphertext data obtained by encrypting plaintext data in the target plaintext field by using a preset encryption rule configured by a preset component, and the plaintext data in the target plaintext field in the target database can be encrypted by using a pre-written script and stored in the target ciphertext field.
S306, acquiring an original request which is sent by a request terminal and aims at a target database and contains a target plaintext field.
For example, during the encryption rectification process, the target service system may normally receive a request sent by the request end, such as a structured query language (Structured Query Language, SQL) request, which may be denoted as an original request. The original request is intercepted through the preset component, whether the original request contains a target plaintext field is judged, if not, the original request can be directly forwarded to the target database, and the target database can normally process and respond to the original request. If so, in order to ensure that the original request can be normally responded in the rectification process, the original request can be rewritten by executing S307 and then sent to the target database.
S307, rewriting the original request according to the request type of the original request to obtain the target request.
For example, the request types may include an insert type, an update type, and a query type, where the insert type and the update type may be collectively referred to as a write type.
In the embodiments of the present disclosure, the target plaintext field that requires encryption rectification is identified and the request is rewritten. Compared with the related-art approach of parsing the SQL request into a semantic tree, this approach places no restriction on SQL syntax and can support comparison operations, calculation operations (for example, expressions using greater than, less than, or SUM), SQL involving sub-queries, and the like, so compatibility and processing efficiency are both higher.
S308, sending a target request to a target database.
The data processing method provided by the embodiment of the disclosure can also realize online encryption rectification of the target service system on the basis of comprehensively and efficiently desensitizing the original log data of the target service system, and ensure the data security in the database and the normal operation of the service in the rectification process.
In an optional implementation manner, the rewriting the original request according to the request type of the original request to obtain a target request includes: and under the condition that the request type of the original request is a writing request, determining a first target request according to the original request, and replacing the target plaintext field in the original request with the target ciphertext field to obtain a second target request, wherein the first target request and the second target request are included in the target request, and the writing request comprises an inserting request and/or an updating request. The method has the advantages that under the condition that the service code does not need to be modified, the writing processing of the target plaintext field and the target ciphertext field is realized, the accuracy of data stored in the target plaintext field and the target ciphertext field is ensured, and the normal operation of the service is ensured.
Illustratively, prior to the rectification, the write request includes a target plaintext field for which the write request is directed only. During and after the rectification, the service code does not need to be modified, namely the original request can normally contain the target plaintext field, a second target request aiming at the target ciphertext field is newly added in a mode of rewriting the request, and the first target request and the second target request are both sent to the target database, so that the writing processing aiming at the target plaintext field and the target ciphertext field is realized. Furthermore, when an abnormality occurs in the rectification process, the data in the target plaintext field can be utilized to roll back, namely to a state before encryption rectification, so that the usability of the system is ensured.
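A minimal Python sketch of the write-request rewrite follows, under several assumptions: requests are parameterized SQL with named placeholders whose names equal the field names, the ciphertext field is called `phone_cipher`, and `toy_encrypt` is a reversible placeholder standing in for the preset encryption rule (it is base64 encoding, not real encryption). The value is encrypted for the second request because the target ciphertext field is described as holding ciphertext data.

```python
import base64
import re


def toy_encrypt(plaintext: str) -> str:
    """Placeholder for the preset encryption rule. NOT real encryption."""
    encoded = base64.urlsafe_b64encode(plaintext.encode("utf-8")).decode("ascii")
    return "enc:" + encoded


def rewrite_write_request(sql, params, plain_field, cipher_field):
    """Return (first_request, second_request) for an insert or update.

    The first request is the original one and still targets the plaintext
    field; the second targets the ciphertext field with an encrypted value.
    """
    first = (sql, dict(params))
    # Match the field name as an identifier, but leave ":phone"-style
    # placeholders untouched so the parameter names stay the same.
    pattern = r"(?<![:\w])" + re.escape(plain_field) + r"\b"
    cipher_sql = re.sub(pattern, lambda _m: cipher_field, sql)
    cipher_params = {
        key: toy_encrypt(value) if key == plain_field else value
        for key, value in params.items()
    }
    return first, (cipher_sql, cipher_params)
```

Keeping the first request unchanged is what makes rollback possible: the plaintext field remains up to date throughout the rectification. The sketch is exercised with an UPDATE; how an INSERT is split so that both fields land in the same row is not specified in the text and is not modeled here.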
In an optional implementation manner, the rewriting the original request according to the request type of the original request to obtain a target request includes: and under the condition that the request type of the original request is a query request, replacing the target plaintext field in the original request with the target ciphertext field, and encrypting plaintext data corresponding to the target plaintext field in the original request by adopting the preset encryption rule to obtain a target query request. Further, response data returned by the target database aiming at the target query request is received; decrypting the ciphertext data contained in the response data, and returning the decrypted response data to the request terminal. The method has the advantages that when the request end needs to perform data query, the service code does not need to be modified, namely the original request can normally contain the target plaintext field, the query for the target plaintext field is changed into the query for the target ciphertext field in a manner of rewriting the request, after ciphertext data are queried, the ciphertext data are decrypted and returned to the request end, and the encryption and rectification process is completed under the condition that the request end does not have perception on the encryption and decryption process.
For example, assuming that the target plaintext field is a mobile phone number, the request end wants to query which user a specific mobile phone number (plaintext data) belongs to, the original request includes the specific mobile phone number, encrypts the specific mobile phone number to obtain a target query request, and sends the target query request to the target database. The target database searches the encrypted mobile phone number in a target ciphertext field corresponding to the mobile phone number, if the encrypted mobile phone number is found, the encrypted mobile phone number and a user name (such as a user name in the same row in a database table) associated with the encrypted mobile phone number are used as response data to return, and the preset component decrypts the encrypted mobile phone number in the response data to obtain decrypted response data and returns the decrypted response data to the request terminal.
Optionally, if the target ciphertext field does not contain ciphertext data corresponding to the currently queried plaintext data, for example because the data has not yet been successfully migrated from the target plaintext field to the target ciphertext field, the target database may return query failure information. After the query failure information is received, the original request is sent to the target database so that the lookup is performed on the target plaintext field, ensuring normal operation of the online service.
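The query path, including the fallback to the plaintext field, can be sketched end to end with Python's built-in `sqlite3` module. As before, this is an illustration under assumptions: `toy_encrypt` and `toy_decrypt` are base64 placeholders rather than real encryption, the `users` table and the `phone_cipher` column are hypothetical, and an empty result set stands in for the query failure information.

```python
import base64
import re
import sqlite3

PREFIX = "enc:"  # marks placeholder ciphertext values


def toy_encrypt(plaintext: str) -> str:
    """Placeholder for the preset encryption rule. NOT real encryption."""
    return PREFIX + base64.urlsafe_b64encode(plaintext.encode("utf-8")).decode("ascii")


def toy_decrypt(ciphertext: str) -> str:
    return base64.urlsafe_b64decode(ciphertext[len(PREFIX):]).decode("utf-8")


def rewrite_query(sql, params, plain_field, cipher_field):
    """Swap the field name and encrypt the plaintext value being searched for."""
    pattern = r"(?<![:\w])" + re.escape(plain_field) + r"\b"
    new_sql = re.sub(pattern, lambda _m: cipher_field, sql)
    new_params = {k: toy_encrypt(v) if k == plain_field else v
                  for k, v in params.items()}
    return new_sql, new_params


def run_query(conn, sql, params, plain_field, cipher_field):
    """Query the ciphertext field first; fall back to the original request."""
    new_sql, new_params = rewrite_query(sql, params, plain_field, cipher_field)
    rows = conn.execute(new_sql, new_params).fetchall()
    if rows:
        # Decrypt ciphertext values before returning them to the request end.
        return [tuple(toy_decrypt(v) if isinstance(v, str) and v.startswith(PREFIX)
                      else v for v in row) for row in rows]
    # Nothing found in the ciphertext field (for example, not yet migrated).
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """In-memory table with one migrated row and one row not yet migrated."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, phone TEXT, phone_cipher TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?, ?)",
                 ("Zhang San", "12345678910", toy_encrypt("12345678910")))
    conn.execute("INSERT INTO users VALUES (?, ?, NULL)", ("Li Si", "13800001111"))
    return conn
```

The request end issues the same `SELECT name, phone FROM users WHERE phone = :phone` in both cases and receives plaintext either way, so it has no awareness of the encryption and decryption.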
Optionally, a preset switch may be provided. When the preset switch is on, the plaintext data of the target plaintext field is encrypted and stored in the target ciphertext field, original requests are rewritten, queries use the target ciphertext field, and writes go to both the target plaintext field and the target ciphertext field. When the preset switch is off, the system can be rolled back in one step to the state before the encryption rectification. Optionally, once the target service system is determined to be stable after the encryption rectification, the target plaintext field and its plaintext data may be deleted from the target database, retaining only the target ciphertext field and the ciphertext data.
In an alternative embodiment, after receiving the response data returned by the target database for the target query request, the method further includes: and checking the response data through a gateway layer, wherein the checking comprises checking of a data type and checking of the user authority of the request terminal, and the data type comprises a plaintext type and a ciphertext type. The advantage of setting up like this is that through increasing the gateway layer, add the secondary check-up at the gateway layer to data, can further ensure data security.
For example, the data type check may detect whether the data contained in the response data is of the plaintext type or the ciphertext type. For a request directed at the target plaintext field, the rewritten request queries the target ciphertext field in the target database, so the returned data should be of the ciphertext type. If plaintext-type data is returned, the encryption rectification is abnormal and has not been successfully applied, and intervention measures are required. The user permission check may include determining, according to information such as the user role, the token (Token), and the data owner, whether the user who triggered the original request at the request end has permission to access the response data. If not, the response data may be returned to the request end without being decrypted, so that the request end cannot obtain the decrypted response data, further ensuring data security.
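The two gateway-layer checks can be sketched in Python as follows. The sketch is illustrative only: ciphertext is recognized by a hypothetical `enc:` prefix produced by the placeholder `toy_encrypt` (base64, not real encryption), and the permission check is reduced to a role-set intersection, whereas the text also mentions tokens and data owners.

```python
import base64

PREFIX = "enc:"  # hypothetical marker used to tell ciphertext from plaintext


def toy_encrypt(plaintext: str) -> str:
    """Placeholder for the preset encryption rule. NOT real encryption."""
    return PREFIX + base64.urlsafe_b64encode(plaintext.encode("utf-8")).decode("ascii")


def toy_decrypt(ciphertext: str) -> str:
    return base64.urlsafe_b64decode(ciphertext[len(PREFIX):]).decode("utf-8")


def is_ciphertext(value: str) -> bool:
    return value.startswith(PREFIX)


def gateway_check(values, user_roles, allowed_roles):
    """Check the data type, then the user permission, then decrypt if allowed.

    `values` are the target-field values contained in the response data.
    """
    # Data type check: a rewritten query must only ever return ciphertext.
    if not all(is_ciphertext(v) for v in values):
        raise ValueError("plaintext returned for an encrypted field")
    # Permission check: without permission, return the data undecrypted.
    if not set(user_roles) & set(allowed_roles):
        return list(values)
    return [toy_decrypt(v) for v in values]
```

Raising an error on plaintext makes a failed rectification visible at the gateway instead of silently passing plaintext through.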
In an alternative embodiment, the method further comprises: under the condition that the decrypted data contains a resource address, token information and/or allowed access period information contained in the original request are added to the resource address to obtain modified decrypted data, wherein the allowed access period information is related to the user authority; wherein returning decrypted data to the requesting end comprises: and returning the modified decryption data to the request end. The advantage of this arrangement is that for the case of requesting access to the resource content at the resource address, the visibility of the resource content to the user can be controlled by adding information to the resource address, and the security of the sensitive resource content can be ensured.
Illustratively, after checking the user authority, if the user authority passes the checking, the gateway layer decrypts the encrypted resource address to obtain the decrypted resource address. However, the resource address is used for accessing resources such as pictures or videos, the target database does not encrypt the resource content, in order to ensure the security of the resource content, in the embodiment of the present disclosure, after the token information carried in the original request is added to the resource address, when the user accesses the resource address, if the token information is the token information with access rights, the user can access the specific resource content at the front end after clicking the resource address, and if the token information is the token information without access rights, the user cannot access the specific resource content after clicking the resource address, thereby further ensuring the security of the resource content.
For example, the gateway layer may further add the allowed access period information to the resource address, for example, if the user does not have authority, shorter access period information may be added, for example, 3 seconds, and if the user has authority, longer access period information may be added, for example, unlimited, etc.
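One way to attach the token information and the allowed access period to a resource address is to append them as query parameters, as in the Python sketch below. The parameter names `token` and `expires`, the use of a Unix timestamp for the expiry, and the function name are assumptions; the disclosure does not specify how the information is encoded in the address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_access_info(url: str, token: str, ttl_seconds, now: int) -> str:
    """Append the token and, if limited, an expiry time to a resource address.

    `ttl_seconds=None` models an unlimited access period; `now` is the
    current Unix time, passed in so the result is deterministic.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    if ttl_seconds is not None:
        query.append(("expires", str(int(now) + ttl_seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A user without permission could be given `ttl_seconds=3`, matching the 3-second example above, while a user with permission could be given `None`. Enforcing the token and expiry when the address is actually accessed is outside this sketch.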
FIG. 4 is a flow chart of yet another data processing method provided according to an embodiment of the present disclosure, as shown in FIG. 4, the method including:
S401, user entity conversion processing is carried out on the original log data of the target service system, and entity data and non-entity data are obtained.
S402, desensitizing the corresponding preset sensitive fields in the entity data by using preset field desensitizing rules.
S403, performing word segmentation on the non-entity data, and determining a category label corresponding to the obtained word segmentation by using a preset classification model, wherein the category label comprises a non-sensitive label and a plurality of different sensitive labels.
S404, aiming at target word segmentation corresponding to the sensitive tag, performing desensitization processing by adopting a preset word segmentation desensitization rule corresponding to the sensitive tag, wherein the preset word segmentation desensitization rules corresponding to different sensitive tags are different.
S405, in response to the encryption event of the target plaintext field in the target database of the target service system being triggered, creating a target ciphertext field in the target database.
Fig. 5 is a schematic diagram of a data processing framework provided according to an embodiment of the present disclosure, as shown in fig. 5, in which a data source is configured in a preset component, a desensitization rule is set, and a target ciphertext field is created in a target database.
S406, receiving an original request sent by a request end.
As shown in fig. 5, the target service system receives the SQL request sent by the request end, and intercepts the SQL request through the preset component.
S407, judging whether a target plaintext field exists in the original request, if so, executing S408; otherwise, S412 is performed.
As shown in fig. 5, the preset component is used to determine whether the sensitive field is included, that is, whether the target plaintext field needs to be modified by encryption, if the target plaintext field exists, the request type needs to be further determined so as to carry out targeted overwriting, and if the target plaintext field does not exist, the SQL request can be directly forwarded to the target database.
S408, judging whether the request type of the original request is a writing request, if so, executing S409; otherwise, S410 is performed.
S409, determining a first target request according to the original request, replacing a target plaintext field in the original request with a target ciphertext field to obtain a second target request, and sending the first target request and the second target request to a target database.
S410, replacing the target plaintext field in the original request with the target ciphertext field, encrypting plaintext data corresponding to the target plaintext field in the original request by adopting a preset encryption rule to obtain a target query request, and sending the target query request to the target database.
S411, receiving response data returned by the target database aiming at the target query request, checking the response data through the gateway layer, decrypting ciphertext data contained in the response data if the response data passes the check, and returning the decrypted response data to the request terminal.
S412, sending the original request to the target database.
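The branching in S407 and S408 can be summarized with a short Python sketch. It assumes that the request is an SQL string, that the request type can be read from the leading SQL verb, and that the field is detected by an identifier match; the function name and the return values are hypothetical.

```python
import re


def classify_request(sql: str, plain_field: str) -> str:
    """Return "forward" (S412), "write" (S409), or "query" (S410)."""
    # S407: does the request reference the target plaintext field at all?
    pattern = r"(?<![:\w])" + re.escape(plain_field) + r"\b"
    if not re.search(pattern, sql):
        return "forward"
    # S408: insert and update requests are write requests.
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in ("INSERT", "UPDATE"):
        return "write"
    # Every other request that touches the field takes the query branch.
    return "query"
```

Requests that do not touch the target plaintext field return early and are forwarded unchanged.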
According to the data processing method provided by the embodiment of the disclosure, online encryption rectification of the target service system can be realized on the basis of comprehensively and efficiently desensitizing the original log data of the target service system. When a plaintext field needs to be changed to ciphertext storage, a ciphertext field is created to store the encrypted ciphertext data. When a request from the request end is received and the request contains the target plaintext field, the original request is rewritten according to whether its request type is a write request or a query request; all SQL syntax remains compatible, which improves the compatibility of the scheme. When response data is returned, the gateway layer performs a secondary check, which ensures both the security of the data in the database and the normal operation of the service during the rectification. Compared with the original SQL request, only the encryption and decryption steps are added, so the performance impact on the target service system is small and the processing efficiency of the SQL request is maintained.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure, which is applicable to a case of processing sensitive data in a service system. The device can be realized in a hardware and/or software mode and can be configured in electronic equipment. Referring to fig. 6, the data processing apparatus 600 includes:
The entity conversion module 601 is configured to perform user entity conversion processing on original log data of the target service system to obtain entity data and non-entity data;
the first desensitization module 602 is configured to desensitize a corresponding preset sensitive field in the entity data by using a preset field desensitization rule;
a word segmentation processing module 603, configured to perform word segmentation processing on the non-entity data;
the category label determining module 604 is configured to determine a category label corresponding to the obtained segmentation by using a preset classification model, where the category label includes a non-sensitive label and a plurality of different sensitive labels;
the second desensitizing module 605 is configured to perform desensitization processing on the target word segmentation corresponding to the sensitive tag by using a preset word segmentation desensitization rule corresponding to the sensitive tag, where the preset word segmentation desensitization rules corresponding to different sensitive tags are different.
According to the data processing device provided by the embodiment of the disclosure, log data is converted into entity data and non-entity data, which are desensitized separately, so that both the efficiency and the comprehensiveness of desensitization are taken into account and data security is ensured. For entity data, the preset sensitive fields are desensitized efficiently using the preset field desensitization rules. For non-entity data, word segmentation is performed first and the resulting word segments are then classified, and word segments of a sensitive category are desensitized in a targeted manner using the corresponding word segmentation desensitization rule, ensuring the accuracy of desensitization.
In an optional implementation manner, the sensitive label comprises a sensitive keyword label, and the preset word segmentation desensitization rule corresponding to the sensitive keyword label comprises a first character number and a second character number;
wherein the second desensitizing module comprises:
the quantity acquisition unit is used for acquiring the first character quantity and the second character quantity in a preset word segmentation desensitization rule corresponding to the sensitive keyword label aiming at a first target word corresponding to the sensitive keyword label;
and the character replacing unit is used for replacing the characters of the first character number before the first target word, the first target word and the characters of the second character number after the first target word in the non-entity data with preset desensitization characters.
In an optional implementation manner, the data structure of the entity data includes a class-in-class structure, the class-in-class structure includes a parent entity class and a sub-entity class, and the original log data includes association relationship information of a user corresponding to the parent entity class and a user corresponding to the sub-entity class;
wherein the first desensitizing module comprises:
a preset sensitive field determining unit, configured to traverse the parent entity class and the child entity class in the entity data, and determine the preset field carrying a preset desensitization annotation as a preset sensitive field;
And the desensitization processing unit is used for carrying out desensitization processing on the preset sensitive fields by utilizing preset field desensitization rules associated with the preset desensitization notes.
In an alternative embodiment, the apparatus further comprises:
the ciphertext field creation module is used for creating a target ciphertext field in a target database of the target service system in response to an encryption event of a target plaintext field in the target database being triggered, wherein the target ciphertext field is used for storing ciphertext data obtained by encrypting plaintext data in the target plaintext field by adopting a preset encryption rule;
the original request acquisition module is used for acquiring an original request which is sent by a request end and contains the target plaintext field for the target database;
the request rewriting module is used for rewriting the original request according to the request type of the original request to obtain a target request;
and the request sending module is used for sending the target request to the target database.
In an alternative embodiment, the request rewrite module includes:
and the first rewriting unit is used for determining a first target request according to the original request under the condition that the request type of the original request is a writing request, and replacing the target plaintext field in the original request with the target ciphertext field to obtain a second target request, wherein the target request comprises the first target request and the second target request, and the writing request comprises an inserting request and/or an updating request.
In an alternative embodiment, the request rewrite module includes:
the second rewriting unit is configured to replace the target plaintext field in the original request with the target ciphertext field when the request type of the original request is a query request, and encrypt plaintext data corresponding to the target plaintext field in the original request by using the preset encryption rule to obtain a target query request;
wherein the apparatus further comprises:
the response data receiving module is used for receiving response data returned by the target database aiming at the target query request;
the decryption module is used for decrypting ciphertext data contained in the response data;
and the decryption data return module is used for returning decryption data to the request end.
In an alternative embodiment, the apparatus further comprises:
and the verification module is used for verifying the response data through a gateway layer after receiving the response data returned by the target database aiming at the target query request, wherein the verification comprises a data type verification and a request end user permission verification, and the data type comprises a plaintext type and a ciphertext type.
In an alternative embodiment, the apparatus further comprises:
the information adding module is used for adding the token information and/or the allowed access period information contained in the original request to the resource address to obtain modified decrypted data under the condition that the decrypted data contains the resource address, wherein the allowed access period information is related to the user permission;
the decryption data return module is specifically configured to:
and returning the modified decryption data to the request end.
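One possible way for the information adding module to attach the token and the allowed access period to a resource address is to append them as URL query parameters. The sketch below assumes this approach; the parameter names `token` and `expires` and the function name `add_access_info` are hypothetical and are not taken from the disclosure.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_access_info(resource_url: str,
                    token: Optional[str] = None,
                    expires_at: Optional[int] = None) -> str:
    """Append token and/or allowed-access-period information to a resource address."""
    parts = urlsplit(resource_url)
    # Keep any query parameters the address already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    if token is not None:
        query.append(("token", token))
    if expires_at is not None:
        # Hypothetical encoding: a Unix timestamp after which access is refused.
        query.append(("expires", str(expires_at)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_access_info("https://example.com/img/1.png?size=large", token="abc", expires_at=1700000000)` yields an address ending in `?size=large&token=abc&expires=1700000000`, which the resource server could then check against the user's permission.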
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
Artificial intelligence is the discipline that studies how to make a computer simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed over a network. The resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed on demand in a self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, and for model training.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A data processing method, comprising:
performing user entity conversion processing on original log data of a target service system to obtain entity data and non-entity data;
performing desensitization processing on a corresponding preset sensitive field in the entity data by using a preset field desensitization rule;
performing word segmentation on the non-entity data, and determining a category label corresponding to the obtained word segmentation by using a preset classification model, wherein the category label comprises a non-sensitive label and a plurality of different sensitive labels;
for a target word segmentation corresponding to a sensitive label, performing desensitization processing by using the preset word segmentation desensitization rule corresponding to that sensitive label, wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
2. The method of claim 1, wherein the sensitive label comprises a sensitive keyword label, and the preset word segmentation desensitization rule corresponding to the sensitive keyword label comprises a first character number and a second character number;
wherein the performing of desensitization processing on the target word segmentation corresponding to the sensitive label by using the preset word segmentation desensitization rule corresponding to the sensitive label comprises:
for a first target word segmentation corresponding to the sensitive keyword label, acquiring the first character number and the second character number from the preset word segmentation desensitization rule corresponding to the sensitive keyword label;
and replacing, in the non-entity data, the first character number of characters before the first target word segmentation, the first target word segmentation itself, and the second character number of characters after the first target word segmentation with preset desensitization characters.
3. The method of claim 1, wherein the data structure of the entity data comprises a class-in-class structure, the class-in-class structure comprises a parent entity class and a child entity class, and the original log data comprises association relation information of a user corresponding to the parent entity class and a user corresponding to the child entity class;
the desensitizing processing of the corresponding preset sensitive fields in the entity data by using the preset field desensitizing rule comprises the following steps:
traversing the parent entity class and the child entity class in the entity data, determining a preset field carrying a preset desensitization annotation as a preset sensitive field, and performing desensitization processing on the preset sensitive field by using the preset field desensitization rule associated with the preset desensitization annotation.
4. The method of claim 1, further comprising:
responding to the triggering of an encryption event of a target plaintext field in a target database of the target service system, and creating a target ciphertext field in the target database, wherein the target ciphertext field is used for storing ciphertext data obtained by encrypting plaintext data in the target plaintext field by adopting a preset encryption rule;
acquiring an original request which is sent by a request terminal and aims at the target database and contains the target plaintext field;
rewriting the original request according to the request type of the original request to obtain a target request;
and sending the target request to the target database.
5. The method of claim 4, wherein the rewriting the original request according to the request type of the original request to obtain a target request comprises:
under the condition that the request type of the original request is a write request, determining a first target request according to the original request, and replacing the target plaintext field in the original request with the target ciphertext field to obtain a second target request, wherein the target request comprises the first target request and the second target request, and the write request comprises an insert request and/or an update request.
6. The method of claim 4, wherein the rewriting the original request according to the request type of the original request to obtain a target request comprises:
under the condition that the request type of the original request is a query request, replacing the target plaintext field in the original request with the target ciphertext field, and encrypting plaintext data corresponding to the target plaintext field in the original request by adopting the preset encryption rule to obtain a target query request;
wherein the method further comprises:
receiving response data returned by the target database aiming at the target query request;
decrypting the ciphertext data contained in the response data, and returning the decrypted response data to the request terminal.
7. The method of claim 6, further comprising, after said receiving response data returned by said target database for said target query request:
and checking the response data through a gateway layer, wherein the checking comprises checking of a data type and checking of the user authority of the request terminal, and the data type comprises a plaintext type and a ciphertext type.
8. The method of claim 7, further comprising:
under the condition that the decrypted data contains a resource address, token information and/or allowed access period information contained in the original request are added to the resource address to obtain modified decrypted data, wherein the allowed access period information is related to the user authority;
wherein returning decrypted data to the requesting end comprises:
and returning the modified decryption data to the request end.
9. A data processing apparatus comprising:
the entity conversion module is used for carrying out user entity conversion processing on the original log data of the target service system to obtain entity data and non-entity data;
the first desensitization module is used for carrying out desensitization treatment on the corresponding preset sensitive fields in the entity data by utilizing preset field desensitization rules;
the word segmentation processing module is used for carrying out word segmentation processing on the non-entity data;
the classification label determining module is used for determining a classification label corresponding to the obtained segmentation by utilizing a preset classification model, wherein the classification label comprises a non-sensitive label and a plurality of different sensitive labels;
the second desensitization module is used for carrying out desensitization processing on target word segmentation corresponding to the sensitive label by adopting a preset word segmentation desensitization rule corresponding to the sensitive label, wherein the preset word segmentation desensitization rules corresponding to different sensitive labels are different.
10. The apparatus of claim 9, wherein the sensitive label comprises a sensitive keyword label, and the preset word segmentation desensitization rule corresponding to the sensitive keyword label comprises a first character number and a second character number;
wherein the second desensitizing module comprises:
the quantity acquisition unit is used for acquiring, for a first target word segmentation corresponding to the sensitive keyword label, the first character number and the second character number from the preset word segmentation desensitization rule corresponding to the sensitive keyword label;
and the character replacing unit is used for replacing, in the non-entity data, the first character number of characters before the first target word segmentation, the first target word segmentation itself, and the second character number of characters after the first target word segmentation with preset desensitization characters.
11. The apparatus of claim 9, wherein the data structure of the entity data comprises a class-in-class structure, the class-in-class structure comprises a parent entity class and a child entity class, and the original log data comprises association relation information of a user corresponding to the parent entity class and a user corresponding to the child entity class;
wherein the first desensitizing module comprises:
a preset sensitive field determining unit, configured to traverse the parent entity class and the child entity class in the entity data, and determine a preset field carrying a preset desensitization annotation as a preset sensitive field;
and the desensitization processing unit is used for performing desensitization processing on the preset sensitive field by using the preset field desensitization rule associated with the preset desensitization annotation.
12. The apparatus of claim 9, further comprising:
the ciphertext field creation module is used for creating, in response to the triggering of an encryption event for a target plaintext field in a target database of the target service system, a target ciphertext field in the target database, wherein the target ciphertext field is used for storing ciphertext data obtained by encrypting plaintext data in the target plaintext field by adopting a preset encryption rule;
the original request acquisition module is used for acquiring an original request which is sent by a request end and contains the target plaintext field for the target database;
the request rewriting module is used for rewriting the original request according to the request type of the original request to obtain a target request;
and the request sending module is used for sending the target request to the target database.
13. The apparatus of claim 12, wherein the request rewrite module comprises:
and the first rewriting unit is used for determining, under the condition that the request type of the original request is a write request, a first target request according to the original request, and replacing the target plaintext field in the original request with the target ciphertext field to obtain a second target request, wherein the target request comprises the first target request and the second target request, and the write request comprises an insert request and/or an update request.
14. The apparatus of claim 12, wherein the request rewrite module comprises:
the second rewriting unit is configured to replace the target plaintext field in the original request with the target ciphertext field when the request type of the original request is a query request, and encrypt plaintext data corresponding to the target plaintext field in the original request by using the preset encryption rule to obtain a target query request;
wherein the apparatus further comprises:
the response data receiving module is used for receiving response data returned by the target database aiming at the target query request;
the decryption module is used for decrypting ciphertext data contained in the response data;
and the decryption data return module is used for returning decryption data to the request end.
15. The apparatus of claim 14, further comprising:
and the verification module is used for verifying the response data through a gateway layer after receiving the response data returned by the target database aiming at the target query request, wherein the verification comprises a data type verification and a request end user permission verification, and the data type comprises a plaintext type and a ciphertext type.
16. The apparatus of claim 15, further comprising:
the information adding module is used for adding the token information and/or the allowed access period information contained in the original request to the resource address to obtain modified decrypted data under the condition that the decrypted data contains the resource address, wherein the allowed access period information is related to the user permission;
the decryption data return module is specifically configured to:
and returning the modified decryption data to the request end.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-8.
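The keyword-window desensitization recited in claims 2 and 10 can be sketched as follows: the characters before the sensitive keyword, the keyword itself, and the characters after it are all replaced with a desensitization character. This is a minimal sketch under stated assumptions: the function name `mask_keyword_window`, the use of `*` as the preset desensitization character, and the handling of every occurrence of the keyword are illustrative choices, and the keyword is located by plain substring search rather than by the preset classification model.

```python
def mask_keyword_window(text: str, keyword: str, before: int, after: int,
                        mask_char: str = "*") -> str:
    """Mask `before` characters preceding each keyword occurrence, the keyword
    itself, and `after` characters following it."""
    if not keyword:
        return text
    chars = list(text)
    start = text.find(keyword)
    while start != -1:
        # Clamp the window to the bounds of the text.
        lo = max(0, start - before)
        hi = min(len(text), start + len(keyword) + after)
        for i in range(lo, hi):
            chars[i] = mask_char
        start = text.find(keyword, start + len(keyword))
    return "".join(chars)
```

With a first character number of 1 and a second character number of 7, the log fragment `login password=abc123 ok` has the keyword `password`, the space before it, and the seven characters `=abc123` after it masked, which hides the value that follows the keyword as well as the keyword itself.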
CN202310764425.9A 2023-06-26 2023-06-26 Data processing method, device, equipment and storage medium Pending CN116719907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310764425.9A CN116719907A (en) 2023-06-26 2023-06-26 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310764425.9A CN116719907A (en) 2023-06-26 2023-06-26 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116719907A true CN116719907A (en) 2023-09-08

Family

ID=87865946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310764425.9A Pending CN116719907A (en) 2023-06-26 2023-06-26 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116719907A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891929A (en) * 2024-03-18 2024-04-16 南京华飞数据技术有限公司 Knowledge graph intelligent question-answer information identification method of improved deep learning algorithm
CN117891929B (en) * 2024-03-18 2024-05-17 南京华飞数据技术有限公司 Knowledge graph intelligent question-answer information identification method of improved deep learning algorithm

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060005017A1 (en) * 2004-06-22 2006-01-05 Black Alistair D Method and apparatus for recognition and real time encryption of sensitive terms in documents
US8458487B1 (en) * 2010-03-03 2013-06-04 Liaison Technologies, Inc. System and methods for format preserving tokenization of sensitive information
CN108776762A (en) * 2018-06-08 2018-11-09 北京中电普华信息技术有限公司 A kind of processing method and processing device of data desensitization
CN110197083A (en) * 2019-06-05 2019-09-03 深圳市优网科技有限公司 Sensitive data desensitization system and processing method
US10878126B1 (en) * 2020-02-18 2020-12-29 Capital One Services, Llc Batch tokenization service
US20210209251A1 (en) * 2019-10-17 2021-07-08 Mentis Inc System and method for sensitive data retirement
US20210256149A1 (en) * 2020-02-18 2021-08-19 Capital One Services, Llc De-tokenization patterns and solutions
CN113723089A (en) * 2020-05-25 2021-11-30 阿里巴巴集团控股有限公司 Word segmentation model training method, word segmentation method, data processing method and data processing device
CN114626097A (en) * 2022-03-22 2022-06-14 中国平安人寿保险股份有限公司 Desensitization method, desensitization device, electronic apparatus, and storage medium
CN115238298A (en) * 2021-04-22 2022-10-25 中移动金融科技有限公司 Method and device for desensitizing sensitive field of database

Similar Documents

Publication Publication Date Title
US9787722B2 (en) Integrated development environment (IDE) for network security configuration files
US10102246B2 (en) Natural language consumer segmentation
CN108090351B (en) Method and apparatus for processing request message
CN104956376A (en) Method and technique for application and device control in a virtualized environment
US9219746B2 (en) Risk identification based on identified parts of speech of terms in a string of terms
US20220239674A1 (en) Security appliance to monitor networked computing environment
US10073618B2 (en) Supplementing a virtual input keyboard
CN111698207B (en) Method, equipment and storage medium for generating knowledge graph of network information security
US11494559B2 (en) Hybrid in-domain and out-of-domain document processing for non-vocabulary tokens of electronic documents
US11734325B2 (en) Detecting and processing conceptual queries
US10649970B1 (en) Methods and apparatus for detection of functionality
US11507747B2 (en) Hybrid in-domain and out-of-domain document processing for non-vocabulary tokens of electronic documents
US20230194302A1 (en) Method of updating map data, electronic device and storage medium
CN110618999A (en) Data query method and device, computer storage medium and electronic equipment
US20210294969A1 (en) Generation and population of new application document utilizing historical application documents
CN114244795A (en) Information pushing method, device, equipment and medium
US11968214B2 (en) Efficient retrieval and rendering of access-controlled computer resources
US10262061B2 (en) Hierarchical data classification using frequency analysis
CN111597336A (en) Processing method and device of training text, electronic equipment and readable storage medium
EP4102772B1 (en) Method and apparatus of processing security information, device and storage medium
WO2023154779A2 (en) Methods and systems for identifying anomalous computer events to detect security incidents
US9286348B2 (en) Dynamic search system
CN116719907A (en) Data processing method, device, equipment and storage medium
JP2024507029A (en) Web page identification methods, devices, electronic devices, media and computer programs
US10776376B1 (en) Systems and methods for displaying search results

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination