CN114611149A - Data processing method and device - Google Patents

Info

Publication number
CN114611149A
Authority
CN
China
Prior art keywords
sensitive data
scoring
data
target
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210265154.8A
Other languages
Chinese (zh)
Inventor
陈思雨
刘琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang eCommerce Bank Co Ltd
Original Assignee
Zhejiang eCommerce Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang eCommerce Bank Co Ltd filed Critical Zhejiang eCommerce Bank Co Ltd
Priority to CN202210265154.8A priority Critical patent/CN114611149A/en
Publication of CN114611149A publication Critical patent/CN114611149A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide a data processing method and device. The method comprises: acquiring a sensitive data set containing initial sensitive data, and determining a data scene corresponding to the initial sensitive data; determining, according to the data scene, a data processing algorithm and at least two target scoring units corresponding to the initial sensitive data; and, when the initial sensitive data is determined according to the data processing algorithm to meet a preset processing condition, performing sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result. By scoring the sensitivity of each piece of initial sensitive data with the data processing algorithm and the target scoring units, the method allows target sensitive data to be determined quickly and accurately from the target scoring result, so that the target sensitive data can subsequently be adjusted.

Description

Data processing method and device
Technical Field
The embodiments of this specification relate to the technical field of data processing, and in particular to a data processing method.
Background
As data protection laws take effect, individuals and organizations are becoming more conscious of protecting sensitive data. Organizations often spend a large budget on a DLP (data leakage protection) system to monitor sensitive data, but the volume of alarm data that a DLP system outputs every day is often huge. Processing this alarm data manually consumes a great deal of manpower, real risks cannot be responded to in time, and much of the overall cost is wasted.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies of the prior art.
According to a first aspect of embodiments herein, there is provided a data processing method including:
acquiring a sensitive data set containing initial sensitive data, and determining a data scene corresponding to the initial sensitive data;
determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene;
and, when the initial sensitive data is determined according to the data processing algorithm to meet a preset processing condition, performing sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result.
According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:
the data acquisition module is configured to acquire a sensitive data set containing initial sensitive data and determine a data scene corresponding to the initial sensitive data;
the algorithm determining module is configured to determine a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene;
and the scoring module is configured to perform sensitivity scoring on the initial sensitive data according to the at least two target scoring units under the condition that the initial sensitive data meets the preset processing condition according to the data processing algorithm, so as to obtain a target scoring result.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, and the computer-executable instructions realize the steps of the data processing method when being executed by the processor.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method described above.
According to a fifth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned data processing method.
One embodiment of the present specification implements a data processing method and apparatus, wherein the method includes acquiring a sensitive data set including initial sensitive data, and determining a data scene corresponding to the initial sensitive data; determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene; and under the condition that the initial sensitive data meet the preset processing conditions according to the data processing algorithm, carrying out sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result.
Specifically, the data processing method uses a data processing algorithm and target scoring units to score the sensitivity of each piece of initial sensitive data. The target sensitive data can then be determined quickly and accurately from the resulting target scoring result, so that the target sensitive data can subsequently be adjusted.
Drawings
FIG. 1 is a schematic diagram of a specific application scenario of a data processing method provided in an embodiment of the present specification;
FIG. 2 is a flowchart of a data processing method provided in an embodiment of the present specification;
FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present specification;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;
FIG. 5 is a block diagram of a computing device according to an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; those skilled in the art can make similar extensions without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can be termed a second and, similarly, a second can be termed a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
First, the terms used in one or more embodiments of this specification are explained.
Sensitive data: personal privacy data, such as certificate numbers, mobile phone numbers and bank card numbers.
DLP: a data leakage protection system.
SFTP: an encrypted secure file transfer protocol.
Each organization grades its data as open, internal, confidential or top-secret according to the harm that a leak of sensitive data would cause the organization and the value that the data brings to it. A data classification and grading program marks the sensitive data in hosts, terminals, applications and other environments, and the marked data is used to sort out data scenes that are covered by a number of data leakage strategies. The embodiments of this specification aim to solve the technical problems that the alarm data output by these data leakage strategies is huge in volume, and that processing it manually is inefficient and costly.
In view of this, in the present specification, a data processing method is provided, and the present specification simultaneously relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a specific application scenario of a data processing method according to an embodiment of the present specification.
The data processing method provided by the embodiments of this specification is applied to the data processing system in FIG. 1. The data processing system includes a data storage unit 102, an algorithm unit 104, a scoring card unit 106, a data analysis unit 108, and a data processing unit 110.
The data storage unit 102 stores DLP alarm data generated by the DLP system, such as sensitive chat records, file transmissions, system calls and other data;
the algorithm unit 104 stores a plurality of algorithms, such as a semantic rule algorithm, a model algorithm and a business logic algorithm. During calculation, the data processing system selects the corresponding algorithm in advance according to the data scene of each piece of DLP alarm data, analyzes the DLP alarm data with the selected algorithm, and screens whether it is high-risk data. If it is, the DLP alarm data is sent to the scoring card unit 106; if not, no subsequent processing is performed on it;
the scoring card unit 106 stores a plurality of scoring cards, such as a personnel scoring card, a time scoring card, a file scoring card, an application scoring card, a data scoring card, an address scoring card, an equipment scoring card and a related-person scoring card. Each scoring card has a scoring interval (such as 1-10 points or 0-5 points), scoring dimensions (such as post responsibility, willingness to resign and resignation status) and a scoring strategy (such as an addition factor or a multiplication factor). The target scoring result of each piece of DLP alarm data can subsequently be obtained from the sensitivity scores given by the scoring cards that correspond to that DLP alarm data;
the data analysis unit 108 may update the dynamic scoring result of each piece of DLP alarm data according to its target scoring result and, in combination with a preset processing rule, determine whether the DLP alarm data is target sensitive data, that is, which DLP alarm data needs subsequent processing. The preset processing rule may be set according to the actual application, and the embodiments of this specification place no limitation on it;
the data processing unit 110 performs post-response processing, such as sending out a warning message, on the initial sensitive data that the data analysis unit 108 has determined needs processing (i.e., the target sensitive data).
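As a minimal sketch of how one scoring card in the scoring card unit 106 might be represented, the following Python example models a card with a scoring interval, scoring dimensions and a scoring strategy. The class name `ScoringCard`, its field names, the dimension names, and the choice to clamp the result to the scoring interval are all illustrative assumptions and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringCard:
    """Hypothetical scoring card: interval, dimensions and scoring strategy."""
    name: str
    low: float                 # lower bound of the scoring interval
    high: float                # upper bound of the scoring interval
    dimensions: tuple          # names of the scoring dimensions
    policy: str = "multiply"   # scoring strategy: "add" or "multiply"
    factor: float = 1.0        # addition factor or multiplication factor

    def score(self, dimension_scores):
        # Sum the scores of the dimensions this card knows about.
        raw = sum(dimension_scores.get(d, 0.0) for d in self.dimensions)
        if self.policy == "add":
            raw = raw + self.factor
        elif self.policy == "multiply":
            raw = raw * self.factor
        else:
            raise ValueError(f"unknown scoring strategy: {self.policy}")
        # Assumption: keep the result inside the card's scoring interval.
        return max(self.low, min(self.high, raw))


PERSONNEL_DIMENSIONS = ("post_responsibility", "willingness_to_resign", "resignation_status")
personnel = ScoringCard("personnel", 1, 10, PERSONNEL_DIMENSIONS)
boosted = ScoringCard("personnel", 1, 10, PERSONNEL_DIMENSIONS, factor=2.0)

observed = {"post_responsibility": 3, "willingness_to_resign": 4}
print(personnel.score(observed))
print(boosted.score(observed))
```

In this sketch, raising `factor` is one way a multiplication-type scoring strategy could make a card's score more prominent.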
The data processing method provided by the embodiments of this specification uses the risk identification factors covered by dynamic scoring cards, so that each piece of DLP alarm data can be processed comprehensively and accurately. The calculation factor of each scoring card can be adjusted dynamically for different data scenes; for example, in an equipment sharing and misappropriation scene, the weight value of the calculation factor of the equipment scoring card can be increased automatically. The risk identification factors within each scoring card can also be adjusted dynamically. Three scoring cards are described below as examples:
Personnel scoring card: employees are the root cause of data leakage events. Risk factors such as an employee's post responsibility, willingness to resign and resignation status are weighted, and the final score is adjusted dynamically according to resignation applications and post transfer information that the employee submits in real time;
File scoring card: files are an important carrier of sensitive data. Weights are assigned according to risk factors such as file size, file content and encryption and compression records, and the weights and scores are analyzed according to the file's historical operations;
Related-person scoring card: related persons may be an employee's relatives, colleagues, supervisors, friends and so on. To guard against collusion to steal data, the behavior of related persons is an important factor in evaluating risk.
In the embodiments of this specification, the sensitivity scores of the scoring cards corresponding to each piece of DLP alarm data are substituted into a risk scoring formula to calculate the final risk score. The acceleration and deceleration of the risk score can also be calculated to estimate abnormal risk trends for an employee. The final score influences the final disposal decision, such as ending the disposal flow, escalating to a supervisor for investigation, or automatically issuing an interception policy. This greatly reduces labor cost, shortens response time, and makes risk containment more efficient.
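The specification does not give the risk scoring formula or define how risk acceleration is computed. The sketch below is one possible reading under stated assumptions: it treats an employee's successive risk scores as a series and uses first and second differences as the rate of change and the acceleration. The function names and the thresholds `end_below` and `intercept_at` are hypothetical.

```python
def risk_trend(scores):
    """Return (velocity, acceleration) as first and second differences.

    Assumption: `scores` is a chronological list of one employee's risk scores.
    """
    velocity = [b - a for a, b in zip(scores, scores[1:])]
    acceleration = [b - a for a, b in zip(velocity, velocity[1:])]
    return velocity, acceleration


def disposal_decision(score, end_below=30, intercept_at=70):
    """Map a final risk score to one of the disposal decisions named in the text.

    The two thresholds are hypothetical and would be tuned per organization.
    """
    if score < end_below:
        return "end_disposal_flow"
    if score >= intercept_at:
        return "issue_interception_policy"
    return "supervisor_investigation"


velocity, acceleration = risk_trend([10, 12, 18, 30])
print(velocity, acceleration)
print(disposal_decision(30))
```

A positive acceleration means the risk score is rising faster over time, which is one way an abnormal trend could be flagged.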
Referring to fig. 2, fig. 2 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 202: acquiring a sensitive data set containing initial sensitive data, and determining a data scene corresponding to the initial sensitive data.
The data processing method provided by the embodiments of this specification is applied to data leakage scenes. Within an organization, data leakage can arise in many environments, such as office terminal computers, application systems, data warehouses, ecosystem institutions and host servers, and the data leakage scenes in these environments are sorted out and clarified at the initial stage. The organization can purchase basic monitoring software such as DLP data leakage protection software, network DLP, application system operation logs and host HIDS, and the original log data generated by this basic monitoring software can be understood as the initial sensitive data of the embodiments of this specification. The specific implementation is as follows:
In specific implementation, the acquiring a sensitive data set containing initial sensitive data includes:
acquiring a sensitive data set that is determined by a preset risk identification terminal and contains initial sensitive data.
The preset risk identification terminal includes, but is not limited to, basic monitoring software such as DLP data leakage protection software, network DLP, application system operation logs and host HIDS.
In practical application, the data scene corresponding to each piece of initial sensitive data is determined at the initial stage; as described above, it may be a data leakage scene of an office terminal computer, an application system, a data warehouse, an ecosystem institution, a host server, and so on.
Step 204: and determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene.
In practical application, the data processing algorithm includes, but is not limited to, a semantic rule algorithm, a machine model algorithm and a business logic algorithm. The semantic rule algorithm covers recognition by keywords, key phrases and regular-expression rules. The machine model algorithm includes algorithms such as relationship graph analysis, frequency and amplitude fluctuation analysis and resignation model analysis, with which relationship-class, frequency-class and deviation-class strategies can be implemented. The business logic algorithm is based mainly on business attributes within the organization, such as finance-related attributes; for example, a query of a personal account by the enterprise's loan issuing department is treated as abnormal, and this business logic is formulated as a strategy. The target scoring units include, but are not limited to, a personnel scoring unit, a time scoring unit, a file scoring unit, an application scoring unit, a data scoring unit, an address scoring unit, an equipment scoring unit and a related-person scoring unit.
Specifically, the data processing algorithm and the target scoring units used in the subsequent processing of each piece of initial sensitive data differ according to the data scene corresponding to that data.
Because the data scenes corresponding to different initial sensitive data have different attributes, the data processing algorithm and target scoring units with which each piece of initial sensitive data is to be analyzed are determined at the initial stage. For example, if the data scene corresponding to the initial sensitive data is a data leakage scene of an office terminal computer (for example, there is a risk of data leakage through device misappropriation), the data scene is strongly related to the equipment, so keywords and key phrases are identified with the semantic rule algorithm and the equipment scoring unit is called for analysis. If the data scene is a data leakage scene of an application system (for example, there is a risk of data leakage through illegal application queries), the data scene is strongly related to the relationship network of the queried customer, so the relationship network of related persons is analyzed with the machine model algorithm and the application scoring unit is called for analysis.
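The selection described above can be pictured as a lookup from data scene to an algorithm and a set of scoring units. The following sketch is illustrative only: the table `SCENE_CONFIG`, the scene keys and the function name are hypothetical, and the second scoring unit listed for each scene is an assumption added so that each scene has at least two units, as the method requires.

```python
# Hypothetical mapping from data scene to (algorithm, scoring units).
# The first unit of each entry follows the examples in the text; the second
# unit is an assumption for illustration.
SCENE_CONFIG = {
    "office_terminal": {
        "algorithm": "semantic_rule",
        "units": ["equipment", "personnel"],
    },
    "application_system": {
        "algorithm": "machine_model",
        "units": ["application", "related_person"],
    },
}


def select_for_scene(scene):
    """Return the data processing algorithm and target scoring units for a scene."""
    if scene not in SCENE_CONFIG:
        raise KeyError(f"no configuration for data scene: {scene}")
    config = SCENE_CONFIG[scene]
    if len(config["units"]) < 2:
        raise ValueError("at least two target scoring units are required")
    return config["algorithm"], list(config["units"])


algorithm, units = select_for_scene("office_terminal")
print(algorithm, units)
```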
Each scoring unit corresponds to a scoring interval, for example, the scoring interval corresponding to the personnel scoring unit is 1-10 points, the scoring interval corresponding to the time scoring unit is 0-5 points, the scoring interval corresponding to the file scoring unit is 0-100 points, the scoring interval corresponding to the application scoring unit is 0-100 points, the scoring interval corresponding to the data scoring unit is 0-100 points, the scoring interval corresponding to the address scoring unit is 0-10 points, the scoring interval corresponding to the equipment scoring unit is 0-10 points, the scoring interval corresponding to the related personnel scoring unit is 0-10 points, and the like.
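The example intervals above can be written down as a small table. In the sketch below, the interval values come from the preceding paragraph, while the dictionary name, the unit keys and the decision to reject out-of-range scores with an exception are assumptions made for illustration.

```python
# Scoring intervals from the example above, keyed by hypothetical unit names.
SCORING_INTERVALS = {
    "personnel": (1, 10),
    "time": (0, 5),
    "file": (0, 100),
    "application": (0, 100),
    "data": (0, 100),
    "address": (0, 10),
    "equipment": (0, 10),
    "related_person": (0, 10),
}


def validate_score(unit, score):
    """Return the score if it lies in the unit's interval, else raise ValueError."""
    low, high = SCORING_INTERVALS[unit]
    if not low <= score <= high:
        raise ValueError(f"{unit} score {score} is outside [{low}, {high}]")
    return score


print(validate_score("time", 5))
```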
The weight value of the calculation factor of each scoring unit can be adjusted dynamically for different data scenes; for example, in an equipment sharing and misappropriation scene, the weight value of the calculation factor of the equipment scoring unit can be increased automatically. The specific implementation is as follows:
the determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene includes:
determining a data processing algorithm corresponding to the initial sensitive data and at least two initial scoring units according to the data scene;
calculating the association degree of each initial scoring unit and the data scene;
and adjusting the scoring strategy of each initial scoring unit according to the relevance, and determining each adjusted initial scoring unit as at least two target scoring units corresponding to the initial sensitive data.
The scoring strategy can be understood as how the final scoring result of an initial scoring unit is formed, for example by addition or by multiplication with a factor.
Specifically, adjusting the scoring strategy of each initial scoring unit according to the association degree can be understood as adjusting the weight value of the calculation factor of each initial scoring unit according to the association degree, and determining the scoring strategy of each initial scoring unit according to the weight value. For example, after the weight value is adjusted, the multiple applied to the final scoring result of the initial scoring unit can be increased, so that the score calculated by that initial scoring unit is more prominent.
In practical application, each data scene may correspond to two or more initial scoring units. When a data scene corresponds to a plurality of initial scoring units, the association degree between each initial scoring unit and the data scene is calculated, the weight value of the calculation factor of each initial scoring unit is adjusted according to the association degree, and the scoring strategy is determined and adjusted according to the weight value, thereby obtaining the target scoring units corresponding to the initial sensitive data. The association degree between an initial scoring unit and the data scene may be calculated in any manner, for example based on keyword similarity; the embodiments of this specification place no limitation on this.
For example, when the data scene is an equipment sharing and misappropriation scene, the target scoring units corresponding to the data scene include an equipment scoring unit, a related-person scoring unit and so on. Keyword similarity shows that the equipment scoring unit has a high association degree with the data scene, so the weight value of the calculation factor of the equipment scoring unit can be increased dynamically.
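The text leaves the association-degree calculation open and mentions keyword similarity as one option. The sketch below picks Jaccard similarity over keyword sets as an assumed concrete measure and scales each unit's weight by `1 + similarity`; the measure, the scaling rule, the function names and the sample keywords are all illustrative assumptions.

```python
def keyword_similarity(a, b):
    """Jaccard similarity between two keyword collections (assumed measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def adjust_weights(scene_keywords, unit_keywords, base_weight=1.0):
    """Raise each scoring unit's weight in proportion to its association degree."""
    return {
        unit: base_weight * (1.0 + keyword_similarity(scene_keywords, keywords))
        for unit, keywords in unit_keywords.items()
    }


# Hypothetical keywords for an equipment sharing and misappropriation scene.
scene = ["device", "shared", "misappropriation"]
units = {
    "equipment": ["device", "shared", "terminal"],
    "related_person": ["colleague", "relative"],
}
weights = adjust_weights(scene, units)
print(weights)
```

Here the equipment scoring unit shares two of four distinct keywords with the scene, so its weight rises above that of the related-person scoring unit.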
Step 206: and under the condition that the initial sensitive data meet the preset processing conditions according to the data processing algorithm, carrying out sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result.
The role of the data processing algorithm in the embodiments of this specification can be understood as a further screening of the initial sensitive data. That is, before sensitivity scoring is performed on a piece of initial sensitive data, the data is first processed with its corresponding data processing algorithm to determine whether sensitivity scoring should be performed in the next step. In other words, when the corresponding data processing algorithm determines that the initial sensitive data is high-risk leakage data, sensitivity scoring is performed by the corresponding target scoring units.
Specifically, determining according to the data processing algorithm that the initial sensitive data meets the preset processing condition can be understood as follows: the initial sensitive data meets the preset processing condition when it is determined, according to the data processing algorithm, to be high-risk leakage data.
In practical application, the data processing algorithm includes, but is not limited to, a semantic rule algorithm, a machine model algorithm and a business logic algorithm. The semantic rule algorithm covers recognition by keywords, key phrases and regular-expression rules, and these recognition capabilities can determine whether the initial sensitive data is high-risk leakage data. The machine model algorithm includes algorithms such as relationship graph analysis, frequency and amplitude fluctuation analysis and resignation model analysis; these can be used to analyze relationship-class, frequency-class and deviation-class strategies in the initial sensitive data and so determine whether it is high-risk leakage data. The business logic algorithm is a strategy formulated from business logic based on business attributes within the organization, such as finance-related attributes (for example, treating a query of a personal account by the enterprise's loan issuing department as abnormal), and it is likewise used to determine whether the initial sensitive data is high-risk leakage data.
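As a rough illustration of screening with a semantic rule algorithm, the sketch below flags a text record as high-risk when it contains a listed keyword or matches a regular-expression rule. The keyword list, the single pattern (an 11-digit number starting with 1, used here as a stand-in for a mobile phone number rule) and the function name are hypothetical; a real rule set would be defined by the organization.

```python
import re

# Hypothetical keyword rules, loosely based on the examples of sensitive data.
KEYWORDS = ("bank card number", "certificate number")

# Hypothetical regular-expression rule: an 11-digit number starting with 1.
PATTERNS = (re.compile(r"\b1\d{10}\b"),)


def is_high_risk(text):
    """Return True if the text hits any keyword rule or regular-expression rule."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        return True
    return any(pattern.search(text) is not None for pattern in PATTERNS)


records = [
    "Please send the bank card number today",
    "call me at 13812345678",
    "lunch at noon",
]
# Only records that pass the screening go on to sensitivity scoring.
to_score = [r for r in records if is_high_risk(r)]
print(to_score)
```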
In specific implementation, under the condition that the initial sensitive data meet preset processing conditions according to the data processing algorithm, sensitivity scoring is carried out on the initial sensitive data according to the at least two target scoring units, and a target scoring result is obtained. The specific implementation mode is as follows:
the sensitivity scoring of the initial sensitive data according to the at least two target scoring units comprises:
and performing sensitivity scoring on the initial sensitive data according to the scoring dimension and the scoring strategy corresponding to each target scoring unit in the at least two target scoring units.
The scoring dimensions differ from one target scoring unit to another. When the target scoring unit is a personnel scoring unit, its scoring dimensions include, but are not limited to, the person's post responsibility, willingness to resign and resignation status. When it is a file scoring unit, its scoring dimensions include, but are not limited to, file size, file content and encryption and compression records. When it is a related-person scoring unit, its scoring dimensions include, but are not limited to, the employee's relatives, colleagues, supervisors and friends.
In practical application, each target scoring unit corresponds to a scoring interval, and each scoring dimension of a target scoring unit is assigned a score within that interval. In a specific application, the target scoring units corresponding to each piece of initial sensitive data score its sensitivity according to their scoring dimensions and scoring strategies, so that the target scoring result can be obtained accurately from these sensitivity scores.
In specific implementation, each piece of initial sensitive data corresponds to two or more target scoring units. After the sensitivity score of each target scoring unit is obtained, the target scoring result of the initial sensitive data is obtained by addition, multiplication or another calculation method. The specific implementation is as follows:
the sensitivity scoring of the initial sensitive data according to the at least two target scoring units to obtain a target scoring result comprises the following steps:
performing sensitivity scoring on the initial sensitive data according to each target scoring unit of the at least two target scoring units to obtain an initial scoring result of each target scoring unit on the initial sensitive data;
and calculating the initial scoring result of the initial sensitive data according to a preset calculation rule to obtain a target scoring result of the initial sensitive data.
The preset calculation rule includes, but is not limited to, addition, multiplication, or other calculation rules.
Specifically, taking addition as the preset calculation rule: when a piece of initial sensitive data corresponds to two or more target scoring units, sensitivity scoring is performed on the initial sensitive data by each adjusted target scoring unit to obtain that unit's initial scoring result for the initial sensitive data; the initial scoring results of all target scoring units for the initial sensitive data are then added together to obtain the target scoring result of the initial sensitive data.
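The combination step can be sketched as follows, covering the addition and multiplication rules named in the text. The function name `combine_scores`, the rule labels and the sample scores are hypothetical; other calculation rules would need their own branches.

```python
import math


def combine_scores(initial_results, rule="add"):
    """Combine per-unit initial scoring results into one target scoring result.

    `initial_results` maps a scoring unit name to its initial scoring result.
    """
    values = list(initial_results.values())
    if len(values) < 2:
        raise ValueError("at least two target scoring units are required")
    if rule == "add":
        return sum(values)
    if rule == "multiply":
        return math.prod(values)
    raise ValueError(f"unsupported calculation rule: {rule}")


initial_results = {"personnel": 7, "file": 40}
print(combine_scores(initial_results))
print(combine_scores(initial_results, rule="multiply"))
```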
In practice, after the target scoring result of each piece of initial sensitive data is obtained, whether the initial sensitive data is target sensitive data can be judged according to the target scoring result, and specific service processing can then be performed on the target sensitive data to ensure the security of the whole service. A specific implementation is as follows:
after obtaining the target scoring result, the method further comprises:
and determining whether the initial sensitive data are target sensitive data according to the target scoring result.
In a specific implementation, there are at least three manners of determining whether the initial sensitive data is target sensitive data according to the target scoring result. Each is introduced below by way of example.
In the first manner, an average scoring result is determined from the target scoring results, and whether the initial sensitive data is target sensitive data is judged according to the average scoring result. A specific implementation is as follows:
the determining whether the initial sensitive data is target sensitive data according to the target scoring result includes:
determining target scoring results of all initial sensitive data in the sensitive data set;
determining an average scoring result according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the average scoring result.
In specific implementation, target scoring results of all initial sensitive data in the sensitive data set are obtained, and then an average scoring result is calculated according to the obtained target scoring results of all initial sensitive data in the sensitive data set; and comparing the target scoring result of each initial sensitive data with the average scoring result to determine whether the initial sensitive data are the target sensitive data.
For example, if the target scoring result is greater than the average scoring result, the initial sensitive data may be considered as target sensitive data; if the target scoring result is less than or equal to the average scoring result, the initial sensitive data may be considered not to be the target sensitive data. By the method, whether each initial sensitive data is the target sensitive data can be rapidly judged.
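The average-based rule can be sketched as below. The function name `select_above_average` and the dictionary layout (data identifier mapped to target scoring result) are assumptions for illustration.

```python
from statistics import mean
from typing import Dict


def select_above_average(target_scores: Dict[str, float]) -> Dict[str, bool]:
    """Flag each item whose target scoring result is strictly greater than
    the average over the whole sensitive data set."""
    if not target_scores:
        return {}
    average = mean(list(target_scores.values()))
    # A score equal to the average is not treated as target sensitive data.
    return {item: score > average for item, score in target_scores.items()}
```

Because the comparison is strict, an item whose score equals the average is not selected, matching the "less than or equal to" case in the example above.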
In the second manner, the target scoring results of all initial sensitive data in the sensitive data set are acquired, and whether the initial sensitive data is target sensitive data is determined according to the sorting result of those target scoring results. A specific implementation is as follows:
the determining whether the initial sensitive data is target sensitive data according to the target scoring result includes:
determining target scoring results of all initial sensitive data in the sensitive data set;
sorting all initial sensitive data in the sensitive data set according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the sorting result.
For example, the target scoring results of all initial sensitive data in the sensitive data set are obtained, all initial sensitive data in the set are sorted in descending order of target scoring result, and the top 30 or top 50 pieces of initial sensitive data in the sorted list are selected as the target sensitive data. In this way, the target sensitive data can be acquired quickly.
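A minimal sketch of the ranking-based rule follows. The function name `select_top_n` and the identifier-to-score dictionary are hypothetical; the cut-off `n` corresponds to the 30 or 50 mentioned above.

```python
from typing import Dict, List


def select_top_n(target_scores: Dict[str, float], n: int) -> List[str]:
    """Return the identifiers of the n items with the highest target
    scoring results, in descending order of score."""
    ranked = sorted(target_scores, key=lambda item: target_scores[item], reverse=True)
    return ranked[:n]
```

If the set holds fewer than `n` items, slicing simply returns all of them in ranked order.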
In the third manner, whether the initial sensitive data is target sensitive data is judged according to the historical scoring result. A specific implementation is as follows:
the determining whether the initial sensitive data is target sensitive data according to the target scoring result includes:
and under the condition that the initial sensitive data has the historical scoring result, determining whether the initial sensitive data is the target sensitive data or not according to the historical scoring result of the initial sensitive data and the target scoring result.
Specifically, after obtaining the target scoring result of the initial sensitive data, if it is determined that the initial sensitive data has the historical scoring result, it may be determined whether each initial sensitive data is the target sensitive data according to the historical scoring result and the target scoring result of each initial sensitive data.
In one implementation, the determining whether the initial sensitive data is target sensitive data according to the historical scoring result of the initial sensitive data and the target scoring result includes:
calculating a difference value between the historical scoring result and the target scoring result according to the historical scoring result and the target scoring result of the initial sensitive data;
determining the initial sensitive data as target sensitive data under the condition that the difference value is larger than a preset difference value threshold value; and
and determining that the initial sensitive data is not target sensitive data under the condition that the difference is smaller than or equal to the preset difference threshold.
The preset difference threshold may be set according to the practical application, for example to 60 or 70; this specification does not limit it.
In a specific implementation, the historical scoring result (for example, the scoring result immediately preceding the current one) and the target scoring result of each piece of initial sensitive data are acquired; the difference between the two is then calculated, and whether each piece of initial sensitive data is target sensitive data is determined according to that difference.
For example, if the preset difference threshold is 70 points and the difference between the historical scoring result and the target scoring result of the initial sensitive data is 90 points, it may be determined that the difference is greater than the preset difference threshold, and it may be determined that the initial sensitive data is the target sensitive data; if the difference between the historical scoring result and the target scoring result of the initial sensitive data is 50 points, it may be determined that the difference is smaller than a preset difference threshold, and it may be determined that the initial sensitive data is not the target sensitive data.
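The threshold comparison in this example can be sketched as follows. The function name is hypothetical, and taking the absolute value of the difference is an assumption: the text does not say which score is subtracted from which.

```python
def is_target_by_history(historical_score: float,
                         target_score: float,
                         diff_threshold: float = 70.0) -> bool:
    """Return True when the gap between the historical and the current
    (target) scoring result exceeds the preset difference threshold."""
    # Assumption: the "difference value" is the absolute gap between the scores.
    difference = abs(target_score - historical_score)
    return difference > diff_threshold
```

With the default threshold of 70 points, a difference of 90 marks the data as target sensitive data, while a difference of 50 (or exactly 70) does not.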
Alternatively, for each piece of initial sensitive data, both the previous historical scoring result and the historical scoring result preceding that one may be obtained. The difference between the target scoring result of the initial sensitive data and the previous historical scoring result, and the difference between the previous historical scoring result and the one before it, are then calculated. The two differences are compared, and if the value obtained by subtracting one from the other is greater than a preset threshold, the initial sensitive data is determined to be target sensitive data.
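This alternative compares how much the score changed in the latest round with how much it changed in the round before. A minimal sketch is given below; the function and parameter names are hypothetical, and the direction of each subtraction (latest change minus earlier change) is an assumption, since the text does not fix it.

```python
def is_target_by_trend(current_score: float,
                       previous_score: float,
                       earlier_score: float,
                       threshold: float) -> bool:
    """Return True when the latest change in score exceeds the change of
    the preceding round by more than the preset threshold."""
    latest_change = current_score - previous_score    # current vs. previous round
    earlier_change = previous_score - earlier_score   # previous vs. the one before
    # Assumption: the comparison subtracts the earlier change from the latest one.
    return (latest_change - earlier_change) > threshold
```

For instance, scores of 30, 40, and then 100 give changes of 10 and 60, so the data would be selected for any threshold below 50.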
The data processing method provided by the embodiments of this specification uses a data processing algorithm and target scoring units to score the sensitivity of each piece of initial sensitive data along each sensitive dimension. The target sensitive data can then be determined quickly and accurately from the target scoring result, which facilitates subsequent adjustment of the target sensitive data.
The data processing method is further described below with reference to Fig. 3, taking its application to DLP alarm data as an example. Fig. 3 shows a flowchart of the processing procedure of a data processing method provided in an embodiment of this specification, which specifically includes the following steps.
Step 302: Obtain DLP alarm data.
Step 304: Determine the data scene of each piece of DLP alarm data, and determine the data processing algorithm and at least two scoring cards corresponding to each piece of DLP alarm data according to its data scene.
The scoring card may be understood as the scoring unit described above.
Step 306: When a piece of DLP alarm data is determined to be high-risk leakage data according to its corresponding data processing algorithm, send that DLP alarm data to the corresponding at least two scoring cards.
Step 308: Perform sensitivity scoring on each piece of DLP alarm data according to its at least two corresponding scoring cards to obtain the target scoring result of each piece of DLP alarm data.
Step 310: Determine whether each piece of DLP alarm data is target sensitive data according to its target scoring result.
The data processing method provided by the embodiments of this specification uses a dynamic scoring card mechanism to obtain the target scoring result of each piece of DLP alarm data comprehensively and accurately. In an actual application scenario, the DLP alarm data that requires follow-up processing can then be identified from the target scoring result, which ensures the security of the system.
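Steps 302 to 310 can be sketched end to end as follows. Everything here is illustrative: `SceneConfig`, `process_alerts`, the dictionary shape of an alarm record, and the `"scene"` key are assumptions; summation is used to combine scoring cards, and the above-average rule (the first manner described earlier) is used for step 310.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Alert = Dict[str, Any]                  # hypothetical shape of one DLP alarm record
RiskCheck = Callable[[Alert], bool]     # stands in for the data processing algorithm
ScoreCard = Callable[[Alert], float]    # stands in for one scoring card


@dataclass
class SceneConfig:
    is_high_risk: RiskCheck
    score_cards: List[ScoreCard]        # at least two per data scene


def process_alerts(alerts: List[Alert],
                   scenes: Dict[str, SceneConfig]) -> List[Tuple[Alert, float, bool]]:
    scored: List[Tuple[Alert, float]] = []
    for alert in alerts:                                  # step 302: obtained alarms
        config = scenes[alert["scene"]]                   # step 304: scene lookup
        if not config.is_high_risk(alert):                # step 306: risk gate
            continue
        total = sum(card(alert) for card in config.score_cards)  # step 308
        scored.append((alert, total))
    if not scored:
        return []
    average = sum(score for _, score in scored) / len(scored)
    # Step 310: an alarm is target sensitive data if it scores above the average.
    return [(alert, score, score > average) for alert, score in scored]
```

Alarms that fail the risk gate in step 306 are dropped before scoring, so they do not influence the average used in step 310.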
Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and fig. 4 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 4, the apparatus includes:
a data obtaining module 402, configured to obtain a sensitive data set including initial sensitive data, and determine a data scene corresponding to the initial sensitive data;
an algorithm determining module 404 configured to determine a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene;
the scoring module 406 is configured to perform sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result under the condition that it is determined that the initial sensitive data meets a preset processing condition according to the data processing algorithm.
Optionally, the data obtaining module 402 is further configured to:
and acquiring a sensitive data set which is determined by a preset risk identification terminal and contains initial sensitive data.
Optionally, the algorithm determining module 404 is further configured to:
determining a data processing algorithm corresponding to the initial sensitive data and at least two initial scoring units according to the data scene;
calculating the association degree of each initial scoring unit and the data scene;
and adjusting the scoring strategy of each initial scoring unit according to the association degree, and determining each adjusted initial scoring unit as at least two target scoring units corresponding to the initial sensitive data.
Optionally, the scoring module 406 is further configured to:
and performing sensitivity scoring on the initial sensitive data according to the scoring dimension and the scoring strategy corresponding to each target scoring unit in the at least two target scoring units.
Optionally, the scoring module 406 is further configured to:
performing sensitivity scoring on the initial sensitive data according to each target scoring unit of the at least two target scoring units to obtain an initial scoring result of each target scoring unit on the initial sensitive data;
and calculating the initial scoring result of the initial sensitive data according to a preset calculation rule to obtain a target scoring result of the initial sensitive data.
Optionally, the apparatus further comprises:
a target data determination module configured to:
and determining whether the initial sensitive data are target sensitive data according to the target scoring result.
Optionally, the target data determination module is further configured to:
determining target scoring results of all initial sensitive data in the sensitive data set;
determining an average scoring result according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the average scoring result.
Optionally, the target data determination module is further configured to:
determining target scoring results of all initial sensitive data in the sensitive data set;
sorting all initial sensitive data in the sensitive data set according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the sorting result.
Optionally, the target data determination module is further configured to:
and under the condition that the initial sensitive data has a historical scoring result, determining whether the initial sensitive data is target sensitive data or not according to the historical scoring result of the initial sensitive data and the target scoring result.
Optionally, the target data determination module is further configured to:
calculating a difference value between the historical scoring result and the target scoring result according to the historical scoring result and the target scoring result of the initial sensitive data;
determining the initial sensitive data as target sensitive data under the condition that the difference is larger than a preset difference threshold; and
and determining that the initial sensitive data is not target sensitive data under the condition that the difference is smaller than or equal to the preset difference threshold.
The data processing apparatus provided by the embodiments of this specification uses a data processing algorithm and target scoring units to score the sensitivity of each piece of initial sensitive data. The target sensitive data can then be determined quickly and accurately from the target scoring result, which facilitates subsequent adjustment of the target sensitive data.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 5 illustrates a block diagram of a computing device 500 provided in accordance with one embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 5 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
Wherein the processor 520 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data processing method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the data processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.
The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (13)

1. A method of data processing, comprising:
acquiring a sensitive data set containing initial sensitive data, and determining a data scene corresponding to the initial sensitive data;
determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene;
and under the condition that the initial sensitive data meet the preset processing conditions according to the data processing algorithm, carrying out sensitivity scoring on the initial sensitive data according to the at least two target scoring units to obtain a target scoring result.
2. The data processing method of claim 1, wherein the obtaining a sensitive data set containing initial sensitive data comprises:
and acquiring a sensitive data set which is determined by a preset risk identification terminal and contains initial sensitive data.
3. The data processing method of claim 1, wherein the determining a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scenario comprises:
determining a data processing algorithm corresponding to the initial sensitive data and at least two initial scoring units according to the data scene;
calculating the association degree of each initial scoring unit and the data scene;
and adjusting the scoring strategy of each initial scoring unit according to the association degree, and determining each adjusted initial scoring unit as at least two target scoring units corresponding to the initial sensitive data.
4. The data processing method of claim 1, wherein the sensitivity scoring of the initial sensitive data according to the at least two target scoring units comprises:
and performing sensitivity scoring on the initial sensitive data according to the scoring dimension and the scoring strategy corresponding to each target scoring unit in the at least two target scoring units.
5. The data processing method of claim 1, wherein the sensitivity scoring of the initial sensitive data according to the at least two target scoring units to obtain target scoring results comprises:
performing sensitivity scoring on the initial sensitive data according to each target scoring unit of the at least two target scoring units to obtain an initial scoring result of each target scoring unit on the initial sensitive data;
and calculating the initial scoring result of the initial sensitive data according to a preset calculation rule to obtain a target scoring result of the initial sensitive data.
6. The data processing method of claim 1, further comprising, after obtaining the target scoring result:
and determining whether the initial sensitive data are target sensitive data according to the target scoring result.
7. The data processing method of claim 6, wherein the determining whether the initial sensitive data is target sensitive data according to the target scoring result comprises:
determining target scoring results of all initial sensitive data in the sensitive data set;
determining an average scoring result according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the average scoring result.
8. The data processing method of claim 6, wherein the determining whether the initial sensitive data is target sensitive data according to the target scoring result comprises:
determining target scoring results of all initial sensitive data in the sensitive data set;
sorting all initial sensitive data in the sensitive data set according to the target scoring results of all initial sensitive data in the sensitive data set;
and determining whether the initial sensitive data is target sensitive data according to the sorting result.
9. The data processing method of claim 6, wherein the determining whether the initial sensitive data is target sensitive data according to the target scoring result comprises:
and under the condition that the initial sensitive data has the historical scoring result, determining whether the initial sensitive data is the target sensitive data or not according to the historical scoring result of the initial sensitive data and the target scoring result.
10. The data processing method of claim 9, wherein the determining whether the initial sensitive data is target sensitive data according to the historical scoring result and the target scoring result of the initial sensitive data comprises:
calculating a difference value between the historical scoring result and the target scoring result according to the historical scoring result of the initial sensitive data and the target scoring result;
determining the initial sensitive data as target sensitive data under the condition that the difference is larger than a preset difference threshold; and
and determining that the initial sensitive data is not target sensitive data under the condition that the difference is smaller than or equal to the preset difference threshold.
11. A data processing apparatus comprising:
the data acquisition module is configured to acquire a sensitive data set containing initial sensitive data and determine a data scene corresponding to the initial sensitive data;
the algorithm determining module is configured to determine a data processing algorithm corresponding to the initial sensitive data and at least two target scoring units according to the data scene;
and the scoring module is configured to perform sensitivity scoring on the initial sensitive data according to the at least two target scoring units under the condition that the initial sensitive data meets the preset processing condition according to the data processing algorithm, so as to obtain a target scoring result.
12. A computing device, comprising:
a memory and a processor;
the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor implement the steps of the data processing method of any one of claims 1 to 10.
13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 10.
CN202210265154.8A 2022-03-17 2022-03-17 Data processing method and device Pending CN114611149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210265154.8A CN114611149A (en) 2022-03-17 2022-03-17 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210265154.8A CN114611149A (en) 2022-03-17 2022-03-17 Data processing method and device

Publications (1)

Publication Number Publication Date
CN114611149A true CN114611149A (en) 2022-06-10

Family

ID=81864335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210265154.8A Pending CN114611149A (en) 2022-03-17 2022-03-17 Data processing method and device

Country Status (1)

Country Link
CN (1) CN114611149A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776390A (en) * 2023-08-15 2023-09-19 上海观安信息技术股份有限公司 Method, device, storage medium and equipment for monitoring data leakage behavior

Similar Documents

Publication Publication Date Title
CN111966716A (en) Data processing method and device
CN110717758B (en) Abnormal transaction identification method and device
CN111783144A (en) Data processing method and device based on block chain
CN111782719B (en) Data processing method and device
CN115185760A (en) Abnormality detection method and apparatus
CN110675263B (en) Risk identification method and device for transaction data
CN112016850A (en) Service evaluation method and device
CN114611149A (en) Data processing method and device
CN110113748B (en) Crank call monitoring method and device
CN112232656B (en) Monitoring and early warning method, device, terminal and readable medium for business data
EP3879418B1 (en) Identity verification method and device
CN109241249B (en) Method and device for determining burst problem
CN116010221A (en) Alarm processing method and device
US11489877B2 (en) Cybersecurity maturity determination
CN112328937B (en) Information delivery method and device
CN111552846B (en) Method and device for identifying suspicious relationships
CN114780612A (en) System and method for mining target personnel based on time correlation of theme events
CN110808978A (en) Real name authentication method and device
CN111708811A (en) Visitor data management method and device, electronic equipment and storage medium
US10410228B2 (en) System for automatic responses to predicted tail event outcomes
CN110705995B (en) Data tagging method and device
KR20200141571A (en) Integrated History Management System of Police Manpower based on Block Chain
US20180239584A1 (en) Identification of users across multiple platforms
CN112134998B (en) Code number distinguishing method, electronic device and computer-readable storage medium
Velmurugadass et al. Enhancing Security Service of Data Protection Level using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination