CN115098686A - Grading information determination method and device and computer equipment - Google Patents

Publication number
CN115098686A
Authority
CN
China
Prior art keywords
data matching
matching rule
historical data
similarity
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210840009.8A
Other languages
Chinese (zh)
Inventor
张游
夏雯君
谷俊
李菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210840009.8A
Publication of CN115098686A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a grading information determination method and apparatus, a computer device, a storage medium and a computer program product in the technical field of information security. The method comprises: acquiring a newly added target data matching rule; determining the similarity between the target data matching rule and each historical data matching rule; and determining the grading information of the target data matching rule based on those similarities and on the grading information of each historical data matching rule. The method can improve both the efficiency of determining the grading information of a newly added target data matching rule and the accuracy of that grading information.

Description

Method and device for determining grading information and computer equipment
Technical Field
The present application relates to the field of information security technologies, and in particular to a grading information determination method and apparatus and a computer device.
Background
With the increasing importance attached to data security, data needs to be classified and graded, and data with different grading information is given different security levels. The same data may be stored in databases corresponding to different service scenarios and may carry different grading information in each; for example, the address field is classified and graded differently in a personal system and in a public system.
In the related art, after a data matching rule is newly added, its grading information is determined manually; data matching rules with different grading information apply to different databases.
However, as the number of data matching rules multiplies, determining their grading information manually becomes inefficient and less accurate.
Disclosure of Invention
In view of the above, it is necessary to provide a hierarchical information determining method, apparatus, computer device, computer readable storage medium and computer program product capable of improving efficiency.
In a first aspect, the present application provides a hierarchical information determination method. The method comprises the following steps:
acquiring a newly added target data matching rule;
determining the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
In one embodiment, the determining the similarity between the target data matching rule and each historical data matching rule includes:
performing word segmentation on the target data matching rule to obtain a first word segmentation result;
for each historical data matching rule, performing word segmentation on the historical data matching rule to obtain a second word segmentation result; determining the similarity of the first segmentation result and the second segmentation result, and taking the similarity of the first segmentation result and the second segmentation result as the similarity of the target data matching rule and the historical data matching rule.
In one embodiment, the determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule includes:
and determining the grading information of the historical data matching rule with the highest similarity with the target data matching rule as the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule.
In one embodiment, the determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule includes:
selecting a plurality of first historical data matching rules with the similarity meeting the similarity condition from the historical data matching rules according to a preset selection strategy and the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information with the highest ratio in the grading information of the plurality of first historical data matching rules as the grading information of the target data matching rule.
In one embodiment, the selecting, according to a preset selection policy and the similarity between the target data matching rule and each historical data matching rule, a plurality of first historical data matching rules with similarity satisfying a similarity condition from each historical data matching rule includes:
and selecting a plurality of first historical data matching rules with the similarity higher than a preset threshold value from the historical data matching rules.
In one embodiment, the selecting, according to a preset selection policy, a plurality of first historical data matching rules includes:
sorting the similarities between the target data matching rule and each historical data matching rule from high to low; and selecting the top several historical data matching rules in the sorted order as the first historical data matching rules.
In one embodiment, the grading information includes a system identifier, a data classification and a data grade;
the determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule comprises:
determining a target system identifier of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the system identifier corresponding to each historical data matching rule;
determining the similarity between the target data matching rule and each second historical data matching rule corresponding to the target system identifier; and determining the target data classification and the target data grade of the target data matching rule based on the similarity between the target data matching rule and each second historical data matching rule and on the data classification and data grade of each second historical data matching rule.
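The two-stage determination in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the `HistoricalRule` type, the `two_stage_grading` function, the `top_k` parameter and the majority vote used in each stage are hypothetical, and the similarity of each historical rule to the target rule is assumed to have been computed beforehand rather than recomputed in the second stage.

```python
from collections import Counter
from typing import NamedTuple


class HistoricalRule(NamedTuple):
    system_id: str        # system identifier
    classification: str   # data classification
    grade: str            # data grade
    similarity: float     # similarity to the target rule (assumed precomputed)


def two_stage_grading(rules, top_k=3):
    """Pick the system identifier first, then classification and grade within it."""
    ranked = sorted(rules, key=lambda r: r.similarity, reverse=True)
    # Stage 1: most common system identifier among the top_k most similar rules.
    system_id = Counter(r.system_id for r in ranked[:top_k]).most_common(1)[0][0]
    # Stage 2: keep only the "second" historical rules of that system and vote again.
    second = [r for r in ranked if r.system_id == system_id]
    votes = Counter((r.classification, r.grade) for r in second[:top_k])
    classification, grade = votes.most_common(1)[0][0]
    return system_id, classification, grade


history = [
    HistoricalRule("personal", "client information", "level 2", 0.90),
    HistoricalRule("personal", "client information", "level 2", 0.80),
    HistoricalRule("public", "client information", "level 1", 0.85),
    HistoricalRule("personal", "contact information", "level 3", 0.30),
]
result = two_stage_grading(history)
```

Splitting the decision this way keeps rules from other systems from influencing the classification and grade once the system has been chosen.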
In a second aspect, the present application further provides a hierarchical information determining apparatus. The device comprises:
the acquisition module is used for acquiring a newly added target data matching rule;
the first determining module is used for determining the similarity between the target data matching rule and each historical data matching rule;
and the second determining module is used for determining the grading information of the target data matching rules based on the similarity between the target data matching rules and each historical data matching rule and the grading information of each historical data matching rule.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring a newly added target data matching rule;
determining the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a newly added target data matching rule;
determining the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring a newly added target data matching rule;
determining the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
According to the method, the device, the computer equipment, the storage medium and the computer program product for determining the grading information of the data matching rules, the grading information of the target data matching rules is determined according to the similarity between the target data matching rules and the historical data matching rules and the grading information of the historical data matching rules, so that the efficiency of determining the grading information of the newly added target data matching rules is improved, and the accuracy of the grading information of the target data matching rules is improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a hierarchical information determination method according to an embodiment;
FIG. 2 is a schematic diagram of a hierarchical system in one embodiment;
FIG. 3 is a block diagram showing the structure of a hierarchical information determining apparatus according to one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Financial data are complex and various, and if hierarchical management is performed on the data, data protection objects can be further defined, so that reasonable allocation of data protection resources and cost of financial institutions is facilitated. Therefore, related financial data stored in the database needs to be divided, and the hierarchical information of each financial data is determined, so that different levels of security protection are implemented on different financial data.
The impact that may result if the security of financial data is compromised (such as possible damage, loss or potential risk) is an important basis for determining the grading information of that data; two factors are mainly considered, the object affected and the degree of impact. A financial enterprise has many business systems, hundreds of them, holding hundreds of millions of data items. The grading information of the same financial data may therefore differ between business scenarios. For example, the address field is classified and graded differently in the personal system and the public system: it is graded level 2 (more important) in the personal system and level 1 (less important) in the public system.
In the related art, every time a data matching rule is added, business personnel first determine the system to which the rule belongs and then determine the data classification and data grade of the financial data matched by the rule, that is, the data classification and data grade of the rule itself, thereby forming a data matching rule with grading information.
On the one hand, determining the grading information of a new data matching rule manually is inefficient; on the other hand, as the number of data matching rules grows, the accuracy of manually determined grading information declines.
Based on the above, the application provides a method for determining the grading information of a data matching rule: a newly added target data matching rule is acquired, the similarity between the target data matching rule and each historical data matching rule is determined, and the grading information of the target data matching rule is determined based on those similarities and on the grading information of each historical data matching rule.
This method improves the efficiency of determining the grading information of a new data matching rule and also increases the accuracy of that grading information.
The method for determining the hierarchical information of the data matching rules can be applied to terminals or servers, the terminals can be but are not limited to various personal computers, notebook computers, smart phones and tablet computers, and the servers can be realized by independent servers or server clusters formed by a plurality of servers.
In one embodiment, as shown in fig. 1, there is provided a hierarchical information determination method, including the steps of:
Step 101: acquire a newly added target data matching rule.
A data matching rule is used to find the data it matches in a database. For example, one data matching rule finds, in the data table whose table remark is "basic information date terminal download file of the personal client", the data whose field remark is "birth date". As another example, the data matching rule: table remark (customer living country information modification list) && field remark (customer ID type) finds, in the data table whose table remark is "customer living country information modification list", the data whose field remark is "customer ID type".
In one embodiment, after receiving a data matching rule input by a service person, the device determines that the input data matching rule is a newly added target data matching rule. The device may also receive, from another device, a request to add a data matching rule and acquire the newly added target data matching rule in response to that request.
In an embodiment, after receiving a new data matching rule, the device may further check whether the format of the data matching rule is correct, and determine that the data matching rule is a new target data matching rule after checking that the format is correct.
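The format check described above might look like the following sketch. The rule syntax is an assumption modelled on the example rules in this description (a table remark and a field remark joined by a logical AND); the pattern, the `ParsedRule` type and the `parse_rule` function are hypothetical names, not part of the application.

```python
import re
from typing import NamedTuple, Optional

# Assumed rule syntax: table remark (<text>) && field remark (<text>)
# "&\s*&" also accepts the "& &" spelling that appears in some renderings.
RULE_PATTERN = re.compile(
    r"^\s*table remark\s*\((?P<table>[^()]+)\)\s*&\s*&\s*"
    r"field remark\s*\((?P<field>[^()]+)\)\s*$"
)


class ParsedRule(NamedTuple):
    table_remark: str
    field_remark: str


def parse_rule(rule_content: str) -> Optional[ParsedRule]:
    """Return the parsed rule, or None when the format check fails."""
    match = RULE_PATTERN.match(rule_content)
    if match is None:
        return None
    return ParsedRule(match.group("table").strip(), match.group("field").strip())


accepted = parse_rule(
    "table remark (customer living country information modification list)"
    " && field remark (customer ID type)"
)
rejected = parse_rule("table remark customer list && field remark")
```

Only a rule that passes the check (a non-`None` result here) would be treated as a new target data matching rule.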
Step 103: determine the similarity between the target data matching rule and each historical data matching rule.
Before the method for determining the grading information of data matching rules is implemented, experienced business personnel can determine the grading information of a number of data matching rules to obtain a plurality of historical data matching rules. Once the number of historical data matching rules corresponding to each piece of grading information reaches a certain number, for example 500, the method can be applied.
As shown in table 1, several data matching rules and corresponding rating information are exemplarily given for the present application:
TABLE 1 data matching rules and hierarchical information
[Table 1 is available only as an image, Figure BDA0003750547270000061, and is not reproduced here.]
In one embodiment, the device may input the target data matching rule and a historical data matching rule separately into a preset encoder, obtain a target code for the target data matching rule and a historical code for the historical data matching rule from the encoder output, and compare the two codes to obtain the similarity between the two rules. Alternatively, the device may perform word segmentation on the two rules to obtain a first word segmentation result for the target data matching rule and a second word segmentation result for the historical data matching rule, and compare those results to obtain the similarity. The device may also vectorize the two rules, input the vectors into a trained similarity comparison model, and determine the similarity from the model output.
Step 105: determine the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
The more similar the target data matching rule and a historical data matching rule are, the more the tables and columns they cover overlap, so the two rules can be expected to have the same security importance level and therefore the same grading information.
In one embodiment, after the device determines the similarity between the target data matching rule and each historical data matching rule, the device may determine the grading information of the historical data matching rule with the highest similarity to the target data matching rule as the grading information of the target data matching rule. The device may also select a plurality of first historical data matching rules from the historical data matching rules according to a preset selection strategy, and then determine the grading information with the highest proportion among the grading information of those first historical data matching rules as the grading information of the target data matching rule.
In this embodiment, the efficiency of determining the grading information of a new data matching rule is improved, and the accuracy of that grading information is increased.
In an embodiment, the step 103 specifically includes:
Step 1031: perform word segmentation on the target data matching rule to obtain a first word segmentation result.
In one embodiment, the device may remove format content from the target data matching rule according to the requirements of the rule format, for example the logic symbols or the fixed fields that the rule format requires, and then perform word segmentation on the remaining content to obtain the first word segmentation result. Take a rule whose name is "client ID type" and whose content is: table remark (client residence country information modification list) && field remark (client ID type). After the logic symbols, such as "(", ")" and "&&", and the fixed fields "table remark" and "field remark" are removed, the remaining content is: client ID type client residence country information modification list client ID type. Segmenting this content gives the first word segmentation result: client, ID, type, client, residence, country, information, modification, list, client, ID, type.
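A minimal sketch of this step is shown below. It assumes the rule syntax of the example above, and it uses whitespace splitting as a stand-in for a real word segmenter, which the original (Chinese-language) rules would require; the names `FIXED_FIELDS`, `LOGIC_SYMBOLS` and `segment_rule` are hypothetical.

```python
import re

# Assumed format content: the fixed fields and logic symbols of the rule format.
FIXED_FIELDS = ("table remark", "field remark")
LOGIC_SYMBOLS = re.compile(r"[()&]")


def segment_rule(rule_name: str, rule_content: str) -> list:
    """Strip format content, then split the rule name and remaining content into words."""
    text = rule_content
    for field in FIXED_FIELDS:
        text = text.replace(field, " ")
    text = LOGIC_SYMBOLS.sub(" ", text)
    # Whitespace splitting stands in for a real word segmenter (assumption).
    return (rule_name + " " + text).split()


first_result = segment_rule(
    "client ID type",
    "table remark (client residence country information modification list)"
    " && field remark (client ID type)",
)
```

Applying the same function to every historical data matching rule keeps the two word segmentation results comparable.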
In one embodiment, the device may instead first perform semantic analysis on the target data matching rule and pick out the nouns in the rule as the first word segmentation result. Taking the same rule as an example, with the rule name "client ID type" and the rule content: table remark (client residence country information modification list) && field remark (client ID type), the nouns in the rule are: client, ID type, client, country of residence, information, modification list, client, ID type, etc.
Step 1033: for each historical data matching rule, perform word segmentation on the historical data matching rule to obtain a second word segmentation result.
In one embodiment, the device may remove format content from the historical data matching rule according to the requirements of the rule format, for example the logic symbols or the fixed fields that the rule format requires, and then perform word segmentation on the content that remains in the historical data matching rule to obtain the second word segmentation result.
In one embodiment, the device may further perform semantic analysis on the historical data matching rule first, and sort out the nouns in the historical data matching rule as the second word segmentation result.
It should be noted that the word segmentation process for performing word segmentation on the target data matching rule is the same as the word segmentation process for performing word segmentation on the historical data matching rule, so that the consistency of the first word segmentation result and the second word segmentation result is ensured.
Step 1035: determine the similarity between the first word segmentation result and the second word segmentation result, and take it as the similarity between the target data matching rule and the historical data matching rule.
In one embodiment, the device obtains the first word segmentation result of the target data matching rule and the second word segmentation result of the historical data matching rule, vectorizes both results, and computes a measure such as the cosine distance, the Euclidean distance or the Manhattan distance between the two vectors to obtain the similarity between the first word segmentation result and the second word segmentation result.
Specifically, after obtaining the first word segmentation result and the second word segmentation result, the device may merge them to obtain the word set union of the two results, and then count the word frequencies of the first word segmentation result and of the second word segmentation result over that union to obtain a first segmentation vector corresponding to the first word segmentation result and a second segmentation vector corresponding to the second word segmentation result. The device then obtains the similarity between the first word segmentation result and the second word segmentation result from, for example, the cosine distance between the first segmentation vector and the second segmentation vector.
For example, rule 1 (the target data matching rule) has the rule name "client ID type" and the rule content: table remark (client residence country information modification detail table) && field remark (client ID type). Rule 2 (a historical data matching rule) has the rule name "english name" and the rule content: table remark (client name information modification detail table) && field remark (english name). Removing the format content from rule 1 and segmenting it gives the first word segmentation result, whose distinct words are: client/ID/type/residence/country/information/modification/detail/table. Doing the same for rule 2 gives the second word segmentation result, whose distinct words are: client/name/information/modification/detail/table/english name. Taking the union of the two word segmentation results gives the word set union: client/ID/type/residence/country/information/modification/detail/table/english name/name.
Counting the word frequencies of rule 1 and rule 2 over the word set union gives the first segmentation vector corresponding to the first word segmentation result and the second segmentation vector corresponding to the second word segmentation result:
First segmentation vector: (3,2,2,1,1,1,1,1,1,0,0)
Second segmentation vector: (1,0,0,0,0,1,1,1,1,2,1)
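The two vectors can be reproduced with the short sketch below. The token lists are an assumption: they concatenate the words of the rule name, the table remark and the field remark of each example rule, which is consistent with the word frequencies given above. The function name `word_frequency_vectors` is hypothetical.

```python
from collections import Counter


def word_frequency_vectors(first_tokens, second_tokens):
    """Return the word set union and the word-frequency vector of each token list."""
    # dict.fromkeys keeps first-appearance order while removing duplicates.
    vocabulary = list(dict.fromkeys(first_tokens + second_tokens))
    first_counts = Counter(first_tokens)
    second_counts = Counter(second_tokens)
    first_vector = [first_counts[word] for word in vocabulary]
    second_vector = [second_counts[word] for word in vocabulary]
    return vocabulary, first_vector, second_vector


# Assumed token lists: rule name + table remark + field remark of each rule.
rule_1_tokens = ["client", "ID", "type",
                 "client", "residence", "country", "information",
                 "modification", "detail", "table",
                 "client", "ID", "type"]
rule_2_tokens = ["english name",
                 "client", "name", "information",
                 "modification", "detail", "table",
                 "english name"]

vocabulary, first_vector, second_vector = word_frequency_vectors(rule_1_tokens, rule_2_tokens)
```

Building both vectors over the same vocabulary guarantees that position i refers to the same word in each vector.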
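Substituting the first segmentation vector A and the second segmentation vector B into the cosine formula gives:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{3\cdot 1 + 1\cdot 1 + 1\cdot 1 + 1\cdot 1 + 1\cdot 1}{\sqrt{23}\,\sqrt{10}} = \frac{7}{\sqrt{230}} \approx 0.46$$
That is, the similarity between rule 1 and rule 2 is 0.46.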
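As a check on this value, the cosine of the two segmentation vectors can be computed directly. This is a plain-Python sketch; the function name `cosine_similarity` and the handling of all-zero vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty segmentation result matches nothing
    return dot / (norm_a * norm_b)


first_vector = [3, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0]
second_vector = [1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 1]
similarity = cosine_similarity(first_vector, second_vector)
print(round(similarity, 2))  # 0.46
```

The dot product is 7 and the vector norms are the square roots of 23 and 10, so the result is 7 divided by the square root of 230.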
In this embodiment, the similarity between the first segmentation result of the target data matching rule and the second segmentation result of the historical data matching rule is calculated to obtain the similarity between the target data matching rule and the historical data matching rule.
In an embodiment, the step 105 specifically includes:
and determining the grading information of the historical data matching rule with the highest similarity with the target data matching rule as the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule.
In one embodiment, after determining the similarity between the target data matching rule and each historical data matching rule, the device identifies the historical data matching rule with the highest similarity to the target data matching rule and determines its grading information as the grading information of the target data matching rule.
In one embodiment, two or more historical data matching rules may tie for the highest similarity to the target data matching rule. If these rules all have the same grading information, that grading information is determined as the grading information of the target data matching rule. If their grading information differs, the grading information that accounts for the highest proportion among them is determined as the grading information of the target data matching rule.
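The highest-similarity selection with its tie handling can be sketched as follows. The function name `grading_of_most_similar`, the parallel-list layout and the grade labels are hypothetical; exact equality is used to detect ties, and a tolerance may be preferable for floating-point similarities.

```python
from collections import Counter


def grading_of_most_similar(similarities, gradings):
    """similarities[i] and gradings[i] describe historical data matching rule i."""
    if not similarities:
        raise ValueError("at least one historical data matching rule is required")
    best = max(similarities)
    # All rules that tie for the highest similarity.
    tied = [g for s, g in zip(similarities, gradings) if s == best]
    # One distinct grading: return it. Several: return the most frequent one.
    return Counter(tied).most_common(1)[0][0]


chosen = grading_of_most_similar(
    [0.46, 0.90, 0.90, 0.90],
    ["level 1", "level 2", "level 3", "level 3"],
)
```

When a single rule has the highest similarity, the tie list has one entry and the function reduces to a plain nearest-neighbour lookup.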
In this embodiment, the grading information of the historical data matching rule most similar to the target data matching rule is taken as the grading information of the target data matching rule.
In an embodiment, the step 105 specifically includes:
Step 105A: select, according to a preset selection strategy and the similarity between the target data matching rule and each historical data matching rule, a plurality of first historical data matching rules whose similarity satisfies the selection strategy from the historical data matching rules.
In one embodiment, the device obtains a preset selection strategy, and selects a plurality of first historical data matching rules with the similarity meeting the selection strategy from the similarity between the target data matching rule and each historical data matching rule.
Specifically, the device may use, as the first historical data matching rule, a historical data matching rule whose similarity is higher than a preset threshold, for example, may select, as the first historical data matching rule, a historical data matching rule whose similarity is higher than eighty-five percent. The device may also sort the historical data according to the similarity from high to low, and select the first several historical data matching rules as the first historical data matching rules, for example, select the 20 historical data matching rules with the highest similarity as the first historical data matching rules.
And 105B, determining the grading information with the highest ratio in the grading information of the plurality of first historical data matching rules as the grading information of the target data matching rule.
In one embodiment, the device selects a plurality of first historical data matching rules from the historical data matching rules according to a preset selection strategy, then obtains the grading information of the first historical data matching rules, determines the grading information with the highest ratio from the grading information of the plurality of first historical data matching rules, and takes the grading information with the highest ratio as the grading information of the target data matching rule.
In this embodiment, a plurality of historical data matching rules with the greatest similarity to the target data matching rule are selected, and then the hierarchical information with the highest percentage in the hierarchical information of the selected plurality of historical data matching rules is used as the hierarchical information of the target data matching rule.
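Steps 105A and 105B can be sketched as a selection strategy followed by a proportion-based vote. This is a minimal sketch under stated assumptions: the selection strategy is passed in as a callable, `top_three` is a hypothetical example strategy, and the data layout is not taken from the application.

```python
from collections import Counter


def determine_grading(similarities, gradings, select):
    """Step 105A then 105B.

    similarities: dict rule name -> similarity to the target rule (hypothetical layout)
    gradings:     dict rule name -> grading information (hypothetical layout)
    select:       callable taking a list of (rule, similarity) pairs and
                  returning the selected pairs (the preset selection strategy)
    """
    chosen = select(list(similarities.items()))      # step 105A
    if not chosen:
        return None                                  # nothing satisfied the strategy
    votes = Counter(gradings[rule] for rule, _ in chosen)
    return votes.most_common(1)[0][0]                # step 105B


def top_three(pairs):
    """Example strategy: keep the three most similar historical rules."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:3]
```

Passing the strategy as a parameter keeps the voting step unchanged whether the rules are selected by threshold or by rank.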
In an embodiment, the step 105A specifically includes:
selecting, from the historical data matching rules, a plurality of first historical data matching rules whose similarity is higher than a preset threshold.
In one embodiment, the device may first obtain a preset threshold set for the similarity. After determining the similarity between each historical data matching rule and the target data matching rule, the device screens out the historical data matching rules whose similarity is higher than the preset threshold and uses them as the first historical data matching rules.
In this embodiment, only historical data matching rules whose similarity to the target data matching rule is higher than a preset threshold are selected, which ensures that the selected historical data matching rules are reliable references.
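As an illustrative sketch of the threshold-based selection, assuming the similarities are held as (rule, similarity) pairs and using the eighty-five percent figure given as an example earlier in the description (the function name and default value are assumptions):

```python
def select_above_threshold(pairs, threshold=0.85):
    """Keep the (rule, similarity) pairs whose similarity exceeds the threshold.

    The comparison is strict ("higher than"), matching the wording of the
    description; the default of 0.85 is only the example value it mentions.
    """
    return [(rule, sim) for rule, sim in pairs if sim > threshold]
```

The result may be empty when no historical rule is similar enough, so a caller would need a fallback for that case; the application does not describe one.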
In an embodiment, the step 105A specifically includes:
Step A1: sorting the similarities between the target data matching rule and each historical data matching rule from high to low.
In one embodiment, after determining the similarity between the target data matching rule and each historical data matching rule, the device sorts these similarities from high to low to obtain the historical data matching rules in descending order of similarity.
Step A2: selecting the first several historical data matching rules, from the historical data matching rules in descending order of similarity, as the first historical data matching rules.
In one embodiment, after the device obtains the historical data matching rules in descending order of similarity, it selects the first several of them as the first historical data matching rules.
In an embodiment, the step 105A specifically includes:
Step B1: sorting the similarities between the target data matching rule and each historical data matching rule from low to high.
In one embodiment, after determining the similarity between the target data matching rule and each historical data matching rule, the device sorts these similarities from low to high to obtain the historical data matching rules in ascending order of similarity.
Step B2: selecting the last several historical data matching rules, from the historical data matching rules in ascending order of similarity, as the first historical data matching rules.
In one embodiment, after the device obtains the historical data matching rules in ascending order of similarity, it selects the last several of them, that is, those with the highest similarity, as the first historical data matching rules.
In this embodiment, the historical data matching rules with the greatest similarity to the target data matching rule are selected, which ensures that enough historical data matching rules are available as a reference for the target data matching rule.
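The two sorting-based selections can be sketched as follows. Sorting from high to low and taking the head of the list, or sorting from low to high and taking the tail, yields the same set of most similar rules. The function names and the (rule, similarity) pair layout are assumptions of this sketch.

```python
def top_k_descending(pairs, k):
    """Sort (rule, similarity) pairs from high to low and keep the first k."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return ranked[:k]


def top_k_ascending(pairs, k):
    """Sort (rule, similarity) pairs from low to high and keep the last k."""
    ranked = sorted(pairs, key=lambda p: p[1])
    # ranked[-0:] would return the whole list, so k == 0 is handled explicitly.
    return ranked[-k:] if k > 0 else []
```

Only the order of the returned pairs differs between the two functions; a subsequent proportion-based vote is unaffected by that order.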
In one embodiment, the grading information includes a system identifier, a data classification, and a data grading. In this case, the step 105 specifically includes:
Step 1: determining a target system identifier corresponding to the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the system identifier corresponding to each historical data matching rule.
Table 2 shows several exemplary data matching rules provided by the present application together with their corresponding grading information, where the grading information includes a system identifier, a data classification, and a data grading.
Table 2. Data matching rules and grading information (provided as images in the original publication; not reproduced here)
In one embodiment, after determining the similarity between the target data matching rule and each historical data matching rule, the device may determine the system identifier of the historical data matching rule with the highest similarity to the target data matching rule as the target system identifier of the target data matching rule.
In an embodiment, the device may instead select a plurality of first historical data matching rules from the historical data matching rules according to a preset selection strategy, and then determine the system identifier with the highest proportion among the system identifiers of the plurality of first historical data matching rules as the target system identifier of the target data matching rule.
Step 2: determining the similarity between each second historical data matching rule corresponding to the target system identifier and the target data matching rule, and determining the target data classification and target data grading of the target data matching rule based on the similarity between the target data matching rule and each second historical data matching rule and on the data classification and data grading of each second historical data matching rule.
In one embodiment, after the device determines the target system identifier of the target data matching rule, it determines the second historical data matching rules, that is, the historical data matching rules whose system identifier is the target system identifier, and obtains the data classification and data grading of each of them. The device may then take the data classification and data grading of the second historical data matching rule with the highest similarity to the target data matching rule as the target data classification and target data grading of the target data matching rule. Alternatively, the device may select a plurality of target second historical data matching rules from the second historical data matching rules according to a preset selection strategy, and determine the data classification and data grading with the highest proportion among them as the target data classification and target data grading of the target data matching rule.
In one embodiment, the device selects a plurality of historical data matching rules from the historical data matching rules according to a preset selection strategy and determines the system identifier with the highest proportion among their system identifiers as the target system identifier of the target data matching rule. The device then determines, among the selected historical data matching rules, those whose system identifier is the target system identifier, and sets the data classification and data grading with the highest proportion among these rules as the target data classification and target data grading of the target data matching rule.
For example, the device ranks the similarities between the historical data matching rules and the target data matching rule, selects the K historical data matching rules with the highest similarity, determines the system identifiers of these K rules, and, following the majority principle, sets the target system identifier of the newly added target data matching rule to the system identifier with the highest proportion. The device then determines the X historical data matching rules, among the K rules, whose system identifier is the target system identifier, determines the data classification and data grading of these X rules, and, again following the majority principle, sets the target data classification and target data grading of the newly added target data matching rule to the data classification and data grading with the highest proportion.
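The K-then-X example above can be sketched as a two-stage vote. This is a minimal sketch under stated assumptions: the K most similar historical rules have already been selected, each is represented as a hypothetical (system identifier, data classification, data grading) tuple, and the classification and grading are voted on jointly as a pair, which the application does not specify.

```python
from collections import Counter


def two_stage_vote(neighbours):
    """neighbours: the K most similar historical rules, each given as
    (system_id, data_classification, data_grading) (hypothetical layout).
    """
    # Stage 1: the system identifier with the highest proportion among the K rules.
    system_id = Counter(n[0] for n in neighbours).most_common(1)[0][0]
    # Stage 2: restrict to the X rules belonging to that system, then vote again.
    same_system = [n for n in neighbours if n[0] == system_id]
    pair_votes = Counter((n[1], n[2]) for n in same_system)
    classification, grading = pair_votes.most_common(1)[0][0]
    return system_id, classification, grading
```

Voting on the two fields separately would be an equally plausible reading; the joint vote is used here only so that the returned classification and grading always come from the same historical rule.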
In one embodiment, the device may further obtain the priority of each data matching rule. When the data in the database are graded according to the data matching rules and their grading information, if two data matching rules match the same data, the data are graded according to the data matching rule with the higher priority.
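A minimal sketch of priority-based conflict resolution follows. It assumes, purely for illustration, that each data matching rule is a regular expression stored in a dictionary with hypothetical `pattern`, `priority`, and `grading` keys, and that a larger number means a higher priority; none of these details are stated in the application.

```python
import re


def grade_value(value, rules):
    """Return the grading of the highest-priority rule that matches the value.

    rules: list of dicts with hypothetical keys 'pattern', 'priority', 'grading'.
    Returns None when no rule matches.
    """
    matches = [r for r in rules if re.fullmatch(r["pattern"], value)]
    if not matches:
        return None
    # When several rules match the same data, the higher priority wins.
    return max(matches, key=lambda r: r["priority"])["grading"]
```

For instance, a value of eleven digits might match both a generic numeric rule and a more specific rule; the more specific rule would prevail if it were given the higher priority.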
In addition, the application also provides a grading system for data matching rules. As shown in fig. 2, the system comprises a database, a database configuration module, a data acquisition module, a data matching rule matching module, a data matching rule grading module, and a rule base.
The database comprises a plurality of databases, each of which stores service data for a different service scenario.
The database configuration module is used for configuring the connection configuration information of each database and the service system identifier of each database.
The data acquisition module is used for reading data in the databases according to the connection configuration information held by the database configuration module, and for extracting the data characteristic information of each data table.
The data matching rule matching module is used for matching the data characteristic information extracted by the data acquisition module against each data matching rule in the data matching rule base corresponding to each service system in the rule base, and for assigning the grading information of the matching data matching rule to the matched data. The matching result is generally a report file or is displayed in a page, and includes the database name, table name, field name, classification, grading, matching rule name, and the like of the data.
The rule base is used for storing the data matching rules corresponding to each service system.
The data matching rule grading module is used for receiving a newly added target data matching rule, automatically assigning the newly added data matching rule to the rule base of the corresponding service system, and setting the corresponding data classification and data grading.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a grading information determination apparatus for implementing the above grading information determination method. The solution provided by the apparatus is similar to that described for the method, so for specific limitations in the one or more apparatus embodiments provided below, reference may be made to the limitations of the grading information determination method above; details are not repeated here.
In one embodiment, as shown in fig. 3, there is provided a grading information determination apparatus including:
an obtaining module 301, configured to obtain a newly added target data matching rule;
a first determining module 303, configured to determine the similarity between the target data matching rule and each historical data matching rule;
a second determining module 305, configured to determine the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
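As a rough software sketch of how the three modules might cooperate, the class below mirrors modules 301, 303, and 305 as methods. The class name, method names, and the injected similarity function are hypothetical; the second determining step shown uses only the simplest embodiment (highest similarity).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class GradingDeterminer:
    history: Dict[str, Hashable]             # historical rule text -> grading information
    similarity: Callable[[str, str], float]  # pluggable similarity measure (assumption)

    def obtain(self, rule: str) -> str:
        """Obtaining module 301: receive the newly added target rule."""
        return rule.strip()

    def score(self, target: str) -> Dict[str, float]:
        """First determining module 303: similarity to every historical rule."""
        return {h: self.similarity(target, h) for h in self.history}

    def decide(self, scores: Dict[str, float]) -> Hashable:
        """Second determining module 305: grading of the most similar rule."""
        best = max(scores, key=scores.get)
        return self.history[best]

    def run(self, rule: str) -> Hashable:
        return self.decide(self.score(self.obtain(rule)))
```

Keeping the similarity measure as a constructor argument lets the same structure be reused with whichever similarity computation an implementation adopts.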
In one embodiment, the first determining module 303 specifically includes:
a first word segmentation unit 303A (not shown in the figure), configured to perform word segmentation on the target data matching rule to obtain a first word segmentation result;
a second word segmentation unit 303B (not shown in the figure), configured to perform word segmentation on each historical data matching rule to obtain a second word segmentation result;
a similarity comparing unit 303C (not shown in the figure), configured to determine the similarity between the first word segmentation result and the second word segmentation result, and to use that similarity as the similarity between the target data matching rule and the historical data matching rule.
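The word segmentation and comparison units can be illustrated with a small sketch. The passage above does not specify the segmentation method or the similarity measure, so both are assumptions here: tokens are obtained by splitting on non-alphanumeric characters, and the Jaccard coefficient of the two token sets is used as the similarity. Rules written in Chinese would need a dedicated word segmenter instead.

```python
import re


def segment(rule_text):
    """Split a rule into lower-case tokens (a stand-in for word segmentation)."""
    return {tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", rule_text) if tok}


def jaccard(a, b):
    """Jaccard coefficient of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rule_similarity(target_rule, historical_rule):
    """Similarity of two rules taken as the similarity of their token sets."""
    return jaccard(segment(target_rule), segment(historical_rule))
```

With this measure, "customer_phone_number" and "customer_mobile_number" share two of four distinct tokens and therefore score 0.5.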
In one embodiment, the second determining module 305 is specifically configured to:
determine, based on the similarity between the target data matching rule and each historical data matching rule, the grading information of the historical data matching rule with the highest similarity to the target data matching rule as the grading information of the target data matching rule.
In one embodiment, the second determining module 305 specifically includes:
a selecting unit 305A (not shown in the figure), configured to select, according to a preset selection strategy and the similarity between the target data matching rule and each historical data matching rule, a plurality of first historical data matching rules whose similarity satisfies the selection strategy from the historical data matching rules;
a determining unit 305B (not shown in the figure), configured to determine, as the grading information of the target data matching rule, the grading information with the highest proportion among the grading information of the plurality of first historical data matching rules.
In one embodiment, the selecting unit 305A is specifically configured to:
select, from the historical data matching rules, a plurality of first historical data matching rules whose similarity is higher than a preset threshold.
In one embodiment, the selecting unit 305A specifically includes:
a sorting subunit A (not shown in the figure), configured to sort the similarities between the target data matching rule and each historical data matching rule from high to low;
a selecting subunit B (not shown in the figure), configured to select, as the first historical data matching rules, the first several historical data matching rules from the historical data matching rules in descending order of similarity.
The modules in the grading information determination apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing historical data matching rule data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a grading information determination method.
In one embodiment, a computer device is provided, which may be a terminal, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a communication interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication may be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a grading information determination method.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, carries out the steps in the method embodiments described above.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered to fall within the scope of the present specification.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, it is not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A grading information determination method, the method comprising:
acquiring a newly added target data matching rule;
determining the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
2. The method of claim 1, wherein determining the similarity of the target data matching rule to each historical data matching rule comprises:
performing word segmentation on the target data matching rule to obtain a first word segmentation result;
for each historical data matching rule, performing word segmentation on the historical data matching rule to obtain a second word segmentation result; determining the similarity between the first word segmentation result and the second word segmentation result, and taking that similarity as the similarity between the target data matching rule and the historical data matching rule.
3. The method of claim 1, wherein determining the grading information of the target data matching rule based on the similarity of the target data matching rule to each of the historical data matching rules and the grading information of each of the historical data matching rules comprises:
and determining the grading information of the historical data matching rule with the highest similarity with the target data matching rule as the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule.
4. The method of claim 1, wherein determining the grading information of the target data matching rule based on the similarity of the target data matching rule to each of the historical data matching rules and the grading information of each of the historical data matching rules comprises:
selecting a plurality of first historical data matching rules with the similarity meeting the selection strategy from the historical data matching rules according to a preset selection strategy and the similarity between the target data matching rule and each historical data matching rule;
and determining the grading information with the highest ratio in the grading information of the plurality of first historical data matching rules as the grading information of the target data matching rule.
5. The method according to claim 4, wherein the selecting, according to a preset selection policy and the similarity between the target data matching rule and each historical data matching rule, a plurality of first historical data matching rules with the similarity satisfying the selection policy from each historical data matching rule comprises:
and selecting a plurality of first historical data matching rules with the similarity higher than a preset threshold value from the historical data matching rules.
6. The method according to claim 4, wherein the selecting a plurality of first historical data matching rules with similarity satisfying the selection policy from the historical data matching rules according to a preset selection policy and similarity between the target data matching rule and each historical data matching rule comprises:
sorting the similarities between the target data matching rule and each historical data matching rule from high to low;
and selecting the first several historical data matching rules, from the historical data matching rules in descending order of similarity, as the first historical data matching rules.
7. The method of claim 1, wherein the grading information includes a system identifier, a data classification, and a data grading;
the determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule comprises:
determining a target system identifier of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the system identifier corresponding to each historical data matching rule;
determining the similarity between each second historical data matching rule corresponding to the target system identifier and the target data matching rule; and determining the target data classification and target data grading of the target data matching rule based on the similarity between the target data matching rule and each second historical data matching rule and on the data classification and data grading of each second historical data matching rule.
8. A grading information determination apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a newly added target data matching rule;
the first determining module is used for determining the similarity between the target data matching rule and each historical data matching rule;
and the second determination module is used for determining the grading information of the target data matching rule based on the similarity between the target data matching rule and each historical data matching rule and the grading information of each historical data matching rule.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202210840009.8A 2022-07-18 2022-07-18 Grading information determination method and device and computer equipment Pending CN115098686A (en)


Publications (1)

Publication Number Publication Date
CN115098686A true CN115098686A (en) 2022-09-23

Family

ID=83298394

Country Status (1)

Country Link
CN (1) CN115098686A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination