CN117591530B - Data cross section processing method and system - Google Patents
- Publication number
- CN117591530B (CN202410066973.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- abnormal
- service
- determining
- update
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data cross-section processing method and system, which belong to the technical field of data processing. The method comprises: determining the abnormal update probabilities of different business data under different matching check rules based on problem update records; determining the abnormal update data and stable update data in different business systems, and determining the system anomaly probabilities of the business systems and the abnormal business systems by combining the abnormal update probabilities of the business data in those systems; determining a comprehensive anomaly probability according to the abnormal business systems and the system anomaly probabilities of the business systems; and determining the construction period of the data section of the business systems by combining the similarity of the system frames among the business systems. This improves the efficiency of data section construction and of problem data identification.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data cross section processing method and system.
Background
Cross-section data refers to the observed values of a whole group (or all) of individuals reflected on a data section at the same time (a time period or a time point). Because the programming languages and software structures of different application systems inside an enterprise differ to some degree, untimely updates or data processing errors inevitably occur during data transmission and processing. Constructing data sections is therefore important for ensuring the reliability and accuracy of the data.
In the prior art, cross-section data of different data sections are collected and analyzed to verify the cross-section data and thereby ensure the reliability of enterprise data. Examples are invention patent CN202210913338.0, a mining method for distribution rules and outliers of cross-section data, and CN202011221755.6, a key cross-section data analysis method, device, equipment and storage medium. However, the following technical problem remains:
For large enterprises, for example consumer finance enterprises, the number of service systems is huge, and the volume of data exchanged between the service modules of different service systems is also large. If the sampling analysis objects and sampling analysis periods of the data sections cannot be determined in a differentiated way according to the historical verification results of the data, the verification analysis of the section data cannot be carried out in real time, and the reliability of the enterprise data cannot be ensured.
To address the above technical problem, the invention provides a data cross-section processing method and system.
Disclosure of Invention
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
According to one aspect of the present invention, a data cross-section processing method is provided.
The data cross section processing method is characterized by comprising the following steps of:
S1, determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
S2, determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
S3, determining abnormal update data and stable update data in different service systems, determining system abnormal probability of different service systems and abnormal service systems by combining the abnormal update probability of different service data in different service systems, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
S4, determining the comprehensive abnormal probability according to abnormal service systems in different service systems and the system abnormal probability of different service systems, and determining the construction period of the data section of the service system by combining the similarity of the system frames among different service systems.
The invention has the beneficial effects that:
1. Determining the abnormal update probabilities of different business data under different matching check rules from the problem update records allows the problem conditions of the business data under each matching check rule to be evaluated accurately from the problem updates observed across the historical updates. Screening the matching check rules under which the business data have a high abnormal update probability also lays a foundation for determining differentiated construction periods for the data sections of the more problematic business data;
2. Determining the data anomaly probability and the abnormal update data of the service data allows the probability of anomalies in the service data to be estimated accurately from multiple angles. Screening the abnormal update data also ensures that the construction period of the data section is evaluated differentially for service data with a high probability of anomalies, which in turn ensures the verification processing efficiency and the reliability of the service data;
3. Determining the construction period of the data section of the service system from the comprehensive anomaly probability and the similarity of the system frames among different service systems takes into account the differences in anomaly probability of the service data of the service systems, while the similarity of the system frames allows the construction difficulty of the data section to be evaluated accurately. This ensures the reliability of the service data of service systems with a higher anomaly probability, and avoids the heavy system load that would result from frequently constructing data sections that are difficult to construct.
The service system comprises a login system, a credit granting system, a branch system, a repayment interface and a credit application system.
The further technical scheme is that the problem update record comprises the problem update times of the service data in different historical update times and update results under different problem update times.
The further technical scheme is that the matching check rule comprises 0-1 check, range check, data type check, length limit check and logic relation check.
The further technical scheme is that the method for determining the abnormal check rule and the reliable check rule of the service data comprises the following steps:
And when the abnormal updating probability of the service data under the matching check rule is greater than the set abnormal probability, determining that the matching check rule is an abnormal check rule, and when the abnormal updating probability of the service data under the matching check rule is within a preset abnormal probability interval, determining that the matching check rule is a reliable check rule.
The further technical scheme is that the method for determining the stable update data comprises the following steps:
And when the data anomaly probability of the service data in the service system is smaller than a preset anomaly probability threshold value, determining that the service data is stable update data.
In a second aspect, the present invention provides a computer system comprising: a communicatively coupled memory and processor, and a computer program stored on the memory and capable of running on the processor, characterized by: the processor executes a data cross-section processing method as described above when running the computer program.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention as set forth hereinafter.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings;
FIG. 1 is a flow chart of a method of data cross-section processing;
FIG. 2 is a flow chart of a method of determining abnormal update probabilities for traffic data under different match check rules;
FIG. 3 is a flow chart of a method of determining a probability of data anomalies for traffic data;
FIG. 4 is a flow chart of a method of determination of an abnormal business system;
FIG. 5 is a flow chart of a method of determining a build cycle of a data section of a business system;
FIG. 6 is a block diagram of a computer system.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present disclosure.
Definition of terms:
Cross-section data refers to an observation value reflecting a whole set of (or all) individuals on a data section at the same time (period or time point).
Technical problems:
For the business systems of a credit processing institution, untimely data updates or erroneous update processing may occur when data are processed by the login system, the credit processing system, the branch system, the repayment interface and the credit application system. If the data sections of the different business data and business systems cannot be constructed according to the abnormal conditions observed in data update processing, the verification of the business data cannot be carried out accurately and the reliability of the business data cannot be ensured.
In order to solve the technical problems, the following technical means are adopted:
First, the abnormal update probabilities of different business data under different matching check rules are determined based on the problem update records. Specifically, the abnormal update probability under a matching check rule can be determined as the ratio of the number of problem updates to the number of updates under that rule;
Next, the abnormal check rules and reliable check rules of the service data are determined based on the abnormal update probabilities of the service data under the different matching check rules, and the data anomaly probability and the abnormal update data of the service data are determined from the abnormal check rules, the reliable check rules and those abnormal update probabilities. Specifically, the data anomaly probability can be obtained as the normalized sum of two products: the proportion of abnormal check rules multiplied by their abnormal update probability, and the proportion of reliable check rules multiplied by their abnormal update probability. The construction period of the data section of the abnormal update data is determined on that basis;
Then, the system anomaly probabilities of the different service systems and the abnormal service systems are determined according to the abnormal update data and stable update data in the service systems and the abnormal update probabilities of the service data. Specifically, the system anomaly probability can be obtained as the normalized sum of two products: the proportion of abnormal update data multiplied by their abnormal update probability, and the proportion of stable update data multiplied by their abnormal update probability. The construction period of the data section of an abnormal service system is determined based on its system anomaly probability;
Finally, a comprehensive anomaly probability is determined according to the abnormal service systems among the service systems and the system anomaly probabilities of the service systems, and the construction period of the data section of the service systems is determined by combining the similarity of the system frames among the service systems. Specifically, the service systems are divided into service system groups according to their system frames; the similarity of the frames among the service systems is determined from the number of service system groups and the number of service systems in each group; and the construction difficulty and basic construction period of the data section of the service systems are determined from the similarity. The comprehensive system anomaly probability of the service systems is determined from the number of abnormal service systems, the system anomaly probabilities of the abnormal service systems, and the average of the system anomaly probabilities of all service systems. The construction period of the data section of the service systems is then determined from the comprehensive system anomaly probability and the basic construction period.
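The first two technical means can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented formula: the function names are hypothetical, each group of rules is summarized by its mean abnormal update probability, and the unspecified normalization is approximated by clamping the result to [0, 1].

```python
from statistics import mean

def abnormal_update_probability(problem_updates: int, total_updates: int) -> float:
    """Ratio of problem updates to all updates under one matching check rule."""
    if total_updates <= 0:
        return 0.0
    return problem_updates / total_updates

def data_anomaly_probability(abnormal_probs, reliable_probs, total_rules):
    """Share-weighted sum over abnormal and reliable check rules.

    Assumption: each product is (share of rules in the group) x (mean
    abnormal update probability of the group); the sum is clamped to [0, 1].
    """
    if total_rules <= 0:
        return 0.0
    score = 0.0
    if abnormal_probs:
        score += len(abnormal_probs) / total_rules * mean(abnormal_probs)
    if reliable_probs:
        score += len(reliable_probs) / total_rules * mean(reliable_probs)
    return min(1.0, max(0.0, score))
```

For example, three problem updates out of twelve updates under a range check give an abnormal update probability of 0.25 for that rule.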
The data section is constructed in the following way:
1. Screen the business systems and database table objects.
2. Determine the coarse-grained data time point.
3. Apply variable dimension-raising processing to the data objects.
4. Compare precise time points based on the data time points in the dimension-raising information.
5. Set a flag bit and cut the dimension-raised entities.
6. Reduce the dimension of the data entities output in step 5 and output the final section data.
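One way to read the six construction steps is sketched below in Python. Everything here is an assumption made for illustration: the `RaisedEntity` wrapper, the row layout (`system`, `table`, `updated_at`, `payload`), and the rule that a row belongs to the section when its update time is not later than the cut time. The text does not specify any of these details.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RaisedEntity:
    """A table row wrapped with extra (dimension-raising) information."""
    system: str
    table: str
    updated_at: datetime      # precise data time point carried in the added info
    payload: dict
    in_section: bool = False  # flag bit set in step 5

def build_data_section(rows, selected, cut_time):
    # Step 1: keep only the screened (system, table) objects.
    # Step 2 (coarse-grained time point) is assumed to be applied when rows are fetched.
    raised = [
        RaisedEntity(r["system"], r["table"], r["updated_at"], r["payload"])  # step 3
        for r in rows
        if (r["system"], r["table"]) in selected
    ]
    for entity in raised:
        # Steps 4-5: precise time comparison, then set the flag bit.
        entity.in_section = entity.updated_at <= cut_time
    # Steps 5-6: cut by flag, then drop the added dimensions again.
    return [entity.payload for entity in raised if entity.in_section]
```

The wrapper keeps the added time information separate from the business payload, so the final dimension reduction is a simple projection back to the payload.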
Further explanation will be made below from two perspectives of the method class embodiment and the system class embodiment.
In order to solve the above-mentioned problems, according to an aspect of the present invention, as shown in fig. 1, there is provided a data cross-section processing method, which is characterized by specifically comprising:
S1, determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
Specifically, the business system comprises a login system, a credit granting system, a branch system, a repayment interface and a credit application system.
Further, the problem update record includes the number of problem updates in different historical update times and the update result under different problem update times of the service data.
In one possible embodiment, as shown in fig. 2, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
S11, determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
S12, determining the abnormal update probability of the service data under each matching check rule according to the number of matching problems under that rule, the average number of updates between successive matching problems, and the number of matching problems whose interval, counted in updates, is less than a preset number.
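The three quantities used in S12 can be derived from an update history as in the following Python sketch. It assumes the history for one matching check rule is available as a list of booleans (True meaning a problem update); the function name is hypothetical, and how the three quantities are combined into a probability is left open because the text does not give a formula.

```python
def matching_problem_features(results, preset_interval):
    """Return (problem count, average interval, number of short intervals).

    results: one boolean per historical update, True for a problem update.
    Intervals are counted in updates between consecutive problem updates.
    """
    positions = [i for i, is_problem in enumerate(results) if is_problem]
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    average_gap = sum(gaps) / len(gaps) if gaps else None
    short_gaps = sum(1 for gap in gaps if gap < preset_interval)
    return len(positions), average_gap, short_gaps
```

A small average interval or many short intervals indicates clustered problems, which is presumably why S12 considers them in addition to the raw count.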
In another possible embodiment, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
Determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
When the number of matching problems under the matching check rule is greater than a preset number, determining the abnormal update probability of the service data under the matching check rule according to the number of matching problems;
When the number of matching problems under the matching check rule is not greater than the preset number,
determining, among the matching problems under the matching check rule, the number of matching problems whose update interval is smaller than a preset number of updates, and, when this number is greater than a preset number, determining the abnormal update probability of the service data under the matching check rule according to this number;
When the number of matching problems whose update interval is smaller than the preset number of updates is not greater than the preset number, determining a problem frequency evaluation amount from that number and from the average update interval of those matching problems, and, when the problem frequency evaluation amount does not meet the requirement, determining the abnormal update probability of the service data under the matching check rule from the problem frequency evaluation amount;
And when the problem frequency evaluation amount meets the requirement, determining the abnormal update probability of the service data under the matching check rule according to the number of matching problems under the matching check rule and the average number of updates between matching problems.
In another possible embodiment, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
Determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
Judging whether the number of matching problems under the matching check rule is greater than a preset number; if yes, determining the abnormal update probability of the service data under the matching check rule according to the number of matching problems; if no, entering the next step;
Determining, from the matching problems under the matching check rule, the number of matching problems whose update interval is smaller than a preset number of updates; determining a problem frequency evaluation amount from that number and from the average update interval of those matching problems; judging whether the problem frequency evaluation amount meets the requirement; if yes, determining the abnormal update probability of the service data under the matching check rule according to the problem frequency evaluation amount; if no, entering the next step;
Determining the abnormal update probability of the service data under the matching check rule according to the problem frequency evaluation amount, the number of matching problems under the matching check rule, and the average number of updates between matching problems.
Specifically, the matching check rule comprises 0-1 check, range check, data type check, length limit check and logic relation check.
S2, determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
It can be understood that the method for determining the anomaly check rule and the reliable check rule of the service data is as follows:
And when the abnormal updating probability of the service data under the matching check rule is greater than the set abnormal probability, determining that the matching check rule is an abnormal check rule, and when the abnormal updating probability of the service data under the matching check rule is within a preset abnormal probability interval, determining that the matching check rule is a reliable check rule.
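The classification of a matching check rule can be sketched in Python as below. The threshold and interval values are passed in by the caller because the text leaves them as preset values; the function name and the "unclassified" label for rules that fall in neither category are assumptions.

```python
def classify_rule(probability, abnormal_threshold, reliable_interval):
    """Label one matching check rule from its abnormal update probability."""
    low, high = reliable_interval
    if probability > abnormal_threshold:
        return "abnormal"
    if low <= probability <= high:
        return "reliable"
    # Neither condition in the text applies; the label is an assumption.
    return "unclassified"
```

With an abnormal threshold of 0.3 and a reliable interval of (0.0, 0.05), a rule with probability 0.4 is abnormal, one with 0.02 is reliable, and one with 0.1 is neither.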
In one possible embodiment, as shown in fig. 3, the method for determining the data anomaly probability of the service data in the step S2 is as follows:
S21, determining basic anomaly probability of the service data based on the problem update times of the service data;
S22, determining the rule anomaly probability of the service data according to the anomaly update probabilities of the service data under the different matching check rules, the matching check rules under which problem updates occurred, the anomaly update probabilities of the anomaly check rules and the anomaly update probabilities of the reliable check rules;
S23, determining the data anomaly probability of the service data based on the rule anomaly probability and the basic anomaly probability.
Further, when the data anomaly probability of the service data does not meet the requirement, determining that the service data is anomaly update data.
In another possible embodiment, the method for determining the data anomaly probability of the service data in the step S2 is as follows:
S21, determining basic anomaly probability of the service data based on the problem update times of the service data, the update times of the average interval between different problem update times and the problem update times of which the interval update times are smaller than the preset times;
S22, judging whether the number of the abnormal check rules is smaller than the number of preset check rules, if yes, taking the basic abnormal probability of the service data as the data abnormal probability of the service data, and if no, entering the next step;
S23, determining the comprehensive abnormal update probability of the abnormal check rules according to the number of the abnormal check rules and the abnormal update probabilities of different abnormal check rules, judging whether the comprehensive abnormal update probability of the abnormal check rules is smaller than a preset probability threshold, if so, taking the basic abnormal probability of the business data as the data abnormal probability of the business data, and if not, entering the next step;
S24, determining the rule abnormality probability of the service data according to the abnormality update probabilities of the service data under the different matching check rules, the matching check rules under which problem updates occurred, the number of reliable check rules, the abnormality update probabilities of the reliable check rules and the comprehensive abnormality update probability of the abnormality check rules, and determining the data abnormality probability of the service data based on the rule abnormality probability and the basic abnormality probability.
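The control flow of steps S22 to S24 can be sketched in Python as follows. Only the two early exits are taken from the text. The concrete formulas are assumptions chosen for illustration: the comprehensive probability treats abnormal rules as independent, the rule anomaly probability is a count-weighted blend, and the final combination is a noisy-OR of the basic and rule probabilities.

```python
from math import prod
from statistics import mean

def data_anomaly_probability_cascade(base_prob, abnormal_probs, reliable_probs,
                                     min_abnormal_rules, combined_threshold):
    # S22: too few abnormal check rules, so the basic probability is used.
    if len(abnormal_probs) < min_abnormal_rules:
        return base_prob
    # S23: comprehensive abnormal update probability of the abnormal rules
    # (assumption: probability that at least one abnormal rule fails).
    combined = 1.0 - prod(1.0 - p for p in abnormal_probs)
    if combined < combined_threshold:
        return base_prob
    # S24: rule anomaly probability as a count-weighted blend (assumption).
    n_abnormal, n_reliable = len(abnormal_probs), len(reliable_probs)
    reliable_mean = mean(reliable_probs) if reliable_probs else 0.0
    rule_prob = (n_abnormal * combined + n_reliable * reliable_mean) / (n_abnormal + n_reliable)
    # Combine with the basic probability (assumption: noisy-OR).
    return 1.0 - (1.0 - base_prob) * (1.0 - rule_prob)
```

The early exits mean the more expensive rule-level evaluation is performed only for service data that already show several clearly abnormal rules.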
It can be appreciated that determining the construction period of the data section of the abnormal update data based on the data abnormality probability specifically includes:
Determining the preset abnormal probability interval into which the data abnormal probability of the abnormal update data falls, and determining the construction period of the data section of the abnormal update data from that preset abnormal probability interval.
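The interval lookup can be sketched in Python with the standard `bisect` module. The interval boundaries and the periods (in hours) are hypothetical placeholders; the text only states that each preset probability interval corresponds to a construction period.

```python
from bisect import bisect_right

def construction_period_hours(probability,
                              upper_bounds=(0.2, 0.5, 0.8),
                              periods=(24, 12, 4, 1)):
    """Map a data anomaly probability to a construction period.

    upper_bounds splits [0, 1] into len(periods) intervals; a higher
    probability falls into a later interval with a shorter period.
    """
    return periods[bisect_right(upper_bounds, probability)]
```

With these placeholder values, a probability of 0.1 yields a 24-hour period and a probability of 0.95 yields a 1-hour period, so more anomalous data are sectioned more often.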
S3, determining abnormal update data and stable update data in different service systems, determining system abnormal probability of different service systems and abnormal service systems by combining the abnormal update probability of different service data in different service systems, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
It should be noted that, the method for determining the stable update data includes:
And when the data anomaly probability of the service data in the service system is smaller than a preset anomaly probability threshold value, determining that the service data is stable update data.
In one possible embodiment, as shown in fig. 4, the method for determining the abnormal service system in the step S3 is as follows:
Determining the comprehensive abnormal update probability of the abnormal update data of the service system according to the abnormal update probability of different abnormal update data and the number of abnormal update data in the service system, and determining the comprehensive abnormal update probability of the stable update data of the service system based on the abnormal update probability of different stable update data and the number of stable update data in the service system;
Determining an average value of abnormal update frequencies of different business data of the business system based on the abnormal update probabilities of the different business data of the business system, and determining the system abnormal probability of the business system by combining the number of the business data of the business system, the comprehensive abnormal update probability of the abnormal update data and the comprehensive abnormal update probability of the stable update data;
and determining whether the service system belongs to an abnormal service system according to the system abnormality probability of the service system.
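The three steps above can be sketched in Python as follows. The specification does not give the combining formula at this point, so the weighted blend, the weights, both thresholds and all function names below are hypothetical assumptions chosen only to show how the quantities relate to one another.

```python
from statistics import mean


def system_anomaly_probability(data_probs, data_threshold=0.5,
                               w_abnormal=0.6, w_stable=0.1, w_mean=0.3):
    """Blend the per-data probabilities of one service system into one score.

    All weights and the threshold are illustrative assumptions.
    """
    if not data_probs:
        raise ValueError("a service system needs at least one service data item")
    n = len(data_probs)
    # Service data below the threshold counts as stable update data,
    # everything else as abnormal update data.
    abnormal = [p for p in data_probs if p >= data_threshold]
    stable = [p for p in data_probs if p < data_threshold]
    # Comprehensive probabilities: sum of probabilities scaled by the total
    # number of service data, so both the values and the counts matter.
    comp_abnormal = sum(abnormal) / n
    comp_stable = sum(stable) / n
    average = mean(data_probs)
    return w_abnormal * comp_abnormal + w_stable * comp_stable + w_mean * average


def is_abnormal_system(system_probability, system_threshold=0.35):
    """Decide whether the service system is an abnormal service system."""
    return system_probability > system_threshold
```

For example, with the assumed defaults a system whose service data have probabilities 0.9, 0.7, 0.1 and 0.3 scores about 0.40 and would be treated as an abnormal service system.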
S4, determining the comprehensive abnormal probability according to abnormal service systems in different service systems and the system abnormal probability of different service systems, and determining the construction period of the data section of the service system by combining the similarity of the system frames among different service systems.
In one possible embodiment, as shown in fig. 5, the method for determining the construction period of the data section of the service system in the step S4 is as follows:
S41, dividing the service systems into different service system groups based on system frames of different service systems, determining the similarity of frames among the service systems according to the number of the service system groups and the number of the service systems in the different service system groups, and determining the construction difficulty and the basic construction period of the data sections of the service systems through the similarity;
S42, determining the comprehensive system abnormality probability of the service system through the number of abnormal service systems in the service system, the system abnormality probabilities of different abnormal service systems and the average value of the system abnormality probabilities of different service systems;
S43, determining the construction period of the data section of the service system by using the comprehensive system anomaly probability of the service system and the basic construction period.
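A minimal Python sketch of steps S41 to S43 is given below. None of the formulas are stated in the specification: the similarity measure (the chance that two randomly chosen systems share a framework), the linear mapping from difficulty to basic period, the blend used for the comprehensive probability, and every parameter value and function name are hypothetical assumptions.

```python
from collections import Counter
from statistics import mean


def framework_similarity(frameworks):
    """S41 (assumed measure): sum of squared group shares, in (0, 1]."""
    total = len(frameworks)
    groups = Counter(frameworks)  # one group per distinct system framework
    return sum((size / total) ** 2 for size in groups.values())


def basic_period(similarity, easiest_hours=6.0, hardest_hours=24.0):
    """S41 (assumed mapping): lower similarity means higher construction
    difficulty, which is given a longer basic construction period."""
    difficulty = 1.0 - similarity
    return easiest_hours + difficulty * (hardest_hours - easiest_hours)


def comprehensive_system_probability(system_probs, system_threshold=0.5):
    """S42 (assumed blend) of the abnormal systems and the overall average."""
    abnormal = [p for p in system_probs if p > system_threshold]
    overall = mean(system_probs)
    if not abnormal:
        return overall
    share = len(abnormal) / len(system_probs)
    return share * mean(abnormal) + (1.0 - share) * overall


def section_period(base_hours, comprehensive_probability, max_reduction=0.75):
    """S43 (assumed rule): shorten the basic period as the probability rises."""
    return base_hours * (1.0 - max_reduction * comprehensive_probability)
```

Under these assumptions, four systems of which three share a framework give a similarity of 0.625 and a basic period of 12.75 hours; system abnormality probabilities of 0.8, 0.2, 0.6 and 0.4 then shorten the period to roughly 7 hours.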
In another possible embodiment, the method for determining the construction period of the data section of the service system in the step S4 is as follows:
Dividing the service systems into different service system groups based on system frames of different service systems, determining the similarity of frames among the service systems according to the number of the service system groups and the number of the service systems in the different service system groups, and determining the construction difficulty and the basic construction period of the data sections of the service systems through the similarity;
When no service system has a system abnormality probability that fails to meet the requirement, the basic construction period is taken as the construction period of the data section of the service system;
When a service system whose system abnormality probability fails to meet the requirement exists and the number of abnormal service systems among the service systems is smaller than the preset number of systems, the basic construction period is taken as the construction period of the data section of the service system;
When the number of abnormal service systems in the service system is not less than the number of preset systems, determining the comprehensive system abnormal probability of the abnormal service system according to the number of the abnormal service systems in the service system and the system abnormal probabilities of different abnormal service systems, and when the comprehensive system abnormal probability of the abnormal service system does not meet the requirement, determining the construction period of the data section of the service system according to the comprehensive system abnormal probability of the abnormal service system and the basic construction period;
When the comprehensive system abnormality probability of the abnormal service system meets the requirement, determining the comprehensive system abnormality probability of the service system according to the comprehensive system abnormality probability of the abnormal service system, the number of the service systems and the average value of the system abnormality probabilities of different service systems, and determining the construction period of the data section of the service system by utilizing the comprehensive system abnormality probability of the service system and the basic construction period.
In another possible embodiment, the method for determining the construction period of the data section of the service system in the step S4 is as follows:
S41, dividing the service systems into different service system groups based on system frames of different service systems, determining the similarity of frames among the service systems according to the number of the service system groups and the number of the service systems in the different service system groups, and determining the construction difficulty and the basic construction period of the data sections of the service systems through the similarity;
S42, judging whether a service system whose system abnormality probability is larger than the preset system abnormality probability exists; if so, entering step S43, and if not, taking the basic construction period as the construction period of the data section of the service system;
S43, judging whether the number of abnormal service systems in the service systems is smaller than the number of preset systems, if so, taking the basic construction period as the construction period of the data section of the service system, and if not, entering the next step;
S44, determining the comprehensive system abnormality probability of the abnormal service systems according to the number of abnormal service systems among the service systems and the system abnormality probabilities of the different abnormal service systems, and judging whether that comprehensive system abnormality probability fails to meet the requirement; if so, determining the construction period of the data section of the service system according to the comprehensive system abnormality probability of the abnormal service systems and the basic construction period, and if not, entering the next step;
S45, determining the comprehensive system abnormality probability of the service system according to the comprehensive system abnormality probability of the abnormal service systems, the number of service systems and the average value of the system abnormality probabilities of the different service systems, and determining the construction period of the data section of the service system by using the comprehensive system abnormality probability of the service system and the basic construction period.
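The branching in steps S42 to S45 can be sketched in Python as below. The sketch takes the basic construction period from step S41 as an input and, where no system exceeds the preset probability, applies the rule of the preceding embodiment that the basic construction period is used unchanged. The thresholds, the averaging formulas, the period-shortening rule and the name `decide_period` are hypothetical assumptions, not values from this specification.

```python
from statistics import mean


def decide_period(base_hours, system_probs, system_threshold=0.5,
                  preset_count=2, abnormal_limit=0.8, max_reduction=0.75):
    """Return the construction period of the data section (hours)."""
    # S42: are there systems whose probability exceeds the preset value?
    abnormal = [p for p in system_probs if p > system_threshold]
    if not abnormal:
        return base_hours

    # S43: too few abnormal systems, so keep the basic construction period.
    if len(abnormal) < preset_count:
        return base_hours

    # S44: comprehensive probability of the abnormal systems (assumed: mean).
    abnormal_comp = mean(abnormal)
    if abnormal_comp > abnormal_limit:  # "does not meet the requirement"
        return base_hours * (1.0 - max_reduction * abnormal_comp)

    # S45: blend with the average over all systems (assumed formula).
    share = len(abnormal) / len(system_probs)
    overall_comp = share * abnormal_comp + (1.0 - share) * mean(system_probs)
    return base_hours * (1.0 - max_reduction * overall_comp)
```

Ordering the checks from cheapest to most involved means the comprehensive probabilities are computed only when enough abnormal systems exist to justify changing the period.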
In another aspect, as shown in FIG. 6, the present invention provides a computer system comprising: a communicatively coupled memory and processor, and a computer program stored on the memory and capable of running on the processor, characterized by: the processor executes a data cross-section processing method as described above when running the computer program.
The data section processing method specifically comprises the following steps:
determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
Determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
Determining the comprehensive abnormal update probability of the abnormal update data of the service system according to the abnormal update probability of different abnormal update data and the number of abnormal update data in the service system, and determining the comprehensive abnormal update probability of the stable update data of the service system based on the abnormal update probability of different stable update data and the number of stable update data in the service system;
Determining an average value of abnormal update frequencies of different business data of the business system based on the abnormal update probabilities of the different business data of the business system, and determining the system abnormal probability of the business system by combining the number of the business data of the business system, the comprehensive abnormal update probability of the abnormal update data and the comprehensive abnormal update probability of the stable update data;
Determining whether the service system belongs to an abnormal service system according to the system abnormal probability of the service system, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
Dividing the service systems into different service system groups based on system frames of different service systems, determining the similarity of frames among the service systems according to the number of the service system groups and the number of the service systems in the different service system groups, and determining the construction difficulty and the basic construction period of the data sections of the service systems through the similarity;
When no service system has a system abnormality probability that fails to meet the requirement, the basic construction period is taken as the construction period of the data section of the service system;
When a service system whose system abnormality probability fails to meet the requirement exists and the number of abnormal service systems among the service systems is smaller than the preset number of systems, the basic construction period is taken as the construction period of the data section of the service system;
When the number of abnormal service systems in the service system is not less than the number of preset systems, determining the comprehensive system abnormal probability of the abnormal service system according to the number of the abnormal service systems in the service system and the system abnormal probabilities of different abnormal service systems, and when the comprehensive system abnormal probability of the abnormal service system does not meet the requirement, determining the construction period of the data section of the service system according to the comprehensive system abnormal probability of the abnormal service system and the basic construction period;
When the comprehensive system abnormality probability of the abnormal service system meets the requirement, determining the comprehensive system abnormality probability of the service system according to the comprehensive system abnormality probability of the abnormal service system, the number of the service systems and the average value of the system abnormality probabilities of different service systems, and determining the construction period of the data section of the service system by utilizing the comprehensive system abnormality probability of the service system and the basic construction period.
Through the above embodiments, the present invention has the following beneficial effects:
1. The abnormal update probability of different business data under different matching check rules is determined based on the problem update records, so that the problem conditions of the different business data under the corresponding matching check rules are accurately evaluated from the problem updates among the historical updates. At the same time, screening the matching check rules under which business data has a larger abnormal update probability lays a foundation for determining a differentiated construction period of the data section for business data with larger problems.
2. By determining the data anomaly probability and the abnormal update data of the service data, the probability that the service data becomes abnormal is accurately evaluated from multiple angles. At the same time, screening the abnormal update data ensures a differentiated evaluation of the construction period of the data section for service data with a high probability of anomaly, which ensures both the efficiency of the verification processing and the reliability of the service data.
3. The construction period of the data section of the service system is determined by combining the comprehensive anomaly probability with the similarity of the system frames among different service systems. This takes into account the differences in the anomaly probability of the service data across service systems, while the similarity of the system frames allows the construction difficulty of the data section to be accurately evaluated. The reliability of the service data of service systems with a higher anomaly probability is thereby ensured, and the high system pressure that would result from frequently constructing data sections that are difficult to construct is avoided.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.
Claims (9)
1. The data cross section processing method is characterized by comprising the following steps of:
determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
Determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
determining abnormal update data and stable update data in different service systems, determining system abnormal probability of different service systems and abnormal service systems by combining the abnormal update probability of different service data in different service systems, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
Determining comprehensive abnormal probability according to abnormal service systems in different service systems and system abnormal probability of different service systems, and determining a construction period of a data section of the service system by combining similarity of system frames among different service systems;
the method for determining the construction period of the data section of the service system comprises the following steps:
Dividing the service systems into different service system groups based on system frames of different service systems, determining the similarity of frames among the service systems according to the number of the service system groups and the number of the service systems in the different service system groups, and determining the construction difficulty and the basic construction period of the data sections of the service systems through the similarity;
Determining the comprehensive system anomaly probability of the service system by the number of the anomaly service systems in the service system, the system anomaly probabilities of different anomaly service systems and the average value of the system anomaly probabilities of different service systems;
And determining the construction period of the data section of the service system by utilizing the comprehensive system anomaly probability and the basic construction period of the service system.
2. The data cross-section processing method of claim 1, wherein the business system comprises a login system, a credit giving system, a branch system, a repayment interface, and a credit application system.
3. The data cross-section processing method as claimed in claim 1, wherein the problem update record includes a number of problem updates of the service data among different historical update times and an update result at the different number of problem updates.
4. The data cross-section processing method as claimed in claim 1, wherein the method for determining the abnormal update probability of the service data under different matching check rules is as follows:
Determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
judging whether the number of times of the matching problem under the matching check rule is larger than a preset number of times, if so, determining the abnormal updating probability of the service data under the matching check rule according to the number of times of the matching problem, if not, entering the next step;
Determining the number of times of matching problems with the interval less than the preset number of times according to the number of times of matching problems under different matching check rules, determining a problem frequency evaluation quantity according to the number of times of matching problems with the interval less than the preset number of times of matching problems and the average number of times of updating the interval less than the preset number of times of matching problems, judging whether the problem frequency evaluation quantity meets the requirement, if yes, determining the abnormal updating probability of the service data under the matching check rules according to the problem frequency evaluation quantity, and if no, entering the next step;
and determining abnormal update probability of the business data under different matching check rules according to the matching problem times under different matching check rules and the update times of average intervals among different matching problem times by using the problem frequency evaluation.
5. The data cross-section processing method of claim 1, wherein the match check rule includes a 0-1 check, a range check, a data type check, a length constraint check, and a logical relationship check.
6. The data cross-section processing method as claimed in claim 1, wherein the method for determining the anomaly check rule and the reliable check rule of the service data is as follows:
And when the abnormal updating probability of the service data under the matching check rule is greater than the set abnormal probability, determining that the matching check rule is an abnormal check rule, and when the abnormal updating probability of the service data under the matching check rule is within a preset abnormal probability interval, determining that the matching check rule is a reliable check rule.
7. The data cross-section processing method of claim 1, wherein the method for determining the data anomaly probability of the service data is:
determining basic anomaly probability of the service data based on the problem update times of the service data;
Determining the rule abnormality probability of the service data according to the abnormality update probabilities of the service data under different matching check rules, the matching check rules with the problem update times, the abnormality update probabilities of the abnormality check rules and the abnormality update probabilities of the reliable check rules;
and determining the data anomaly probability of the service data based on the rule anomaly probability and the basic anomaly probability.
8. The data cross-section processing method of claim 1, wherein determining a construction period of a data cross section of the abnormality update data based on the data abnormality probability, specifically comprises:
Determining the preset abnormal probability interval in which the data abnormal probability of the abnormal update data falls, and determining the construction period of the data section of the abnormal update data from that preset abnormal probability interval.
9. A computer system, comprising: a communicatively coupled memory and processor, and a computer program stored on the memory and capable of running on the processor, characterized by: the processor, when running the computer program, performs a data cross-section processing method as claimed in any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410066973.9A CN117591530B (en) | 2024-01-17 | 2024-01-17 | Data cross section processing method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117591530A CN117591530A (en) | 2024-02-23 |
CN117591530B CN117591530B (en) | 2024-04-19 |
Family
ID=89913636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410066973.9A Active CN117591530B (en) | 2024-01-17 | 2024-01-17 | Data cross section processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117591530B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117808578B (en) * | 2024-03-01 | 2024-07-26 | 杭银消费金融股份有限公司 | Intelligent pedestrian credit information data analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||