CN117591530A - Data cross section processing method and system - Google Patents

Data cross section processing method and system

Info

Publication number
CN117591530A
Authority
CN
China
Prior art keywords
data
abnormal
service
determining
update
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410066973.9A
Other languages
Chinese (zh)
Other versions
CN117591530B (en)
Inventor
石杰
廖家林
陶嘉驹
陈煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangyin Consumer Finance Co ltd
Original Assignee
Hangyin Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangyin Consumer Finance Co ltd
Priority to CN202410066973.9A
Publication of CN117591530A
Application granted
Publication of CN117591530B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a data cross section processing method and system in the technical field of data processing. The method comprises: determining, from problem update records, the abnormal update probability of different service data under different matching check rules; determining the abnormal update data and stable update data in different service systems and, combining them with the abnormal update probabilities of the service data in each system, determining the system abnormal probability of each service system and identifying the abnormal service systems; determining a comprehensive abnormal probability from the abnormal service systems and the system abnormal probabilities of the different service systems; and determining the construction period of each service system's data section by further taking into account the similarity of system frameworks among the service systems. This improves the efficiency of both data section construction and problem data identification.

Description

Data cross section processing method and system
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data cross section processing method and system.
Background
Cross-section data refers to observations of a whole group (or all) of individuals captured on a data section at the same moment (a time period or a point in time). Because the programming languages and software architectures of an enterprise's application systems differ to some degree, delayed updates and data processing errors inevitably arise as data are transmitted and processed, so constructing data sections is important for ensuring that the data are reliable and accurate.
In the prior art, cross-section data from different data sections are collected and analyzed to verify them and thereby ensure the reliability of enterprise data; examples include invention patent CN202210913338.0, "Mining method for distribution rules and outliers of cross-section data", and CN202011221755.6, "Key cross-section data analysis method, device, equipment and storage medium". Analysis of these approaches shows that the following technical problem remains:
for large enterprises such as consumer finance companies, the number of service systems is very large, and so is the volume of data exchanged between the service modules of different service systems. If the sampling objects and sampling periods of differentiated data sections cannot be determined from the historical verification results of the data, the section data cannot be verified and analyzed in a timely manner, and the reliability of the enterprise's data cannot be guaranteed.
To address these technical problems, the invention provides a data cross section processing method and system.
Disclosure of Invention
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
according to one aspect of the present invention, a data cross-section processing method is provided.
The data cross section processing method is characterized by comprising the following steps of:
s1, determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
s2, determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
s3, determining abnormal update data and stable update data in different service systems, determining system abnormal probability of different service systems and abnormal service systems by combining the abnormal update probability of different service data in different service systems, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
s4, determining the comprehensive abnormal probability according to abnormal service systems in different service systems and the system abnormal probability of different service systems, and determining the construction period of the data section of the service system by combining the similarity of the system frames among different service systems.
The invention has the beneficial effects that:
1. Determining the abnormal update probability of different service data under different matching check rules from the problem update records allows the problems of each service data item under each rule to be evaluated accurately from its problem updates over historical update cycles; screening out the matching check rules under which service data have a high abnormal update probability also lays the foundation for setting differentiated data section construction periods for the service data with the most serious problems;
2. Determining the data anomaly probability and the abnormal update data of the service data estimates, from multiple angles, how likely each service data item is to become abnormal; screening out the abnormal update data allows differentiated construction periods to be set for the data sections of service data that are more likely to be abnormal, which improves the efficiency and reliability of service data verification;
3. Determining the construction period of a service system's data section from both the anomaly probability and the similarity of system frameworks among different service systems accounts for differences in the anomaly probability of each system's service data and, through framework similarity, accurately evaluates how difficult the data section is to construct. This ensures the reliability of service data in systems with a high anomaly probability while avoiding the high system load that would result from frequently constructing data sections that are difficult to build.
The service system comprises a login system, a credit granting system, a branch system, a repayment interface and a credit application system.
The further technical scheme is that the problem update record comprises the problem update times of the service data in different historical update times and update results under different problem update times.
The further technical scheme is that the matching check rule comprises 0-1 check, range check, data type check, length limit check and logic relation check.
The further technical scheme is that the method for determining the abnormal check rule and the reliable check rule of the service data comprises the following steps:
when the abnormal update probability of the service data under a matching check rule is greater than a set abnormal probability, the matching check rule is determined to be an abnormal check rule; when the abnormal update probability of the service data under a matching check rule falls within a preset abnormal probability interval, the matching check rule is determined to be a reliable check rule.
The further technical scheme is that the method for determining the stable update data comprises the following steps:
when the data anomaly probability of the service data in a service system is smaller than a preset anomaly probability threshold, the service data is determined to be stable update data.
In a second aspect, the present invention provides a computer system comprising: a communicatively coupled memory and processor, and a computer program stored on the memory and capable of running on the processor, characterized by: the processor executes a data cross-section processing method as described above when running the computer program.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention as set forth hereinafter.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings;
FIG. 1 is a flow chart of a method of data cross-section processing;
FIG. 2 is a flow chart of a method of determining abnormal update probabilities for service data under different matching check rules;
FIG. 3 is a flow chart of a method of determining the data anomaly probability of service data;
FIG. 4 is a flow chart of a method of determination of an abnormal business system;
FIG. 5 is a flow chart of a method of determining a build cycle of a data section of a business system;
FIG. 6 is a block diagram of a computer system.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present disclosure.
Definitions:
cross-section data refers to an observation value reflecting a whole set of (or all) individuals on a data section at the same time (period or time point).
Technical problems:
in the business systems of a credit institution, such as the login system, credit granting system, branch system, repayment interface and credit application system, data may be updated late or incorrectly during processing. If the data sections of different business data and business systems cannot be constructed according to the anomalies observed during data updating, the business data cannot be verified accurately and their reliability cannot be guaranteed.
In order to solve the technical problems, the following technical means are adopted:
firstly, the abnormal update probability of different business data under different matching check rules is determined from the problem update records, specifically as the ratio of the number of problem updates to the number of updates under each matching check rule;
next, the abnormal check rules and reliable check rules of the service data are determined from these abnormal update probabilities, and the data anomaly probability and the abnormal update data of the service data are determined from the abnormal and reliable check rules together with the abnormal update probabilities, specifically as the normalized sum of two products: the proportion of abnormal check rules multiplied by their abnormal update probability, and the proportion of reliable check rules multiplied by their abnormal update probability; the construction period of the data section of the abnormal update data is then determined from the data anomaly probability;
then, the system anomaly probability of each service system and the abnormal service systems are determined from the abnormal update data and stable update data in each service system and the abnormal update probabilities of the service data, specifically as the normalized sum of the proportion of abnormal update data multiplied by its abnormal update probability and the proportion of stable update data multiplied by its abnormal update probability; the construction period of the data section of the abnormal service system is determined from the system anomaly probability;
finally, the comprehensive anomaly probability is determined from the abnormal service systems and the system anomaly probabilities of the different service systems, and the construction period of the service systems' data sections is determined by combining it with the similarity of system frameworks among the service systems. Specifically, the service systems are divided into service system groups by system framework; the framework similarity among service systems is determined from the number of groups and the number of service systems in each group; and the construction difficulty and basic construction period of the data sections are determined from this similarity. The comprehensive system anomaly probability is determined from the number of abnormal service systems, the system anomaly probabilities of the individual abnormal service systems, and the mean system anomaly probability of all service systems. The construction period of the service systems' data section is then determined from the comprehensive system anomaly probability and the basic construction period.
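The ratio and the proportion-weighted "normalized sum" described in these steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each group's probability is taken as the mean of its members' probabilities, and "normalized" is read as dividing by the combined share of the two groups. The same helper applies at the rule level (abnormal vs. reliable check rules) and at the system level (abnormal vs. stable update data).

```python
from statistics import mean


def ratio_probability(problem_updates: int, total_updates: int) -> float:
    """Abnormal update probability under one matching check rule:
    problem updates divided by all updates checked under that rule."""
    return problem_updates / total_updates if total_updates else 0.0


def proportion_weighted_probability(probs: dict, group_a: list, group_b: list) -> float:
    """Hypothetical reading of the 'normalized sum of products':
    share(group) * mean probability(group), summed over both groups and
    divided by the combined share so the result stays in [0, 1]."""
    n = len(probs)
    total, weight = 0.0, 0.0
    for group in (group_a, group_b):
        if group:
            share = len(group) / n
            total += share * mean(probs[k] for k in group)
            weight += share
    return total / weight if weight else 0.0


# Rule level: (problem updates, total updates) per matching check rule (illustrative data).
counts = {"range": (4, 10), "data_type": (3, 10), "length": (0, 10), "logic": (1, 10)}
rule_probs = {rule: ratio_probability(*c) for rule, c in counts.items()}
data_prob = proportion_weighted_probability(
    rule_probs, ["range", "data_type"], ["length", "logic"])
print(round(data_prob, 3))  # 0.2
```

Rules that belong to neither group are left out of the normalization, which is one plausible interpretation rather than the patent's stated formula.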
The data section is constructed in the following way:
1. Screen the business system and database table objects.
2. Determine coarse-grained data time points.
3. Raise the dimension of the data objects' variables.
4. Compare precise time points using the data time points in the dimension-raised information.
5. Set flag bits and split the dimension-raised entities.
6. Reduce the dimension of the data entities output in step 5 and output the final section data.
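One way to read the six construction steps is as a point-in-time snapshot over a change log. The sketch below assumes timestamped change records and treats "raising the dimension" as expanding each key into its version history and "reducing the dimension" as collapsing back to one value per key. The record layout, the system/table names and the function name are hypothetical and not taken from the patent.

```python
from datetime import datetime


def build_cross_section(records, system, table, cutoff):
    """Hypothetical sketch of the six steps; each record is a dict with
    'system', 'table', 'key', 'value' and 'updated_at' (a datetime)."""
    # 1. Screen the business system and database table objects.
    scoped = [r for r in records if r["system"] == system and r["table"] == table]
    # 2. Coarse-grained time point: keep records updated on or before the cutoff day.
    coarse = [r for r in scoped if r["updated_at"].date() <= cutoff.date()]
    # 3. Raise the dimension: expand each key into its timestamped versions.
    versions = {}
    for r in coarse:
        versions.setdefault(r["key"], []).append((r["updated_at"], r["value"]))
    section = {}
    for key, history in versions.items():
        # 4. Precise comparison against the exact cutoff time.
        eligible = [v for v in history if v[0] <= cutoff]
        if not eligible:
            continue
        # 5. Flag the latest eligible version; the other versions are cut away.
        flagged = max(eligible, key=lambda v: v[0])
        # 6. Reduce the dimension back to key -> value.
        section[key] = flagged[1]
    return section


rows = [
    {"system": "repayment", "table": "loan", "key": "L1", "value": "open", "updated_at": datetime(2024, 1, 10, 9)},
    {"system": "repayment", "table": "loan", "key": "L1", "value": "closed", "updated_at": datetime(2024, 1, 10, 18)},
    {"system": "repayment", "table": "loan", "key": "L2", "value": "open", "updated_at": datetime(2024, 1, 9, 12)},
    {"system": "repayment", "table": "loan", "key": "L3", "value": "open", "updated_at": datetime(2024, 1, 11, 8)},
    {"system": "login", "table": "session", "key": "S1", "value": "active", "updated_at": datetime(2024, 1, 9, 8)},
]
print(build_cross_section(rows, "repayment", "loan", datetime(2024, 1, 10, 12)))
# {'L1': 'open', 'L2': 'open'}
```

The coarse day-level filter (step 2) cheaply discards most records before the exact timestamp comparison (step 4), which is why the "closed" version of L1 survives step 2 but is excluded in step 4.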
Further explanation will be made below from two perspectives of the method class embodiment and the system class embodiment.
In order to solve the above-mentioned problems, according to an aspect of the present invention, as shown in fig. 1, there is provided a data cross-section processing method, which is characterized by specifically comprising:
s1, determining problem update records of service data through historical change data of the service data of different service systems, and determining abnormal update probabilities of different service data under different matching check rules based on the problem update records;
specifically, the business system comprises a login system, a credit granting system, a branch system, a repayment interface and a credit application system.
Further, the problem update record includes the number of problem updates in different historical update times and the update result under different problem update times of the service data.
In one possible embodiment, as shown in fig. 2, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
s11, determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
s12, determining the abnormal update probability of the service data under different matching check rules according to the number of matching problems under each matching check rule, the average number of updates between successive matching problems, and the number of matching problems whose update interval is smaller than a preset number.
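Step S12 names three inputs (the number of matching problems, the average update interval between them, and how many of them follow the previous one by fewer than a preset number of updates) but not how they are combined. The sketch below computes the three features and combines them with a hypothetical formula chosen only so that the result stays in [0, 1]; the weighting is an assumption, not the patent's.

```python
def problem_features(problem_updates, preset_gap):
    """problem_updates: strictly increasing update sequence numbers at which
    a problem update occurred under one matching check rule."""
    count = len(problem_updates)
    gaps = [b - a for a, b in zip(problem_updates, problem_updates[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else float("inf")
    close = sum(1 for g in gaps if g < preset_gap)
    return count, mean_gap, close


def abnormal_update_probability(problem_updates, total_updates, preset_gap):
    count, mean_gap, close = problem_features(problem_updates, preset_gap)
    if count == 0 or total_updates == 0:
        return 0.0
    base = count / total_updates
    # Hypothetical clustering term in [0, 1]: share of closely spaced problems,
    # damped when the average spacing is large relative to preset_gap.
    spacing = min(1.0, preset_gap / mean_gap) if mean_gap > 0 else 1.0
    clustering = (close / count) * spacing
    return base + (1 - base) * clustering


print(problem_features([2, 3, 4, 20], preset_gap=3))  # (4, 6.0, 2)
print(round(abnormal_update_probability([2, 3, 4, 20], 20, preset_gap=3), 3))  # 0.4
```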
In another possible embodiment, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
when the number of times of the matching problem under the matching check rule is larger than a preset number of times, determining abnormal updating probability of the service data under the matching check rule according to the number of times of the matching problem;
when the number of matching problems under the matching check rule is not greater than the preset number, the number of matching problems whose update interval is smaller than a preset number of updates is determined; when this number is greater than a preset count, the abnormal update probability of the service data under the matching check rule is determined from the number of matching problems whose update interval is smaller than the preset number and the number whose update interval is larger than it;
when the number of matching problems whose update interval is smaller than the preset number is not greater than the preset count, a problem frequency evaluation is determined from that number and the average update interval of those matching problems; when the problem frequency evaluation does not meet the requirement, the abnormal update probability of the service data under the matching check rule is determined from the problem frequency evaluation;
and when the problem frequency evaluation value meets the requirement, determining abnormal update probability of the service data under different matching check rules according to the matching problem times under different matching check rules and the update times of the average interval between the different matching problem times.
In another possible embodiment, the method for determining the abnormal update probability of the service data in the step S1 under different matching check rules is as follows:
determining the problem update times of the service data under different matching check rules based on update results under different problem update times, and taking the problem update times of the service data under different matching check rules as the matching problem times under the matching check rules;
judging whether the number of times of the matching problem under the matching check rule is larger than a preset number of times, if so, determining the abnormal updating probability of the service data under the matching check rule according to the number of times of the matching problem, if not, entering the next step;
the number of matching problems whose update interval is smaller than the preset number is determined from the matching problems under each matching check rule, and a problem frequency evaluation is determined from that number and the average update interval of those matching problems; whether the problem frequency evaluation meets the requirement is then judged: if yes, the abnormal update probability of the service data under the matching check rule is determined from the problem frequency evaluation; if not, the next step is entered;
the abnormal update probability of the service data under different matching check rules is determined from the number of matching problems under each rule and the average number of updates between successive matching problems, together with the problem frequency evaluation.
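The staged decision of this embodiment can be sketched as below. Because the text gives no formulas, the thresholds (`count_limit`, `gap_limit`, `eval_limit`) and the way each stage turns its inputs into a probability are hypothetical. The sketch also assumes that a high frequency evaluation (problems clustered closely together) is what "meets the requirement".

```python
def staged_abnormal_probability(problem_updates, total_updates,
                                count_limit=5, gap_limit=3, eval_limit=0.5):
    """Three-stage flow of this embodiment. problem_updates must be strictly
    increasing update sequence numbers; all thresholds and formulas here
    are hypothetical placeholders."""
    count = len(problem_updates)
    if total_updates == 0 or count == 0:
        return 0.0
    base = count / total_updates
    # Stage 1: enough matching problems -> probability from the count alone.
    if count > count_limit:
        return base
    gaps = [b - a for a, b in zip(problem_updates, problem_updates[1:])]
    close = [g for g in gaps if g < gap_limit]
    # Stage 2: problem frequency evaluation in [0, 1] from closely spaced problems.
    evaluation = 0.0
    if close:
        mean_close_gap = max(sum(close) / len(close), 1)
        evaluation = len(close) / (mean_close_gap * len(gaps))
    if evaluation >= eval_limit:
        return min(1.0, base * (1 + evaluation))
    # Stage 3: count and average interval, together with the evaluation.
    if not gaps:
        return base
    mean_gap = sum(gaps) / len(gaps)
    return min(1.0, base * min(1.0, gap_limit / mean_gap) * (1 + evaluation))


print(round(staged_abnormal_probability(list(range(1, 8)), 70), 3))  # 0.1 (stage 1)
print(round(staged_abnormal_probability([1, 2, 3], 30), 3))          # 0.2 (stage 2)
print(round(staged_abnormal_probability([1, 10, 20], 30), 4))        # 0.0316 (stage 3)
```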
Specifically, the matching check rule comprises 0-1 check, range check, data type check, length limit check and logic relation check.
S2, determining an abnormal verification rule and a reliable verification rule of the service data based on abnormal update probabilities of different service data under different matching verification rules, determining data abnormal probabilities and abnormal update data of the service data by combining the abnormal update probabilities of different service data under different matching verification rules, and determining a construction period of a data section of the abnormal update data based on the data abnormal probabilities;
it can be understood that the method for determining the anomaly check rule and the reliable check rule of the service data is as follows:
when the abnormal update probability of the service data under a matching check rule is greater than a set abnormal probability, the matching check rule is determined to be an abnormal check rule; when the abnormal update probability of the service data under a matching check rule falls within a preset abnormal probability interval, the matching check rule is determined to be a reliable check rule.
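The two thresholds translate directly into code. The numeric values below are hypothetical. Note that, as written, a rule whose probability lies above the reliable interval but not above the set abnormal probability (`data_type` in the example) falls into neither class.

```python
def classify_rules(rule_probs, set_probability=0.3, reliable_interval=(0.0, 0.1)):
    """Split matching check rules into abnormal and reliable check rules.
    set_probability and reliable_interval are hypothetical preset values."""
    low, high = reliable_interval
    abnormal = [r for r, p in rule_probs.items() if p > set_probability]
    reliable = [r for r, p in rule_probs.items() if low <= p <= high]
    return abnormal, reliable


probs = {"zero_one": 0.0, "range": 0.4, "data_type": 0.2, "length": 0.05, "logic": 0.35}
print(classify_rules(probs))  # (['range', 'logic'], ['zero_one', 'length'])
```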
In one possible embodiment, as shown in fig. 3, the method for determining the data anomaly probability of the service data in the step S2 is as follows:
s21, determining basic anomaly probability of the service data based on the problem update times of the service data;
s22, determining the rule anomaly probability of the service data according to the anomaly update probability of the service data under different matching check rules, the matching check rules with the problem update times, the anomaly update probability of the anomaly check rules and the anomaly update probability of the reliable check rules;
s23, determining the data anomaly probability of the service data based on the rule anomaly probability and the basic anomaly probability.
Further, when the data anomaly probability of the service data does not meet the requirement, determining that the service data is anomaly update data.
In another possible embodiment, the method for determining the data anomaly probability of the service data in the step S2 is as follows:
s21, determining basic anomaly probability of the service data based on the problem update times of the service data, the update times of the average interval between different problem update times and the problem update times of which the interval update times are smaller than the preset times;
s22, judging whether the number of the abnormal check rules is smaller than the number of preset check rules, if yes, taking the basic abnormal probability of the service data as the data abnormal probability of the service data, and if no, entering the next step;
s23, determining the comprehensive abnormal update probability of the abnormal check rules according to the number of the abnormal check rules and the abnormal update probabilities of different abnormal check rules, judging whether the comprehensive abnormal update probability of the abnormal check rules is smaller than a preset probability threshold, if so, taking the basic abnormal probability of the business data as the data abnormal probability of the business data, and if not, entering the next step;
s24, determining the rule anomaly probability of the service data from the abnormal update probabilities of the service data under the matching check rules that have problem updates, the number of reliable check rules, the abnormal update probabilities of the reliable check rules, and the comprehensive abnormal update probability of the abnormal check rules; and determining the data anomaly probability of the service data from the rule anomaly probability and the basic anomaly probability.
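Steps S22 to S24 form a gated computation: the basic anomaly probability is returned unless there are enough abnormal check rules and they are, taken together, sufficiently abnormal. In the sketch below, the S21 basic probability is passed in as an argument. Using a mean for the comprehensive probability, a weighted average for the rule anomaly probability, and a noisy-OR combination in the last line are all assumptions made for illustration.

```python
from statistics import mean


def data_anomaly_probability(base_prob, rule_probs, abnormal, reliable,
                             min_abnormal_rules=2, prob_threshold=0.3):
    """Sketch of steps S22-S24; base_prob is the S21 basic anomaly
    probability. Thresholds and combination formulas are hypothetical."""
    # S22: too few abnormal check rules -> keep the basic probability.
    if len(abnormal) < min_abnormal_rules:
        return base_prob
    # S23: comprehensive probability of the abnormal check rules (mean assumed).
    comprehensive = mean(rule_probs[r] for r in abnormal)
    if comprehensive < prob_threshold:
        return base_prob
    # S24: rule anomaly probability: abnormal rules weighted by their
    # comprehensive probability, averaged together with the reliable rules.
    rule_prob = (len(abnormal) * comprehensive + sum(rule_probs[r] for r in reliable)) / (
        len(abnormal) + len(reliable))
    # Combine basic and rule probabilities as independent sources (noisy-OR, an assumption).
    return 1 - (1 - base_prob) * (1 - rule_prob)


probs = {"range": 0.5, "data_type": 0.4, "length": 0.0, "logic": 0.1}
print(round(data_anomaly_probability(0.1, probs, ["range", "data_type"], ["length", "logic"]), 3))
# 0.325
```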
It can be appreciated that determining the construction period of the data section of the abnormal update data based on the data abnormality probability specifically includes:
determining, from the data anomaly probability of the abnormal update data, the preset abnormal probability interval into which it falls, and determining the construction period of the data section of the abnormal update data from that preset interval.
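Mapping a probability to a construction period through preset intervals is a sorted-boundary lookup. The boundaries and periods below are hypothetical (the text gives neither values nor units); a higher anomaly probability is assumed to mean a shorter period, that is, more frequent construction.

```python
from bisect import bisect_right

# Hypothetical preset anomaly-probability intervals and construction periods:
# [0, 0.2) -> every 30 days, [0.2, 0.5) -> every 7 days, [0.5, 1] -> daily.
BOUNDARIES = [0.2, 0.5]
PERIOD_DAYS = [30, 7, 1]


def construction_period_days(anomaly_prob, boundaries=BOUNDARIES, periods=PERIOD_DAYS):
    """Return the construction period of the interval containing anomaly_prob."""
    return periods[bisect_right(boundaries, anomaly_prob)]


print([construction_period_days(p) for p in (0.1, 0.2, 0.49, 0.5, 0.9)])
# [30, 7, 7, 1, 1]
```

`bisect_right` places a probability equal to a boundary in the higher interval, so each interval is closed on the left and open on the right.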
S3, determining abnormal update data and stable update data in different service systems, determining system abnormal probability of different service systems and abnormal service systems by combining the abnormal update probability of different service data in different service systems, and determining the construction period of a data section of the abnormal service system based on the system abnormal probability;
it should be noted that, the method for determining the stable update data includes:
when the data anomaly probability of the service data in a service system is smaller than a preset anomaly probability threshold, the service data is determined to be stable update data.
In one possible embodiment, as shown in fig. 4, the method for determining the abnormal service system in the step S3 is as follows:
determining the comprehensive abnormal update probability of the abnormal update data of the service system according to the abnormal update probability of different abnormal update data and the number of abnormal update data in the service system, and determining the comprehensive abnormal update probability of the stable update data of the service system based on the abnormal update probability of different stable update data and the number of stable update data in the service system;
determining the average abnormal update probability of the different business data of the business system from their abnormal update probabilities, and determining the system anomaly probability of the business system by combining this average with the number of business data items in the business system, the comprehensive abnormal update probability of the abnormal update data, and the comprehensive abnormal update probability of the stable update data;
and determining whether the service system belongs to an abnormal service system according to the system abnormality probability of the service system.
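The Fig. 4 flow can be sketched as follows, assuming each service data item already carries an anomaly probability. Using the mean as each group's "comprehensive" probability, giving abnormal update data a heavier weight, and the threshold values are assumptions. With `abnormal_weight=1.0` the result reduces to the plain average of all service data probabilities, which the text also uses.

```python
from statistics import mean


def system_anomaly(data_probs, data_threshold=0.3, system_threshold=0.3, abnormal_weight=2.0):
    """Sketch of the Fig. 4 flow for one service system; data_probs maps each
    service data item to its probability. Thresholds and weight are hypothetical.
    Returns (system anomaly probability, is abnormal service system)."""
    if not data_probs:
        return 0.0, False
    # Stable update data: probability below the preset threshold; the rest are abnormal.
    stable = [p for p in data_probs.values() if p < data_threshold]
    abnormal = [p for p in data_probs.values() if p >= data_threshold]
    comp_abnormal = mean(abnormal) if abnormal else 0.0
    comp_stable = mean(stable) if stable else 0.0
    # Count-weighted combination, with abnormal data weighted more heavily (assumption).
    numerator = abnormal_weight * len(abnormal) * comp_abnormal + len(stable) * comp_stable
    denominator = abnormal_weight * len(abnormal) + len(stable)
    system_prob = numerator / denominator
    return system_prob, system_prob >= system_threshold


probs = {"credit_limit": 0.6, "loan_status": 0.4, "user_name": 0.05, "phone": 0.05}
prob, is_abnormal = system_anomaly(probs)
print(round(prob, 3), is_abnormal)  # 0.35 True
```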
S4, determining a comprehensive anomaly probability from the abnormal service systems among the service systems and the system anomaly probabilities of all service systems, and determining the construction period of the data cross-section of the service systems by further taking into account the similarity of system frameworks among the service systems.
In one possible embodiment, as shown in fig. 5, the construction period of the data cross-section of the service systems in step S4 is determined as follows:
S41, dividing the service systems into service system groups according to their system frameworks, determining the framework similarity among the service systems from the number of groups and the number of service systems in each group, and determining the construction difficulty and the basic construction period of the data cross-section of the service systems from that similarity;
S42, determining the comprehensive system anomaly probability of the service systems from the number of abnormal service systems among them, the system anomaly probabilities of the individual abnormal service systems, and the average system anomaly probability of all service systems;
S43, determining the construction period of the data cross-section of the service systems from their comprehensive system anomaly probability and the basic construction period.
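Steps S41 to S43 leave the similarity measure and the period formulas unspecified. The sketch below is one possible reading, not the patented computation: it assumes framework similarity is the fraction of ordered pairs of distinct service systems that fall in the same group, construction difficulty is one minus that similarity, the basic period grows with difficulty, and a higher comprehensive anomaly probability shortens the period. The 7-day base, the 1-day floor, the blending rule in S42, the framework labels, and every function name are hypothetical.

```python
from collections import Counter
from statistics import mean


def framework_similarity(frameworks: dict[str, str]) -> float:
    """S41 (assumed): chance that two distinct systems picked at random share a framework group."""
    n = len(frameworks)
    if n < 2:
        return 1.0
    group_sizes = Counter(frameworks.values()).values()  # number of systems in each group
    same_group_pairs = sum(size * (size - 1) for size in group_sizes)
    return same_group_pairs / (n * (n - 1))


def basic_period_days(similarity: float, base_days: float = 7.0) -> float:
    """S41 (assumed): lower similarity means higher difficulty, hence a longer basic period."""
    difficulty = 1.0 - similarity
    return base_days * (1.0 + difficulty)


def comprehensive_system_probability(system_probs: dict[str, float], abnormal: set[str]) -> float:
    """S42 (assumed): blend the abnormal systems' mean with the mean over all systems."""
    overall = mean(system_probs.values())
    if not abnormal:
        return overall
    share = len(abnormal) / len(system_probs)
    abnormal_mean = mean(system_probs[name] for name in abnormal)
    return share * abnormal_mean + (1.0 - share) * overall


def construction_period_days(basic_days: float, comp_prob: float, min_days: float = 1.0) -> float:
    """S43 (assumed): a higher anomaly probability shortens the period, down to min_days."""
    return max(min_days, basic_days * (1.0 - comp_prob))


if __name__ == "__main__":
    # System names follow claim 2; the framework labels and probabilities are made up.
    frameworks = {"login": "framework_a", "credit_granting": "framework_a",
                  "repayment": "framework_a", "branch": "framework_b",
                  "credit_application": "framework_b"}
    probs = {"login": 0.1, "credit_granting": 0.6, "repayment": 0.2,
             "credit_application": 0.4, "branch": 0.2}
    basic = basic_period_days(framework_similarity(frameworks))
    comp = comprehensive_system_probability(probs, {"credit_granting", "credit_application"})
    print(round(basic, 3), round(comp, 3), round(construction_period_days(basic, comp), 3))
```

Here the groups have sizes 3 and 2, giving a similarity of 0.4 and a basic period of 11.2 days; a comprehensive probability of 0.38 then shortens the period to about 6.944 days. The pair-counting measure was chosen only because it depends on exactly the two quantities S41 names: the number of groups and the size of each group.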
In another possible embodiment, the construction period of the data cross-section of the service systems in step S4 is determined as follows:
dividing the service systems into service system groups according to their system frameworks, determining the framework similarity among the service systems from the number of groups and the number of service systems in each group, and determining the construction difficulty and the basic construction period of the data cross-section of the service systems from that similarity;
when there is no service system whose system anomaly probability fails to meet the requirement, taking the basic construction period as the construction period of the data cross-section of the service systems;
when such a service system exists but the number of abnormal service systems is smaller than a preset number of systems, likewise taking the basic construction period as the construction period of the data cross-section;
when the number of abnormal service systems is not less than the preset number, determining the comprehensive system anomaly probability of the abnormal service systems from their number and their individual system anomaly probabilities, and, if this comprehensive probability fails to meet the requirement, determining the construction period of the data cross-section from it and the basic construction period;
if the comprehensive system anomaly probability of the abnormal service systems meets the requirement, determining the comprehensive system anomaly probability of all service systems from that probability, the number of service systems, and the average system anomaly probability of all service systems, and determining the construction period of the data cross-section from this comprehensive probability and the basic construction period.
In another possible embodiment, the construction period of the data cross-section of the service systems in step S4 is determined as follows:
S41, dividing the service systems into service system groups according to their system frameworks, determining the framework similarity among the service systems from the number of groups and the number of service systems in each group, and determining the construction difficulty and the basic construction period of the data cross-section of the service systems from that similarity;
S42, judging whether there is a service system whose system anomaly probability is greater than a preset system anomaly probability; if so, proceeding to step S43; if not, taking the basic construction period as the construction period of the data cross-section of the service systems;
S43, judging whether the number of abnormal service systems is smaller than the preset number of systems; if so, taking the basic construction period as the construction period of the data cross-section; if not, proceeding to step S44;
S44, determining the comprehensive system anomaly probability of the abnormal service systems from their number and their individual system anomaly probabilities, and judging whether it fails to meet the requirement; if so, determining the construction period of the data cross-section from it and the basic construction period; if not, proceeding to step S45;
S45, determining the comprehensive system anomaly probability of all service systems from the comprehensive system anomaly probability of the abnormal service systems, the number of service systems, and the average system anomaly probability of all service systems, and determining the construction period of the data cross-section from this comprehensive probability and the basic construction period.
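The branching in steps S42 to S45 (the same decision logic as the preceding embodiment) can be sketched as a single function. The thresholds, the preset system count, the use of the plain mean of the abnormal systems' probabilities in S44, the blending rule in S45, the 1-day floor, and the rule that a higher probability shortens the period are assumptions made for illustration; the patent states only which quantities each step uses.

```python
from statistics import mean


def section_period_days(
    system_probs: dict[str, float],
    basic_days: float,
    system_threshold: float = 0.3,  # hypothetical "preset system anomaly probability"
    preset_count: int = 2,          # hypothetical "preset number of systems"
    comp_threshold: float = 0.5,    # hypothetical "requirement" on the comprehensive probability
    min_days: float = 1.0,          # hypothetical lower bound on the period
) -> float:
    abnormal = [p for p in system_probs.values() if p > system_threshold]

    # S42: no abnormal service system, so keep the basic construction period.
    if not abnormal:
        return basic_days

    # S43: too few abnormal systems to justify changing the period.
    if len(abnormal) < preset_count:
        return basic_days

    # S44: comprehensive anomaly probability of the abnormal systems (assumed: their mean).
    comp_abnormal = mean(abnormal)
    if comp_abnormal > comp_threshold:  # "does not meet the requirement"
        return max(min_days, basic_days * (1.0 - comp_abnormal))

    # S45: blend with the average over all service systems, weighted by the abnormal share.
    share = len(abnormal) / len(system_probs)
    comp_all = share * comp_abnormal + (1.0 - share) * mean(system_probs.values())
    return max(min_days, basic_days * (1.0 - comp_all))
```

For example, with a 10-day basic period, systems at 0.8, 0.9 and 0.1 take the S44 branch (mean 0.85 of the two abnormal systems) and yield 1.5 days, while systems at 0.4, 0.45, 0.1 and 0.05 reach S45 and yield 6.625 days.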
In another aspect, as shown in FIG. 6, the present invention provides a computer system, comprising a memory and a processor that are communicatively coupled, and a computer program stored in the memory and executable on the processor, characterized in that the processor performs the data cross-section processing method described above when running the computer program.
The data cross-section processing method specifically comprises the following steps:
determining problem update records of the service data from the historical change data of the service data of the service systems, and determining, based on the problem update records, the abnormal update probability of each piece of service data under each matching check rule;
determining the abnormal check rules and the reliable check rules of the service data from these abnormal update probabilities, determining the data anomaly probability of each piece of service data and identifying the abnormal update data by further combining these probabilities, and determining the construction period of the data cross-section of the abnormal update data based on the data anomaly probability;
determining the comprehensive abnormal update probability of the abnormal update data of a service system from the abnormal update probabilities of the individual abnormal update data and the number of abnormal update data in the system, and determining the comprehensive abnormal update probability of the stable update data of the system from the abnormal update probabilities of the individual stable update data and the number of stable update data in the system;
determining the average abnormal update probability of the service data of the service system from the abnormal update probabilities of its individual service data, and determining the system anomaly probability of the service system by combining this average with the number of service data in the system, the comprehensive abnormal update probability of the abnormal update data, and the comprehensive abnormal update probability of the stable update data;
determining from its system anomaly probability whether each service system is an abnormal service system, and determining the construction period of the data cross-section of the abnormal service system based on that probability;
dividing the service systems into service system groups according to their system frameworks, determining the framework similarity among the service systems from the number of groups and the number of service systems in each group, and determining the construction difficulty and the basic construction period of the data cross-section of the service systems from that similarity;
when there is no service system whose system anomaly probability fails to meet the requirement, taking the basic construction period as the construction period of the data cross-section of the service systems;
when such a service system exists but the number of abnormal service systems is smaller than a preset number of systems, likewise taking the basic construction period as the construction period of the data cross-section;
when the number of abnormal service systems is not less than the preset number, determining the comprehensive system anomaly probability of the abnormal service systems from their number and their individual system anomaly probabilities, and, if this comprehensive probability fails to meet the requirement, determining the construction period of the data cross-section from it and the basic construction period;
if the comprehensive system anomaly probability of the abnormal service systems meets the requirement, determining the comprehensive system anomaly probability of all service systems from that probability, the number of service systems, and the average system anomaly probability of all service systems, and determining the construction period of the data cross-section from this comprehensive probability and the basic construction period.
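The second step above, classifying the matching check rules and turning a data anomaly probability into a construction period, can be sketched as follows. The cut-off of 0.3 for an abnormal check rule, the reliable interval [0, 0.05], and the interval-to-period table are hypothetical stand-ins for the patent's "set" and "preset" values; the rule names are borrowed from the check types listed in claim 5.

```python
from bisect import bisect_right

# Hypothetical values; the patent only calls them "set" or "preset".
ABNORMAL_RULE_PROBABILITY = 0.3
RELIABLE_INTERVAL = (0.0, 0.05)
# Preset anomaly-probability intervals [0, 0.2), [0.2, 0.5), [0.5, 0.8), [0.8, 1]
# and the construction period, in days, assigned to each interval.
INTERVAL_BOUNDS = [0.2, 0.5, 0.8]
PERIOD_DAYS = [30, 14, 7, 1]


def classify_check_rules(rule_probs: dict[str, float]) -> tuple[set[str], set[str]]:
    """Return (abnormal check rules, reliable check rules); a rule may fall in neither set."""
    low, high = RELIABLE_INTERVAL
    abnormal = {rule for rule, p in rule_probs.items() if p > ABNORMAL_RULE_PROBABILITY}
    reliable = {rule for rule, p in rule_probs.items() if low <= p <= high}
    return abnormal, reliable


def period_for_abnormal_data(data_anomaly_probability: float) -> int:
    """Find the preset interval containing the probability and return its period."""
    return PERIOD_DAYS[bisect_right(INTERVAL_BOUNDS, data_anomaly_probability)]


if __name__ == "__main__":
    rules = {"range_check": 0.4, "data_type_check": 0.01, "length_constraint_check": 0.1}
    print(classify_check_rules(rules))
    print([period_for_abnormal_data(p) for p in (0.1, 0.2, 0.65, 0.9)])
```

Using `bisect_right` makes each interval closed on the left, so a probability of exactly 0.2 falls in [0.2, 0.5) and maps to 14 days; the final list printed is `[30, 14, 7, 1]`.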
Through the above embodiments, the present invention has the following beneficial effects:
1. The abnormal update probability of each piece of service data under each matching check rule is determined from the problem update records, so the problems of each piece of service data under its matching check rules are assessed accurately from how problem updates occurred over the historical updates. In addition, screening out the matching check rules under which service data has a high abnormal update probability lays the foundation for setting differentiated data cross-section construction periods for the service data with more serious problems.
2. Determining the data anomaly probability of the service data and identifying the abnormal update data allows the likelihood of anomalies in the service data to be estimated from multiple angles. Screening out the abnormal update data also ensures that the construction period of the data cross-section is set differently for service data with a high anomaly probability, which safeguards both the efficiency of verification and the reliability of the service data.
3. The construction period of the data cross-section of the service systems is determined by combining the anomaly probabilities with the similarity of system frameworks among the service systems. This accounts for differences in the anomaly probabilities of the service data across systems and, through framework similarity, accurately evaluates the difficulty of constructing the data cross-section. It thereby ensures the reliability of the service data of service systems that show anomaly probabilities, while avoiding the high system load that frequent construction of difficult data cross-sections would cause.
In this specification, the embodiments are described progressively: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are substantially similar to the method embodiments and are therefore described briefly; for relevant details, refer to the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims (10)

1. A data cross-section processing method, characterized by comprising the following steps:
determining problem update records of the service data from the historical change data of the service data of the service systems, and determining, based on the problem update records, the abnormal update probability of each piece of service data under each matching check rule;
determining the abnormal check rules and the reliable check rules of the service data from these abnormal update probabilities, determining the data anomaly probability of each piece of service data and identifying the abnormal update data by further combining these probabilities, and determining the construction period of the data cross-section of the abnormal update data based on the data anomaly probability;
determining the abnormal update data and the stable update data in each service system, determining the system anomaly probability of each service system, and thereby the abnormal service systems, by combining the abnormal update probabilities of the service data in that system, and determining the construction period of the data cross-section of each abnormal service system based on its system anomaly probability;
and determining a comprehensive anomaly probability from the abnormal service systems among the service systems and the system anomaly probabilities of all service systems, and determining the construction period of the data cross-section of the service systems by further taking into account the similarity of system frameworks among the service systems.
2. The data cross-section processing method of claim 1, wherein the service systems comprise a login system, a credit granting system, a branch system, a repayment interface, and a credit application system.
3. The data cross-section processing method of claim 1, wherein the problem update record comprises the problem updates of the service data among the historical updates and the update result of each problem update.
4. The data cross-section processing method of claim 1, wherein the abnormal update probability of the service data under each matching check rule is determined as follows:
determining, from the update results of the problem updates, the problem updates of the service data that fall under each matching check rule, and taking their count as the number of matching problems under that matching check rule;
judging whether the number of matching problems under the matching check rule is greater than a preset number; if so, determining the abnormal update probability of the service data under the matching check rule from the number of matching problems; if not, proceeding to the next step;
determining, among the matching problems under each matching check rule, those separated by fewer than a preset number of updates, determining a problem frequency evaluation value from the number of such closely spaced matching problems and their average update interval, and judging whether the problem frequency evaluation value meets the requirement; if so, determining the abnormal update probability of the service data under the matching check rule from the problem frequency evaluation value; if not, proceeding to the next step;
and determining the abnormal update probability of the service data under each matching check rule from the number of matching problems under that rule and the average number of updates between successive matching problems.
5. The data cross-section processing method of claim 1, wherein the matching check rules comprise a 0-1 check, a range check, a data type check, a length constraint check, and a logical relationship check.
6. The data cross-section processing method of claim 1, wherein the abnormal check rules and the reliable check rules of the service data are determined as follows:
when the abnormal update probability of the service data under a matching check rule is greater than a set abnormal probability, determining that matching check rule to be an abnormal check rule; and when the abnormal update probability of the service data under a matching check rule lies within a preset abnormal probability interval, determining that matching check rule to be a reliable check rule.
7. The data cross-section processing method of claim 1, wherein the data anomaly probability of the service data is determined as follows:
determining a basic anomaly probability of the service data from the number of its problem updates;
determining a rule anomaly probability of the service data from its abnormal update probabilities under the matching check rules, the number of matching check rules under which problem updates occurred, and the abnormal update probabilities under the abnormal check rules and under the reliable check rules;
and determining the data anomaly probability of the service data from the rule anomaly probability and the basic anomaly probability.
8. The data cross-section processing method of claim 1, wherein determining the construction period of the data cross-section of the abnormal update data based on the data anomaly probability specifically comprises:
determining, from the data anomaly probability of the abnormal update data, the preset anomaly probability interval into which it falls, and determining the construction period of the data cross-section of the abnormal update data from that preset interval.
9. The data cross-section processing method of claim 1, wherein the construction period of the data cross-section of the service systems is determined as follows:
dividing the service systems into service system groups according to their system frameworks, determining the framework similarity among the service systems from the number of groups and the number of service systems in each group, and determining the construction difficulty and the basic construction period of the data cross-section of the service systems from that similarity;
determining the comprehensive system anomaly probability of the service systems from the number of abnormal service systems among them, the system anomaly probabilities of the individual abnormal service systems, and the average system anomaly probability of all service systems;
and determining the construction period of the data cross-section of the service systems from their comprehensive system anomaly probability and the basic construction period.
10. A computer system, comprising a memory and a processor that are communicatively coupled, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when running the computer program, performs the data cross-section processing method of any one of claims 1-9.
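The three-stage fallback of claim 4 can be sketched as below. The input is assumed to be the update sequence numbers at which a problem attributable to one matching check rule occurred (one reading of the problem update record in claim 3). The preset count, preset interval, evaluation threshold, and the three probability formulas (observed frequency, a saturating transform of the frequency evaluation, and count over count plus mean gap) are hypothetical; claim 4 names the inputs of each stage but not the formulas.

```python
from statistics import mean


def abnormal_update_probability(
    problem_updates: list[int],
    total_updates: int,
    preset_count: int = 5,        # hypothetical "preset number of times"
    preset_interval: int = 3,     # hypothetical gap, in updates, for "closely spaced" problems
    eval_threshold: float = 1.0,  # hypothetical "requirement" on the frequency evaluation
) -> float:
    positions = sorted(set(problem_updates))  # dedupe so every gap is at least 1
    count = len(positions)  # number of matching problems under this rule
    if total_updates <= 0 or count == 0:
        return 0.0

    # Stage 1: enough matching problems, so use their observed frequency.
    if count > preset_count:
        return min(1.0, count / total_updates)

    # Stage 2: problem frequency evaluation from closely spaced matching problems.
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    close_gaps = [gap for gap in gaps if gap < preset_interval]
    if close_gaps:
        evaluation = len(close_gaps) / mean(close_gaps)
        if evaluation >= eval_threshold:
            return evaluation / (1.0 + evaluation)  # maps [0, inf) into [0, 1)

    # Stage 3: fall back to the count and the average gap between matching problems.
    average_gap = mean(gaps) if gaps else float(total_updates)
    return count / (count + average_gap)


if __name__ == "__main__":
    print(abnormal_update_probability([10, 11, 12, 40], total_updates=100))  # stage 2
    print(abnormal_update_probability([10, 30, 50], total_updates=100))      # stage 3
```

In the first call, four matching problems do not exceed the preset count, but two gaps of one update each give an evaluation of 2.0, so stage 2 returns 2/3. In the second call no gap is closely spaced, so stage 3 returns 3 / (3 + 20).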

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410066973.9A CN117591530B (en) 2024-01-17 2024-01-17 Data cross section processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410066973.9A CN117591530B (en) 2024-01-17 2024-01-17 Data cross section processing method and system

Publications (2)

Publication Number Publication Date
CN117591530A true CN117591530A (en) 2024-02-23
CN117591530B CN117591530B (en) 2024-04-19

Family

ID=89913636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410066973.9A Active CN117591530B (en) 2024-01-17 2024-01-17 Data cross section processing method and system

Country Status (1)

Country Link
CN (1) CN117591530B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808578A (en) * 2024-03-01 2024-04-02 杭银消费金融股份有限公司 Intelligent analysis method and system for People's Bank of China credit reporting data

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446546A (en) * 2018-03-20 2018-08-24 深信服科技股份有限公司 Abnormal access detection method, device, equipment and computer readable storage medium
CN109460432A (en) * 2018-11-14 2019-03-12 腾讯科技(深圳)有限公司 Data processing method and system
CN111553576A (en) * 2020-04-20 2020-08-18 国电南瑞科技股份有限公司 Data verification method, device and system suitable for electric power spot market
CN112381773A (en) * 2020-11-05 2021-02-19 东风柳州汽车有限公司 Key cross section data analysis method, device, equipment and storage medium
CN112395325A (en) * 2020-11-27 2021-02-23 广州光点信息科技有限公司 Data management method, system, terminal equipment and storage medium
CN112486891A (en) * 2020-11-30 2021-03-12 无锡职业技术学院 Automatic checking device, system and method for supply chain business document
CN112668944A (en) * 2021-01-26 2021-04-16 天元大数据信用管理有限公司 Enterprise wind control method, device, equipment and medium based on big data credit investigation
KR20220040023A (en) * 2020-09-23 2022-03-30 오스템임플란트 주식회사 Method, device and computer program stored in recording medium for displaying teeth
WO2022068645A1 (en) * 2020-09-30 2022-04-07 深圳前海微众银行股份有限公司 Database fault discovery method, apparatus, electronic device, and storage medium
CN114691443A (en) * 2020-12-25 2022-07-01 苏州国双软件有限公司 Cross section data sending method and device, electronic equipment and storage medium
CN115237996A (en) * 2022-08-01 2022-10-25 数预智能科技(上海)有限公司杭州分公司 Mining method for distribution rule and outlier of cross-section data
CN116611797A (en) * 2023-07-20 2023-08-18 杭银消费金融股份有限公司 Service tracking and monitoring method, system and storage medium
CN116663978A (en) * 2023-05-22 2023-08-29 厦门美亚亿安信息科技有限公司 Quality assessment method and system for audit data
CN116743501A (en) * 2023-08-10 2023-09-12 杭银消费金融股份有限公司 Abnormal flow control method and system
CN116821848A (en) * 2023-06-27 2023-09-29 杭银消费金融股份有限公司 Accounting abnormal data periodic detection method and system based on artificial intelligence
CN116883184A (en) * 2023-07-12 2023-10-13 江苏知链科技有限公司 Financial tax intelligent analysis method based on big data
CN116933189A (en) * 2022-04-07 2023-10-24 北京沃东天骏信息技术有限公司 Data detection method and device
CN117009204A (en) * 2023-08-30 2023-11-07 杭银消费金融股份有限公司 Service call tracking-based health evaluation system of credit giving system
CN117149797A (en) * 2023-10-27 2023-12-01 杭银消费金融股份有限公司 Accounting method and system based on multidimensional data monitoring
CN117390392A (en) * 2023-10-19 2024-01-12 呼伦贝尔安泰热电有限责任公司海拉尔热电厂 Building abnormal heat utilization probability identification method, system and storage medium
CN117405971A (en) * 2023-10-09 2024-01-16 国网河南电力公司营销服务中心 Power acquisition digitization method based on flow automation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
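Xia Jing; Qin Fenfen: "Research on Data Quality Evaluation Methods for Highway Transport Statistical Indicators", Transport Research (交通运输研究), no. 06, 5 February 2018 (2018-02-05) *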

Also Published As

Publication number Publication date
CN117591530B (en) 2024-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant