CN106980684B - Method and system for realizing data flow audit of large-scale informatization system - Google Patents
- Publication number
- CN106980684B (application number CN201710206153.5A)
- Authority
- CN
- China
- Prior art keywords
- audit
- data
- network element
- metadata
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1456—Hardware arrangements for backup
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a method and a system for implementing data streaming audit of a large-scale informatization system. The method comprises the following steps: step (1), collecting audit metadata from different network element nodes using a time difference; step (2), distributing and storing the audit metadata in a streaming audit component; step (3), performing the data streaming audit; step (4), calibrating the audit result; and step (5), storing the audit result. The system comprises a module for collecting audit metadata by using time difference, a distributed storage audit metadata module, a streaming audit module, an audit result calibration module and an audit result storage module. Collecting the audit metadata from the network element nodes with a time difference, and calibrating the audit result with a difference comparison method, effectively prevents abnormal data from affecting the audit result and greatly improves the accuracy of the streaming audit result. Storing the audit metadata in a distributed manner provides automatic fault tolerance.
Description
Technical Field
The invention relates to the field of data audit analysis for large-scale informatization systems, and in particular to a method and a system for implementing data streaming audit of a large-scale informatization system.
Background
Informatization supports institutional innovation in enterprises and improves their overall competitiveness. In recent years, informatization in China's government and enterprise sectors has steadily deepened, and most government and enterprise groups have built their own large-scale business support information systems. Each such system is composed of multiple components or network elements, and each component or network element keeps its data locally to guarantee business responsiveness, so related data is stored redundantly across the related systems. With the arrival of the information age, the volume of work orders and transaction data flowing through business support information systems has grown rapidly, and data structures have diversified. During the continuous operation and servicing of these systems, inconsistency among the redundant data held by the components or network elements causes their business results to conflict. For telecommunication operators, for example, such data differences have made inconsistencies among service subscription, billing and service provision increasingly serious, leading to more customer complaints, lost business revenue, lower customer service levels and satisfaction, and reduced credibility of decision-making systems. Government and enterprise groups therefore pay increasing attention to data inconsistency among components or network elements, and their requirements on the accuracy of audit results for such inconsistent data keep rising.
Building a data streaming audit technology to audit and synchronize data among the components and network elements of a large-scale informatization system, that is, using the system itself to find differences, is the most direct and effective way to overcome data inconsistency in such a system, and it ensures the robustness, reliability and controllability of the system. In the prior art, a data auditing platform obtains the data collected by each network element platform from an FTP transfer server and then compares the data and processes the differences. This approach yields audit results of low accuracy and cannot effectively eliminate the influence of abnormal data on the audit result.
Disclosure of Invention
To address the above problems, the invention provides a method and a system for implementing data streaming audit of a large-scale informatization system, which greatly improve audit accuracy by combining time-difference collection of audit metadata with a difference comparison method.
The technical scheme adopted by the invention is as follows:
A method for implementing data streaming audit of a large-scale informatization system, characterized by comprising the following steps:
Step (1): collecting audit metadata from different network element nodes using a time difference;
Step (2): distributing and storing the audit metadata in a streaming audit component;
Step (3): performing the data streaming audit;
Step (4): calibrating the audit result;
Step (5): storing the audit result.
Further, in step (1), data collection from a network element node starts Δt earlier and ends Δt' earlier.
Further, Δt and Δt' are the time taken for data to travel between adjacent network element nodes.
Still further, Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
Furthermore, batch packaging is performed before the audit metadata is distributed and stored in step (2).
Still further, the streaming audit component includes a computing center and a middleware library.
Still further, the middleware library includes difference calculation model middleware.
Further, step (3) comprises real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, and manual data streaming audit.
Furthermore, in step (4), the data common to the difference data sets of two adjacent audits is removed.
A system for implementing data streaming audit of a large-scale informatization system, characterized by comprising a module for collecting audit metadata by using time difference, a distributed storage audit metadata module, a streaming audit module, an audit result calibration module and an audit result storage module.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects: collecting the audit metadata from the network elements with a time difference, and calibrating the audit result with a difference comparison method, effectively prevents abnormal data from affecting the audit result and greatly improves the accuracy of the streaming audit result; storing the audit metadata in a distributed manner provides automatic fault tolerance and ensures the accuracy of the audit difference data.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for implementing data streaming audit of a large-scale informatization system according to the invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
A method for implementing data streaming audit of a large-scale informatization system, characterized in that the method comprises:
Step (1): collecting audit metadata from network element nodes using a time difference;
Step (2): distributing and storing the audit metadata in a streaming audit component;
Step (3): performing the data streaming audit;
Step (4): calibrating the audit result;
Step (5): storing the audit result.
In this embodiment, consider a network element A and a network element B. Because the data stream on the network is serial, if collection starts at the network element A node and the network element B node at the same moment, some data has already passed through the network element A node and flowed on to the network element B node, so part of the data collected at the network element B node is absent from the data collected at the network element A node. Similarly, if collection stops at both nodes at the same moment, some data has already passed through the network element A node but has not yet reached the network element B node, so part of the data collected at the network element A node is absent from the data collected at the network element B node. The data sets collected over the same time period therefore differ because of network transmission, system congestion and similar causes. These differences are not genuine inconsistencies, and the data concerned should be treated as abnormal data.
In step (1), data collection from the network element A node starts Δt earlier, so that the data which flowed from the network element A node to the network element B node during that Δt is also captured at the network element A node; this ensures that the network element A node and the network element B node have the same starting data set. Data collection from the network element A node likewise ends Δt' earlier, which ensures that the ending data at the two nodes is consistent. Collecting data from the network elements with this time difference reduces the abnormal data in the collected network element data.
Δt and Δt' are the time taken for data to travel between adjacent network element nodes. To measure it, a portion of identifiable data is selected as marker data, the time at which it passes through the network element A node and the time at which it passes through the network element B node are recorded, and the arrival time at the network element A node is subtracted from the arrival time at the network element B node to obtain the standard Δt.
Further, Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
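The time-difference collection described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Window`, `estimate_delta_t` and `collection_windows` are hypothetical, both nodes are assumed to timestamp against a shared clock, and averaging over several marker records is an assumption added here (the description only says that the two arrival times are subtracted).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Window:
    """A collection window in seconds on a clock shared by both nodes (assumption)."""
    start: float
    end: float


def estimate_delta_t(arrivals_a: Dict[str, float],
                     arrivals_b: Dict[str, float]) -> float:
    """Estimate the A-to-B transfer time from marker records seen at both nodes.

    For each marker, the arrival time at node A is subtracted from the
    arrival time at node B; the mean over markers is an added assumption.
    """
    common = arrivals_a.keys() & arrivals_b.keys()
    if not common:
        raise ValueError("no marker record was seen at both nodes")
    delta_t = mean(arrivals_b[k] - arrivals_a[k] for k in common)
    # The description limits delta_t to the range 0-1 s.
    if not 0.0 <= delta_t <= 1.0:
        raise ValueError(f"delta_t={delta_t} is outside the 0-1 s range")
    return delta_t


def collection_windows(start: float, end: float, delta_t: float,
                       delta_t_end: Optional[float] = None) -> Tuple[Window, Window]:
    """Return (window for node A, window for node B).

    Node A starts delta_t earlier and ends delta_t_end earlier than node B.
    delta_t_end defaults to delta_t, matching the claim that they are equal.
    """
    if delta_t_end is None:
        delta_t_end = delta_t
    return Window(start - delta_t, end - delta_t_end), Window(start, end)


if __name__ == "__main__":
    dt = estimate_delta_t({"m1": 10.0, "m2": 10.5}, {"m1": 10.25, "m2": 10.75})
    print(collection_windows(100.0, 160.0, dt))
```

Shifting only the window of node A keeps the window length unchanged, so both nodes still collect over the same duration.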
The streaming audit component comprises a computing center and a middleware library. The difference calculation model is compiled into one or more pieces of middleware, which are uploaded to the middleware library of the streaming audit component. The computing center and the middleware library use a heartbeat to determine whether middleware has been added or deleted: if the computing center determines that new middleware has been added, the middleware is added to the middleware library, and if it determines that middleware has been deleted, the middleware is removed from the middleware library. Adding or deleting middleware requires no change to the existing system, so the middleware library design improves the generality of the streaming audit system.
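One way to picture the heartbeat-driven middleware library is the following Python sketch. The class `MiddlewareLibrary`, its `heartbeat` method and the `symmetric_difference` model are hypothetical names introduced for illustration; the patent does not specify the heartbeat protocol, so here a heartbeat is simply modelled as a call that reconciles the library with the set of currently uploaded middleware.

```python
from typing import Callable, Dict, Set, Tuple

# A difference calculation model takes the record ids seen at two
# network elements and returns the ids that differ (assumed signature).
DiffModel = Callable[[Set[str], Set[str]], Set[str]]


class MiddlewareLibrary:
    """Holds difference-model middleware that can be plugged in or out at runtime."""

    def __init__(self) -> None:
        self._models: Dict[str, DiffModel] = {}

    def names(self) -> Set[str]:
        return set(self._models)

    def heartbeat(self, uploaded: Dict[str, DiffModel]) -> Tuple[Set[str], Set[str]]:
        """Reconcile with the uploaded middleware; return (added, removed) names."""
        added = uploaded.keys() - self._models.keys()
        removed = self._models.keys() - uploaded.keys()
        for name in added:
            self._models[name] = uploaded[name]
        for name in removed:
            del self._models[name]
        return set(added), set(removed)

    def run(self, name: str, data_a: Set[str], data_b: Set[str]) -> Set[str]:
        """Run one registered difference model on two data sets."""
        return self._models[name](data_a, data_b)


def symmetric_difference(data_a: Set[str], data_b: Set[str]) -> Set[str]:
    """Example model: records present at exactly one of the two nodes."""
    return data_a ^ data_b


if __name__ == "__main__":
    library = MiddlewareLibrary()
    print(library.heartbeat({"symdiff": symmetric_difference}))
    print(library.run("symdiff", {"r1", "r2"}, {"r2", "r3"}))
```

Because callers only refer to middleware by name, adding or removing a model does not require changing the code that drives the audit.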
After the audit metadata in step (2) has been distributed to the streaming audit component, it is divided into batches by time slice (at second-level granularity), and the batches are placed in a first-in first-out queue. When the difference audit starts, the difference calculation model takes the batches out of the queue in order, packages each batch into a set, and then starts the streaming audit. Packaging the audit metadata in batches enables real-time automatic streaming audit.
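The batching step can be sketched in Python as follows. This is an illustrative sketch only: the record layout `(timestamp, record_id)` and the function names `batch_by_time_slice` and `drain` are assumptions, and a one-second slice is used as one example of second-level granularity.

```python
from collections import deque
from itertools import groupby
from typing import Deque, Iterable, Iterator, Set, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, record id); assumed layout


def batch_by_time_slice(records: Iterable[Record],
                        slice_seconds: float = 1.0) -> Deque[Tuple[int, Set[str]]]:
    """Group records into time slices and enqueue the batches in time order."""
    ordered = sorted(records, key=lambda record: record[0])
    queue: Deque[Tuple[int, Set[str]]] = deque()
    # groupby needs its input sorted by the grouping key, which the
    # timestamp ordering above guarantees.
    for index, group in groupby(ordered, key=lambda r: int(r[0] // slice_seconds)):
        queue.append((index, {record_id for _, record_id in group}))
    return queue


def drain(queue: Deque[Tuple[int, Set[str]]]) -> Iterator[Tuple[int, Set[str]]]:
    """Take batches out first-in first-out, as the difference model would."""
    while queue:
        yield queue.popleft()


if __name__ == "__main__":
    batches = batch_by_time_slice([(0.2, "a"), (1.7, "c"), (0.9, "b"), (1.1, "d")])
    for slice_index, batch in drain(batches):
        print(slice_index, sorted(batch))
```

Each batch is already a set when it leaves the queue, so a set-based difference model can consume it directly.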
The audit metadata is stored in a distributed manner, with multiple copies kept in the system. When a storage node fails, the system automatically switches service to another copy, which provides automatic and transparent fault tolerance and improves the accuracy of the streaming audit.
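The failover behaviour can be illustrated with a small in-memory Python sketch. It does not model a real distributed store: `StorageNode` and `ReplicatedStore` are hypothetical classes, a node failure is simulated with an `alive` flag, and every write is sent to all reachable replicas, which is only one possible replication policy.

```python
from typing import Dict, Iterable, List, Set


class StorageNode:
    """One in-memory replica; failure is simulated via the `alive` flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self._data: Dict[str, Set[str]] = {}

    def put(self, key: str, value: Set[str]) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self._data[key] = set(value)

    def get(self, key: str) -> Set[str]:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self._data[key]


class ReplicatedStore:
    """Writes to every reachable replica and reads from the first that answers."""

    def __init__(self, nodes: Iterable[StorageNode]) -> None:
        self.nodes: List[StorageNode] = list(nodes)

    def put(self, key: str, value: Set[str]) -> None:
        written = 0
        for node in self.nodes:
            try:
                node.put(key, value)
                written += 1
            except ConnectionError:
                continue  # skip failed replicas
        if written == 0:
            raise ConnectionError("no replica accepted the write")

    def get(self, key: str) -> Set[str]:
        for node in self.nodes:
            try:
                return node.get(key)
            except (ConnectionError, KeyError):
                continue  # fail over to the next replica
        raise KeyError(key)


if __name__ == "__main__":
    store = ReplicatedStore([StorageNode("n1"), StorageNode("n2"), StorageNode("n3")])
    store.put("batch-0", {"r1", "r2"})
    store.nodes[0].alive = False  # simulate a storage node failure
    print(sorted(store.get("batch-0")))
```

The caller of `get` never sees which replica answered, which is the sense in which the fault tolerance is transparent.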
The data streaming audit comprises real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, and manual data streaming audit. Real-time automatic data streaming audit ensures that differences are detected in real time.
In the audit result calibration step, the data common to the difference data sets of two adjacent audits is removed. Data that appears in the difference data sets of both adjacent audits is abnormal data, and removing it prevents audit errors caused by abnormal data.
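The calibration rule amounts to a set operation, sketched below in Python. The function name `calibrate` and the representation of a difference data set as a set of record identifiers are assumptions made for illustration.

```python
from typing import Set, Tuple


def calibrate(previous_diff: Set[str],
              current_diff: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Remove records that appear in the difference sets of two adjacent audits.

    Returns (calibrated previous set, calibrated current set, removed records).
    A record found in both difference sets is treated as abnormal data, for
    example a record that straddled the boundary between two audit windows.
    """
    abnormal = previous_diff & current_diff
    return previous_diff - abnormal, current_diff - abnormal, abnormal


if __name__ == "__main__":
    prev_clean, curr_clean, removed = calibrate({"r1", "r7"}, {"r7", "r9"})
    print(sorted(prev_clean), sorted(curr_clean), sorted(removed))
```

In this example `r7` is reported by both audits and is dropped, while `r1` and `r9` remain as genuine differences.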
The difference result set calculated by the streaming audit component is stored in a database as a difference file, and a query interface is provided so that external systems can connect to it and query the difference results.
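As a hedged illustration of storing the difference result set and exposing a query interface, the sketch below uses Python's standard `sqlite3` module. The patent does not name a database; the table name `audit_diff`, its columns, and the functions `open_store`, `save_diff` and `query_diff` are all hypothetical.

```python
import sqlite3
from typing import Iterable, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the result database and create the (hypothetical) audit_diff table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_diff ("
        " audit_id TEXT NOT NULL,"
        " record_id TEXT NOT NULL,"
        " PRIMARY KEY (audit_id, record_id))"
    )
    return conn


def save_diff(conn: sqlite3.Connection, audit_id: str,
              record_ids: Iterable[str]) -> None:
    """Store one calibrated difference result set under an audit id."""
    rows = [(audit_id, record_id) for record_id in sorted(record_ids)]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO audit_diff VALUES (?, ?)", rows)


def query_diff(conn: sqlite3.Connection, audit_id: str) -> List[str]:
    """Query interface: return the differing record ids for one audit."""
    cursor = conn.execute(
        "SELECT record_id FROM audit_diff WHERE audit_id = ? ORDER BY record_id",
        (audit_id,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    connection = open_store()
    save_diff(connection, "audit-001", {"r9", "r1"})
    print(query_diff(connection, "audit-001"))
```

An external system would call something like `query_diff` through whatever interface the deployment exposes; the in-memory database here is only for demonstration.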
A system for implementing data streaming audit of a large-scale informatization system, characterized by comprising a module for collecting audit metadata by using time difference, a distributed storage audit metadata module, a streaming audit module, an audit result calibration module and an audit result storage module.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature, or any novel combination of features, disclosed in this specification, and to any novel method or process step, or any novel combination of steps, disclosed.
Claims (7)
1. A method for implementing data streaming audit of a large-scale informatization system, characterized in that the method comprises:
step (1): collecting audit metadata from different network element nodes using a time difference;
step (2): distributing and storing the audit metadata in a streaming audit component;
step (3): performing the data streaming audit;
step (4): calibrating the audit result;
step (5): storing the audit result;
wherein in step (1), the data stream flows from a first network element node to a second network element node, the difference between the time at which data reaches the second network element node and the time at which it reaches the first network element node is Δt, data collection from the first network element node is started Δt earlier, and data collection from the network element node is finished Δt' earlier; Δt is equal to Δt';
wherein Δt and Δt' are the time taken for data to be transmitted between adjacent network element nodes;
Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
2. The method for implementing data streaming audit of a large-scale informatization system according to claim 1, wherein batch packaging is performed before the audit metadata is distributed and stored in step (2).
3. The method for implementing data streaming audit of a large-scale informatization system according to claim 1, wherein the streaming audit component includes a computing center and a middleware library.
4. The method for implementing data streaming audit of a large-scale informatization system according to claim 3, wherein the middleware library comprises difference calculation model middleware.
5. The method for implementing data streaming audit of a large-scale informatization system according to claim 1, wherein step (3) employs any one of real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, or manual data streaming audit.
6. The method for implementing data streaming audit of a large-scale informatization system according to claim 1, wherein in step (4), the data common to the difference data sets of two adjacent audits is removed.
7. A system for implementing data streaming audit of a large-scale informatization system, characterized by comprising a module for collecting audit metadata by using time difference, a distributed storage audit metadata module, a streaming audit module, an audit result calibration module and an audit result storage module;
the module for collecting audit metadata by using time difference is configured to collect audit metadata from different network element nodes using a time difference;
the distributed storage audit metadata module is configured to distribute and store the audit metadata in a streaming audit component;
the streaming audit module is configured to perform the data streaming audit;
the audit result calibration module is configured to calibrate the audit result;
the audit result storage module is configured to store the audit result;
in the module for collecting audit metadata by using time difference, the data stream flows from a first network element node to a second network element node, the difference between the time at which data reaches the second network element node and the time at which it reaches the first network element node is Δt, data collection from the first network element node is started Δt earlier, and data collection from the network element node is finished Δt' earlier; Δt is equal to Δt';
wherein Δt and Δt' are the time taken for data to be transmitted between adjacent network element nodes;
Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710206153.5A CN106980684B (en) | 2017-03-31 | 2017-03-31 | Method and system for realizing data flow audit of large-scale informatization system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710206153.5A CN106980684B (en) | 2017-03-31 | 2017-03-31 | Method and system for realizing data flow audit of large-scale informatization system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106980684A (en) | 2017-07-25 |
CN106980684B (en) | 2020-04-17 |
Family
ID=59339483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710206153.5A Active CN106980684B (en) | 2017-03-31 | 2017-03-31 | Method and system for realizing data flow audit of large-scale informatization system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106980684B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101094051A (en) * | 2007-06-27 | 2007-12-26 | 中国移动通信集团四川有限公司 | System and method for synchronizing comparison of data consistency |
CN101217393A (en) * | 2007-12-30 | 2008-07-09 | 中国移动通信集团四川有限公司 | A charging income guarantee method realized by auditing |
KR20100085493A (en) * | 2009-01-20 | 2010-07-29 | 주.피어링포탈 | Method for allowing view of time difference in internet broadcasting service |
CN103780406A (en) * | 2012-10-18 | 2014-05-07 | 中国电信股份有限公司 | Data acquisition method and system, and network management device |
CN104915756A (en) * | 2015-05-22 | 2015-09-16 | 电信科学技术第五研究所 | Data consistency cloud auditing system and implementation method |
CN106503977A (en) * | 2016-10-20 | 2017-03-15 | 财付通支付科技有限公司 | The processing method of data, system and device |
Non-Patent Citations (1)
Title |
---|
客户数据一致性管理系统稽核模型的研究;张彦君;《科技信息》;20101231;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN106980684A (en) | 2017-07-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 610021 Sichuan city of Chengdu province Jinjiang District Dacisi Road No. 22; Applicant after: Telecommunication science and technology fifth Research Institute Co., Ltd.; Address before: 610021 Sichuan city of Chengdu province Jinjiang District Dacisi Road No. 22; Applicant before: Information Industry Department No. 5 Telecommunication Technologics Research Institute |
GR01 | Patent grant | ||