CN106980684B - Method and system for realizing data flow audit of large-scale informatization system - Google Patents

Method and system for realizing data flow audit of large-scale informatization system

Info

Publication number
CN106980684B
Authority
CN
China
Prior art keywords
audit
data
network element
metadata
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710206153.5A
Other languages
Chinese (zh)
Other versions
CN106980684A (en)
Inventor
Gong Li (龚历)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fifth Research Institute Of Telecommunications Technology Co ltd
Original Assignee
Fifth Research Institute Of Telecommunications Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-03-31
Filing date
2017-03-31
Publication date
2020-04-17
Application filed by Fifth Research Institute Of Telecommunications Technology Co ltd
Priority to CN201710206153.5A
Publication of CN106980684A
Application granted
Publication of CN106980684B
Legal status: Active
Anticipated expiration

Classifications

    • G06F11/1456 Hardware arrangements for backup
    • G06F11/1458 Management of the backup or restore process
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/245 Query processing
    • G06F16/24568 Data stream processing; Continuous queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L67/2866 Architectures; Arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method and a system for implementing data flow auditing in a large-scale informatization system. The method comprises the following steps: step (1), collecting audit metadata from different network element nodes using a time difference; step (2), distributing and storing the audit metadata in a streaming audit component; step (3), performing data flow auditing; step (4), calibrating the audit result; and step (5), storing the audit result. The system comprises a module for collecting audit metadata using a time difference, a distributed audit metadata storage module, a streaming audit module, an audit result calibration module, and an audit result storage module. Collecting the audit metadata from the network element nodes with a time difference, and calibrating the audit result with a difference comparison method, effectively prevents abnormal data from affecting the audit result and greatly improves the accuracy of the streaming audit result; storing the audit metadata in a distributed manner provides automatic fault tolerance.

Description

Method and system for realizing data flow audit of large-scale informatization system
Technical Field
The invention relates to the field of data audit analysis for large-scale informatization systems, and in particular to a method and a system for implementing data flow auditing in a large-scale informatization system.
Background
Informatization helps enterprises innovate their systems and improves their overall competitiveness. In recent years, informatization in China's government and enterprise sectors has steadily deepened, and most government and enterprise groups have built their own large-scale business support information systems. Each such system consists of multiple components or network elements, and each component or network element keeps its data locally to guarantee business responsiveness, so related data are stored redundantly across the related systems. With the arrival of the information age, the volume of work orders and transaction data flowing through business support information systems has grown rapidly, and data structures have become more diverse. In the continuous operation of government and enterprise business support systems, inconsistencies in the redundant data held by different components or network elements cause their business results to conflict. At telecommunication operators, for example, data differences make inconsistencies among service subscription, billing and service delivery increasingly serious, which leads to more customer complaints, lost business revenue, lower service levels and customer satisfaction, and reduced credibility of decision-making systems. Government and enterprise groups therefore pay growing attention to data inconsistencies among components and network elements, and their requirements for the accuracy of audit results on such inconsistent data keep rising.
Building a data streaming audit technology that audits data synchronously across all components and network elements of a large-scale information system, that is, letting the system itself discover the differences, is the most direct and effective way to overcome data inconsistency in such systems and to ensure their robustness, reliability and controllability. In the prior art, a data audit platform retrieves the data collected by each network element platform from an FTP transfer server, then compares the data and processes the differences. This approach yields audit results of low accuracy and cannot effectively eliminate the influence of abnormal data on the audit results.
Disclosure of Invention
To address these problems, the invention provides a method and a system for implementing data flow auditing in a large-scale information system, which greatly improve audit accuracy by combining time-difference collection of audit metadata with a difference comparison method.
The technical scheme adopted by the invention is as follows:
A method for implementing data flow auditing in a large-scale informatization system, characterized by comprising the following steps:
step (1): collecting audit metadata from different network element nodes using a time difference;
step (2): distributing and storing the audit metadata in a streaming audit component;
step (3): performing data flow auditing;
step (4): calibrating the audit result;
step (5): storing the audit result.
Further, in step (1), data collection from a network element node starts Δt earlier and ends Δt' earlier.
Further, Δt and Δt' are the time taken to transfer data between adjacent network element nodes.
Still further, Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
Further, the audit metadata is packaged in batches before it is distributed and stored in step (2).
Still further, the streaming audit component comprises a computing center and a middleware library.
Still further, the middleware library comprises difference calculation model middleware.
Further, step (3) comprises real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, and manual data streaming audit.
Further, in step (4), the data common to the difference data sets of two adjacent audits are removed.
A system for implementing data streaming auditing in a large-scale informatization system, characterized by comprising a module for collecting audit metadata using a time difference, a distributed audit metadata storage module, a streaming audit module, an audit result calibration module, and an audit result storage module.
In summary, the technical scheme gives the invention the following beneficial effects: collecting audit metadata from the network elements with a time difference, and calibrating the audit result with a difference comparison method, effectively prevents abnormal data from affecting the audit result and greatly improves the accuracy of the streaming audit result; storing the audit metadata in a distributed manner provides automatic fault tolerance and ensures the accuracy of the audit difference data.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for implementing data flow auditing in a large-scale information system according to the invention.
Detailed Description
All features disclosed in this specification, and all steps of any method or process so disclosed, may be combined in any combination, except combinations in which the features and/or steps are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
A method for implementing data flow auditing in a large-scale informatization system, characterized by comprising:
step (1): collecting audit metadata from network element nodes using a time difference;
step (2): distributing and storing the audit metadata in a streaming audit component;
step (3): performing data flow auditing;
step (4): calibrating the audit result;
step (5): storing the audit result.
In this embodiment, consider network element A and network element B. Because data flow through the network serially, if collection starts at node A and node B at the same moment, part of the data has already passed through node A on its way to node B, so some data at node B are absent from node A. Likewise, if collection stops at node A and node B at the same moment, part of the data has passed through node A but has not yet reached node B, so some data at node A are absent from node B. Data sets collected over the same time period therefore differ because of network transmission, system congestion and similar effects. These differences are not genuine and should be treated as abnormal data.
In step (1), data collection at node A starts Δt ahead of time, so that the data flowing from node A to node B are already captured at node A; this ensures that node A and node B have the same starting data flow set. Collection at node A also ends Δt' ahead of time, which ensures that the ending data at node A and node B are consistent. Collecting data from each network element with this time difference reduces the abnormal data in the collected network element data.
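The effect of the shifted collection window can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: every record is assumed to take a constant 0.5 s to travel from A to B (so Δt = Δt' = 0.5 s), the comparison is assumed to be a plain symmetric difference of record IDs, and the names `Record`, `collect` and `audit_diff` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    record_id: str
    t_a: float  # time the record passes network element A (s)
    t_b: float  # time it reaches network element B (s); inf if it never arrives

def collect(records, node, start, end):
    """IDs of records observed at node 'A' or 'B' within [start, end)."""
    return {
        r.record_id
        for r in records
        if start <= (r.t_a if node == "A" else r.t_b) < end
    }

def audit_diff(records, start, end, dt=0.0, dt_end=0.0):
    """Symmetric difference between the sets collected at A and B.

    A's window is moved dt earlier at the start and dt_end earlier at the end;
    dt = dt_end = 0 means both nodes are collected over the same window.
    """
    a = collect(records, "A", start - dt, end - dt_end)
    b = collect(records, "B", start, end)
    return a ^ b

INF = float("inf")
records = [
    Record("r1", 9.8, 10.3),   # already past A when a simultaneous window opens
    Record("r2", 12.0, 12.5),  # seen by both nodes either way
    Record("r3", 19.7, 20.2),  # still in flight to B when the window closes
    Record("r4", 15.0, INF),   # genuinely missing at B
]

print(sorted(audit_diff(records, 10.0, 20.0)))            # ['r1', 'r3', 'r4']
print(sorted(audit_diff(records, 10.0, 20.0, 0.5, 0.5)))  # ['r4']
```

With simultaneous windows, the in-flight records r1 and r3 show up as false differences; shifting A's window by Δt leaves only r4, the record that really is missing at B.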
Δt and Δt' are the time taken to transfer data between adjacent network element nodes. To obtain them, a subset of identification data is selected and the times at which these data pass node A and node B are recorded; subtracting the arrival time at node A from the arrival time at node B gives the standard Δt.
Further, Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
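The Δt measurement can be sketched as follows. The text does not say how per-record lags are combined into one standard Δt, so taking the median and clamping it to the stated 0-1 s range are assumptions here; the function name and input format are hypothetical. Since the claims set Δt' equal to Δt, the same value would serve for both.

```python
from statistics import median

def estimate_delta_t(arrivals_a, arrivals_b, lo=0.0, hi=1.0):
    """Estimate the standard Δt from identification records.

    arrivals_a / arrivals_b map a record ID to the time (s) it passed
    network element A / reached network element B. Each record's lag is
    t_B - t_A; the median lag is taken and clamped to the 0-1 s range.
    """
    common = arrivals_a.keys() & arrivals_b.keys()
    if not common:
        raise ValueError("no identification record was seen at both nodes")
    lags = [arrivals_b[k] - arrivals_a[k] for k in common]
    return min(max(median(lags), lo), hi)

at_a = {"id-1": 1.0, "id-2": 2.0, "id-3": 3.0}
at_b = {"id-1": 1.25, "id-2": 2.5, "id-3": 3.75, "id-9": 9.0}
print(estimate_delta_t(at_a, at_b))  # 0.5
```

The median is used only to keep a single delayed identification record from skewing the estimate; a mean would also fit the description.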
The streaming audit component comprises a computing center and a middleware library. The difference calculation model is compiled into one or more middleware packages, which are uploaded to the middleware library of the streaming audit component. The computing center and the middleware library use heartbeats to determine whether middleware has been added or deleted: if the computing center detects new middleware, it adds that middleware to the library; if it detects that middleware has been deleted, it removes that middleware from the library. Adding or deleting middleware requires no change to the existing system, so the middleware library design makes the streaming audit system more general.
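The heartbeat reconciliation can be sketched as a set comparison between the middleware advertised in a heartbeat and the middleware currently loaded. This is a minimal in-memory sketch; the class `MiddlewareLibrary`, the method `on_heartbeat`, and the representation of a middleware as a named Python callable are all assumptions made for illustration.

```python
class MiddlewareLibrary:
    """Hypothetical in-memory middleware library of the streaming audit component."""

    def __init__(self):
        self.models = {}  # middleware name -> difference calculation model (callable)

    def on_heartbeat(self, advertised):
        """Reconcile the library with the middleware advertised in a heartbeat.

        `advertised` maps middleware name -> callable. Names not yet loaded are
        added and names no longer advertised are removed; nothing else changes.
        Returns the (added, removed) name sets.
        """
        added = advertised.keys() - self.models.keys()    # a new set, safe to iterate
        removed = self.models.keys() - advertised.keys()  # computed before mutation
        for name in added:
            self.models[name] = advertised[name]
        for name in removed:
            del self.models[name]
        return added, removed

lib = MiddlewareLibrary()
print(lib.on_heartbeat({"symmetric_diff": lambda a, b: a ^ b}))
# ({'symmetric_diff'}, set())
print(lib.on_heartbeat({"a_only": lambda a, b: a - b}))
# ({'a_only'}, {'symmetric_diff'})
```

Because the audit code only looks models up by name, swapping one in or out does not touch the rest of the system, which matches the generality argument above.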
After the audit metadata from step (2) reaches the streaming audit component, it is divided into batches by time slice (on the order of seconds), and the batches are placed in a first-in, first-out queue. When the difference audit starts, the difference calculation model takes the batches out of the queue in order, packages each batch into a set, and then starts the streaming audit. Packaging the audit metadata in batches enables real-time automatic streaming audit.
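The time-slice batching and the FIFO queue can be sketched with the standard library. This sketch assumes one-second slices and input given as hypothetical `(timestamp_s, record_id)` pairs; the function names are illustrative only.

```python
import math
from collections import deque
from itertools import groupby

def batch_by_second(records):
    """Split (timestamp_s, record_id) pairs into one-second batches in a FIFO queue."""
    ordered = sorted(records, key=lambda r: r[0])
    queue = deque()
    for second, group in groupby(ordered, key=lambda r: math.floor(r[0])):
        queue.append((second, [rid for _, rid in group]))
    return queue

def drain(queue):
    """Take batches out in FIFO order and package each one as a set."""
    while queue:
        second, ids = queue.popleft()
        yield second, frozenset(ids)

q = batch_by_second([(0.1, "a"), (1.2, "c"), (0.9, "b"), (2.5, "d"), (2.7, "d")])
for second, batch in drain(q):
    print(second, sorted(batch))
# 0 ['a', 'b']
# 1 ['c']
# 2 ['d']
```

Sorting before `groupby` matters because `groupby` only merges adjacent items; packaging each batch as a set also collapses duplicate record IDs within a slice.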
The audit metadata is stored in a distributed manner, with multiple copies kept in the system. When a storage node fails, the system automatically switches service to another copy, which provides automatic, transparent fault tolerance and improves the accuracy of the streaming audit.
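Replica failover can be illustrated with an in-memory stand-in for the distributed store. This is a minimal sketch, not a real distributed storage system: the class `ReplicatedStore`, its method names, and the "write to every healthy replica, read from the first healthy one" policy are assumptions.

```python
class ReplicatedStore:
    """In-memory sketch: every batch is written to all replicas; reads fail over."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.failed = set()

    def put(self, key, value):
        for name, data in self.replicas.items():
            if name not in self.failed:
                data[key] = value

    def fail(self, name):
        """Simulate a storage node failure."""
        self.failed.add(name)

    def get(self, key):
        """Return the value from the first healthy replica that holds it."""
        for name, data in self.replicas.items():
            if name not in self.failed and key in data:
                return data[key]
        raise KeyError(key)

store = ReplicatedStore(["node-1", "node-2", "node-3"])
store.put("batch-0", frozenset({"r1", "r2"}))
store.fail("node-1")
print(sorted(store.get("batch-0")))  # ['r1', 'r2']
```

The caller of `get` never learns which replica answered, which is the "fault tolerance transparency" described above.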
Data flow auditing comprises real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, and manual data streaming audit. Real-time automatic data streaming audit ensures that differences are found in real time.
In the audit result calibration step, the data common to the difference data sets of two adjacent audits are removed. Data present in the difference sets of both adjacent audits are regarded as abnormal data, and removing them prevents audit errors caused by abnormal data.
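The calibration rule reduces to set operations. The text states only that data common to two adjacent difference sets are removed; this sketch assumes the calibrated output is the current audit's set and that each audit is compared with the raw (uncalibrated) set of the audit before it. The function names are hypothetical.

```python
def calibrate(previous_diff, current_diff):
    """Remove from the current difference set the records that also appear
    in the previous (adjacent) audit's difference set."""
    return current_diff - (previous_diff & current_diff)

def calibrate_sequence(diff_sets):
    """Calibrate each audit's raw difference set against the audit before it."""
    previous = set()
    calibrated = []
    for current in diff_sets:
        calibrated.append(calibrate(previous, current))
        previous = current
    return calibrated

result = calibrate_sequence([{"r1", "r4"}, {"r4", "r7"}, {"r9"}])
print([sorted(s) for s in result])  # [['r1', 'r4'], ['r7'], ['r9']]
```

Removing the intersection is equivalent to `current_diff - previous_diff`; it is written out here to mirror the wording of step (4).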
The difference result set calculated by the streaming audit component is stored in a database as a difference file, and a query interface is provided so that external systems can query the difference results.
A system for implementing data streaming auditing in a large-scale informatization system, characterized by comprising a module for collecting audit metadata using a time difference, a distributed audit metadata storage module, a streaming audit module, an audit result calibration module, and an audit result storage module.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification, and to any novel method or process step or any novel combination of steps disclosed.
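Storage and querying of the difference results can be sketched with SQLite from the Python standard library. The table name `audit_diff`, its columns, and the two helper functions are hypothetical; the text specifies only that the difference result set is stored in a database and exposed through a query interface.

```python
import sqlite3

def store_differences(conn, audit_id, diff_records):
    """Persist one audit's difference result set (one row per differing record)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_diff ("
        " audit_id TEXT NOT NULL, record_id TEXT NOT NULL,"
        " PRIMARY KEY (audit_id, record_id))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO audit_diff (audit_id, record_id) VALUES (?, ?)",
        [(audit_id, r) for r in sorted(diff_records)],
    )
    conn.commit()

def query_differences(conn, audit_id):
    """Query interface: return the differing record IDs of one audit."""
    rows = conn.execute(
        "SELECT record_id FROM audit_diff WHERE audit_id = ? ORDER BY record_id",
        (audit_id,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
store_differences(conn, "audit-001", {"r4", "r1"})
print(query_differences(conn, "audit-001"))  # ['r1', 'r4']
```

Parameterized queries (`?` placeholders) keep externally supplied audit IDs from being interpreted as SQL.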

Claims (7)

1. A method for implementing data flow auditing in a large-scale informatization system, characterized by comprising:
step (1): collecting audit metadata from different network element nodes using a time difference;
step (2): distributing and storing the audit metadata in a streaming audit component;
step (3): performing data flow auditing;
step (4): calibrating the audit result;
step (5): storing the audit result;
wherein in step (1), the data stream flows from a first network element node to a second network element node, the difference between the time at which data reach the second network element node and the time at which they reach the first network element node is Δt, data collection from the first network element node starts Δt earlier, and data collection from the first network element node ends Δt' earlier, with Δt equal to Δt';
wherein Δt and Δt' are the time taken to transfer data between adjacent network element nodes;
and Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
2. The method for implementing data flow auditing in a large-scale informatization system according to claim 1, wherein the audit metadata is packaged in batches before being distributed and stored in step (2).
3. The method according to claim 1, wherein the streaming audit component comprises a computing center and a middleware library.
4. The method according to claim 3, wherein the middleware library comprises difference calculation model middleware.
5. The method according to claim 1, wherein step (3) uses any one of real-time automatic data streaming audit, automatic data streaming audit according to configured tasks, or manual data streaming audit.
6. The method for implementing data streaming auditing in a large-scale information system according to claim 1, wherein in step (4), the data common to the difference data sets of two adjacent audits are removed.
7. A system for implementing data flow auditing in a large-scale informatization system, characterized by comprising a module for collecting audit metadata using a time difference, a distributed audit metadata storage module, a streaming audit module, an audit result calibration module, and an audit result storage module;
the module for collecting audit metadata using a time difference is configured to collect audit metadata from different network element nodes using a time difference;
the distributed audit metadata storage module is configured to distribute and store the audit metadata in the streaming audit component;
the streaming audit module is configured to perform data streaming auditing;
the audit result calibration module is configured to calibrate the audit result;
the audit result storage module is configured to store the audit result;
in the module for collecting audit metadata using a time difference, the data stream flows from a first network element node to a second network element node, the difference between the time at which data reach the second network element node and the time at which they reach the first network element node is Δt, data collection from the first network element node starts Δt earlier, and data collection from the first network element node ends Δt' earlier, with Δt equal to Δt';
wherein Δt and Δt' are the time taken to transfer data between adjacent network element nodes;
and Δt ranges from 0 to 1 s, and Δt' ranges from 0 to 1 s.
CN201710206153.5A 2017-03-31 2017-03-31 Method and system for realizing data flow audit of large-scale informatization system Active CN106980684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710206153.5A CN106980684B (en) 2017-03-31 2017-03-31 Method and system for realizing data flow audit of large-scale informatization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710206153.5A CN106980684B (en) 2017-03-31 2017-03-31 Method and system for realizing data flow audit of large-scale informatization system

Publications (2)

Publication Number Publication Date
CN106980684A (en) 2017-07-25
CN106980684B (en) 2020-04-17

Family

ID=59339483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710206153.5A Active CN106980684B (en) 2017-03-31 2017-03-31 Method and system for realizing data flow audit of large-scale informatization system

Country Status (1)

Country Link
CN (1) CN106980684B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094051A (en) * 2007-06-27 2007-12-26 中国移动通信集团四川有限公司 System and method for synchronizing comparison of data consistency
CN101217393A (en) * 2007-12-30 2008-07-09 中国移动通信集团四川有限公司 A charging income guarantee method realized by auditing
KR20100085493A (en) * 2009-01-20 2010-07-29 주.피어링포탈 Method for allowing view of time difference in internet broadcasting service
CN103780406A (en) * 2012-10-18 2014-05-07 中国电信股份有限公司 Data acquisition method and system, and network management device
CN104915756A (en) * 2015-05-22 2015-09-16 电信科学技术第五研究所 Data consistency cloud auditing system and implementation method
CN106503977A (en) * 2016-10-20 2017-03-15 财付通支付科技有限公司 The processing method of data, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
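Research on the audit model of a customer data consistency management system (客户数据一致性管理系统稽核模型的研究); Zhang Yanjun (张彦君); Science & Technology Information (《科技信息》); 2010-12-31; full text *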

Also Published As

Publication number Publication date
CN106980684A (en) 2017-07-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 22 Dacisi Road, Jinjiang District, Chengdu, Sichuan 610021

Applicant after: Telecommunication science and technology fifth Research Institute Co., Ltd.

Address before: No. 22 Dacisi Road, Jinjiang District, Chengdu, Sichuan 610021

Applicant before: Information Industry Department No. 5 Telecommunication Technologics Research Institute

GR01 Patent grant