CN117892713A - Method, device, electronic equipment and storage medium for determining report difference data

Info

Publication number: CN117892713A
Application number: CN202410026763.7A
Authority: CN
Inventors: 张晓丽; 杨扬; 王欢; 康江涛; 张琳; 郑勇健; 顾先尧; 薛伟
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Information Technology Co Ltd
Priority date: 2024-01-08
Filing date: 2024-01-08
Publication date: 2024-04-16

Abstract

The application discloses a method, a device, electronic equipment and a storage medium for determining report difference data, wherein the method comprises the steps of acquiring a source report file to be detected under the condition of receiving an analysis instruction input by a user; carrying out hash calculation on each piece of data in the source report file to obtain an index of the data of the source report file; determining a target cluster node corresponding to the index according to the corresponding relation between the index and the cluster node; and comparing the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm through the target cluster nodes. The method and the device can analyze the source report file based on a matrix graph algorithm, calculate the shortest comparison path of the same data, rapidly process the high-dimensional data and complex relationship between the target data and the source report file to be tested, automatically process the whole slicing, mapping and comparison process of the source report file, and rapidly obtain the difference data between the source report file and the target file.

Description

Method, device, electronic equipment and storage medium for determining report difference data

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for determining report difference data.

Background

With the development of data statistics and computer technology, cloud service support systems have come into wide use. In the related art, a report statistics module is arranged in the cloud service support system to perform statistical processing on various types of data. A data statistics report includes customer cloud product order data, basic password accounts, and the like, and the total data volume must be analyzed on a daily, weekly, monthly, or other cycle. When the data in such a report increases sharply, the data source must be analyzed manually, the report data checked manually, and the abnormal details investigated manually. In the related art, the report module is only responsible for generating reports for different business types; the generated reports must then be checked and analyzed by hand, which is inefficient and error-prone.

Disclosure of Invention

The embodiment of the application provides a method, a device, electronic equipment and a storage medium for determining report difference data, which can rapidly process high-dimensional data and complex relations between target data and report files to be measured to acquire the report difference data.

In a first aspect, an embodiment of the present application provides a method for determining report difference data, where the method for determining report difference data includes: under the condition of receiving an analysis instruction input by a user, acquiring a source report file to be detected; carrying out hash calculation on each piece of data in the source report file to obtain an index of the data of the source report file; determining a target cluster node corresponding to the index according to the corresponding relation between the index and the cluster node; and comparing the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm through the target cluster nodes to obtain a difference data array.

According to the first aspect of the embodiments of the present application, the source report file includes a data type. Before the fragment data in the source report file corresponding to the index is compared with the target data by the target cluster node using the matrix graph algorithm to obtain the difference data array, the method further comprises: obtaining a target file from a file server, wherein the target file corresponds one-to-one with the fragment data in the source report file corresponding to each index; determining comparison index information corresponding to the source report file based on the data type of the source report file; and screening target data corresponding to the comparison index information from the data of the target file.

According to the first aspect of the embodiments of the present application, the comparison index information includes at least one of a comparison difference template, a comparison primary key, and a list keyword.

According to the first aspect of the embodiments of the present application, comparing, by the target cluster node, the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm to obtain a difference data array includes: arranging, by the target cluster node, the fragment data and the target data in a matrix by using the matrix graph algorithm to obtain a data arrangement matrix diagram; determining data position information in the fragment data and the target data based on the data arrangement matrix diagram, wherein the data position information comprises the intersection positions of corresponding data and the positions of non-corresponding data in the fragment data and the target data; determining the shortest comparison path of corresponding elements in the fragment data and the target data according to the data position information; and comparing the fragment data with the target data based on the shortest comparison path to obtain the difference data array.

According to a first aspect of embodiments of the present application, the difference data array includes at least one of an anomaly percentage, an anomaly data stream, and an anomaly file total amount.

According to a first aspect of the embodiments of the present application, the method for determining report difference data further includes: and under the condition that the quantity of the data in the difference data array meets the target threshold value, sending a target data packet comprising the difference data array to the graphic monitoring module for the graphic monitoring module to construct a data analysis graph based on the target data packet.

According to the first aspect of the embodiments of the present application, the source report file includes a data type. Before the target data packet including the difference data array is sent to the graphic monitoring module, in the case that the amount of data in the difference data array meets the target threshold, for the graphic monitoring module to construct the data analysis graph based on the target data packet, the method further comprises: determining the target threshold corresponding to the data type of the source report file according to the correspondence between data types and thresholds, wherein the target threshold represents the proportion of the difference data array relative to the fragment data and the target data.

In a second aspect, an embodiment of the present application provides an apparatus for determining report difference data, where the apparatus for determining report difference data includes: the acquisition module is used for acquiring a source report file to be detected under the condition of receiving an analysis instruction input by a user; the computing module is used for carrying out hash computation on each piece of data in the source report file to obtain an index of the data of the source report file; the determining module is used for determining a target cluster node corresponding to the index according to the corresponding relation between the index and the cluster node; and the comparison module is used for comparing the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm through the target cluster node to obtain a difference data array.

In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the program when executed by the processor implements the method for determining report difference data in the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the method for determining report difference data in the foregoing first aspect is implemented.

According to the method, device, electronic equipment and storage medium for determining report difference data provided by the embodiments of the present application, the target data is compared with the report file to be tested; the target data is historical data corresponding to the report file to be tested and is used for judging whether the report file to be tested differs significantly. Each piece of data in the source report file is fragmented by taking its hash modulo the number of server engines, the data of the target file is then distributed to the different server engines, and each engine performs calculation and analysis based on a matrix graph algorithm. The matrix graph algorithm calculates the shortest comparison path through the data that is the same, so that the high-dimensional data and complex relations between the target data and the data in the source report file to be tested can be processed rapidly. The whole process of fragmenting, mapping and comparing the source report file is performed automatically with high operating efficiency, and the difference data between the source report file and the target file is obtained quickly.

Drawings

Features, advantages, and technical effects of exemplary embodiments of the present application will be described below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method for determining report difference data according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart of target data screening according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of determining a difference data array based on a matrix graph algorithm according to an embodiment of the present application;

FIG. 4 is an example of a data arrangement matrix diagram provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an apparatus for determining report difference data according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of another apparatus for determining report difference data according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a system for determining report difference data according to an embodiment of the present disclosure;

FIG. 8 is a rule configuration diagram of a system cluster service module provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of a report difference capturing operation page layout of a file service module according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of an operation page layout of a report file management unit according to an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Features and exemplary embodiments of various aspects of the present application are described in detail below to make the objects, technical solutions and advantages of the present application more apparent, and to further describe the present application in conjunction with the accompanying drawings and the detailed embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative of the application and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by showing examples of the present application.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.

The present application is made based on the discovery and recognition by the inventors of the following facts and problems:

Currently, with the development of data statistics and computer technology, cloud service support systems have been widely used. In the related art, a report statistics module is arranged in the cloud service support system to perform statistical processing on various types of data. The report types can be mainly divided into account settlement reports and data statistics reports. A data statistics report includes customer cloud product order data, basic password accounts, and the like, and the total data volume needs to be analyzed on a daily, weekly, monthly, or other cycle. When the data in a data statistics report increases sharply, the data sources and the abnormal conditions of the total data need to be analyzed manually. For the account settlement report, the report statistics module can settle accounts across multiple service platforms for different types of clients and thereby issue reports, but differences in account amounts within the accounting period still have to be compared manually to confirm that the amounts are correct.

The report statistics module is only responsible for generating different business reports, and for the accounting statement and the data statistics statement, report data are manually verified and analyzed at the current stage to realize report abnormal information reporting. After the report generated by the report statistics module is downloaded, the data needs to be manually checked and the report content is analyzed, so that errors are easy to occur, and the user function experience is poor. In addition, the current report statistics module only simply generates a business report, then sends the business report to a client in a transmission mode, and the client checks report information after acquiring the report, so that the report and a report analysis data result cannot be directly acquired from the report statistics module.

In order to solve the problems in the prior art, the embodiment of the application provides a method, a device, electronic equipment and a storage medium for determining report difference data.

The following first describes a method for determining report difference data provided in the embodiments of the present application.

As shown in FIG. 1, the method for determining report difference data provided in the embodiment of the present application may include the following steps S110 to S140.

S110, under the condition of receiving an analysis instruction input by a user, acquiring a source report file to be detected.

In the embodiment of the present application, it may be understood that, before executing the acquiring action on the source report file to be detected, the system needs to first receive an analysis instruction input by the user, and locate and receive the corresponding source report file based on the analysis instruction. Based on the above, after receiving the source report file to be detected, further processing is performed on the source report file.

In one example, the system receives a service function request from a user through a task console. After the request is received, a routing function module parses the service code in the request and determines the user operation from that code. Once the instruction corresponding to the user operation is determined, the task console sends the request information corresponding to the instruction to the relevant service function module and the cluster module, so as to realize task scheduling and monitoring.

S120, carrying out hash calculation on each piece of data in the source report file to obtain an index of the data of the source report file.

It can be understood that in the process of analyzing the source report file, in order to reduce the pressure of the server and improve the operation speed of file processing, the source report file can be subjected to grouping processing to obtain a plurality of pieces of data, and each piece of data is divided into different server engines for processing, so that the operation process of difference comparison of the source report file can be ensured to be carried out at a higher speed.

In the embodiment of the present application, the data in the source report file can be fragmented by hash modulo calculation. The modulus can be the number of physical machines, so that each physical machine corresponds to one group of fragment data. Specifically, performing the hash modulo operation on each piece of data in the source report file yields an index for that piece of data; the system can identify the corresponding data by its index, so the source report file can be divided into multiple groups of fragment data based on the indexes.
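As a minimal sketch of the hash modulo fragmentation described above, the following Python code assigns each record an index and groups the records by that index. The function names `shard_index` and `shard_report`, the choice of MD5, and the sample records are illustrative assumptions; the application does not specify a hash function or a record format.

```python
import hashlib
from collections import defaultdict


def shard_index(record: str, engine_count: int) -> int:
    """Return a stable index in [0, engine_count) for one record (assumed scheme)."""
    digest = hashlib.md5(record.encode("utf-8")).hexdigest()
    return int(digest, 16) % engine_count


def shard_report(records, engine_count):
    """Group records into fragments keyed by their hash-modulo index."""
    fragments = defaultdict(list)
    for record in records:
        fragments[shard_index(record, engine_count)].append(record)
    return dict(fragments)


# Hypothetical report rows; three server engines, as in the three-engine example.
records = [
    "order-001,cloud-host,120.00",
    "order-002,object-storage,35.50",
    "order-003,cloud-host,80.00",
]
fragments = shard_report(records, engine_count=3)
```

Because the index depends only on the record content and the engine count, the same record always lands in the same fragment, which is what allows the later routing step to find it again.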

S130, determining a target cluster node corresponding to the index according to the corresponding relation between the index and the cluster node.

It will be appreciated that after the data of the source report file has been grouped by index, the groups of fragment data need to be mapped to different cluster nodes and written to the server engines corresponding to the respective cluster nodes.

In this embodiment of the present application, a virtual bucket method may be used to map fragment data to cluster nodes. One virtual bucket may accommodate multiple indexes, one cluster node may correspond to multiple virtual buckets, and multiple virtual buckets may point to the same physical node. The correspondence between indexes and cluster nodes is stored in the virtual buckets. It may be understood that the data of the virtual buckets may be held in memory, which makes data processing more efficient and decouples the mapping between indexes and cluster nodes.

On this basis, after the correspondence between indexes and cluster nodes has been decoupled, the cluster node corresponding to each piece of data in the source report file is found by routing, and each group of fragment data, grouped by index, is written to the corresponding server engine for storage and calculation.
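One way to picture the virtual bucket mapping is an in-memory table from bucket number to cluster node, with each index routed through a bucket. The sketch below is an assumption about how such a table could look; `build_bucket_table`, `route`, the round-robin assignment, and the node names are hypothetical and not taken from the application.

```python
from typing import Dict, List


def build_bucket_table(bucket_count: int, nodes: List[str]) -> Dict[int, str]:
    """Assign virtual buckets to nodes round-robin, so one node owns several buckets."""
    return {bucket: nodes[bucket % len(nodes)] for bucket in range(bucket_count)}


def route(index: int, bucket_table: Dict[int, str]) -> str:
    """Find the cluster node for an index by going through its virtual bucket."""
    bucket = index % len(bucket_table)
    return bucket_table[bucket]


# Six virtual buckets spread over three hypothetical nodes.
bucket_table = build_bucket_table(bucket_count=6, nodes=["node-a", "node-b", "node-c"])
before = route(4, bucket_table)

# Re-pointing one bucket moves its indexes to another node without rehashing any data.
bucket_table[4] = "node-c"
after = route(4, bucket_table)
```

The index-to-bucket rule never changes; only the bucket table does. That indirection is the decoupling the paragraph above refers to.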

In one example, the server engines may be set to three, and accordingly, when performing hash modulo computation on each piece of data in the source report file, the number of indexes is the same as that of the server engines, in other words, the hash modulo computation may obtain three indexes, the source report file is divided into three groups of slice data based on the three indexes, and the three groups of slice data may be distributed and written into the three server engines respectively by adopting a virtual bucket method.

In another example, the server engines may be set to four, and accordingly, when hash modulo calculation is performed on each piece of data in the source report file, four indexes are obtained, the source report file is divided into four groups of slice data based on the four indexes, and the four groups of slice data may be distributed and written into the four server engines respectively by adopting a virtual bucket method.

It should be emphasized that the server engine may be selected according to the service requirement, which is not limited in this embodiment of the present application, and the number of indexes obtained in the hash modulo calculation is the same as the number of server engines.

S140, comparing, by the target cluster node, the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm to obtain a difference data array.

In the embodiment of the application, one target cluster node corresponds to one server engine, and each server engine obtains a comparison result when comparing and analyzing the allocated fragment data with the target data, wherein the comparison result is a difference data array corresponding to each fragment data and the target data.

It can be understood that the difference between the whole source report file and the target file can be restored by combining based on the difference data arrays obtained by analyzing the server engines.
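The combination step can be sketched as a simple merge of the per-engine results. The dictionary layout, the field names `key` and `kind`, and the function name `merge_difference_arrays` are assumptions made for illustration only.

```python
def merge_difference_arrays(per_engine):
    """Combine per-index difference arrays into one list, tagged with their index."""
    merged = []
    for index in sorted(per_engine):
        for item in per_engine[index]:
            merged.append({"index": index, **item})
    return merged


# Hypothetical results returned by three server engines.
per_engine = {
    2: [{"key": "order-007", "kind": "only_in_source"}],
    0: [{"key": "order-001", "kind": "only_in_target"}],
    1: [],
}
report_differences = merge_difference_arrays(per_engine)
```

Tagging each entry with its index keeps the origin of every difference traceable after the arrays are combined.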

The method for determining report difference data provided by the embodiment of the present application extends the report statistics module of the related art. A source report file to be compared is uploaded to a server engine module that captures report differences, and the file to be compared and the file serving as the comparison reference are marked as the source report file and the target file, respectively. Each piece of data in the source report file is fragmented by taking its hash modulo the number of server engines, and the target file data is then distributed to the different server engines for difference comparison, yielding the difference data between the source report file and the target file. The server engines can use a matrix graph algorithm to arrange, analyze and process the data quickly to obtain the comparison result. In this way, the whole process of fragmenting, mapping and comparing the source report file is performed automatically with high operating efficiency, and the difference data between the source report file and the target file is obtained quickly.

In some embodiments, the source report file includes a data type; as shown in fig. 2, based on the foregoing method for determining report difference data, the present application further provides an implementation manner of another method for determining report difference data, where the implementation manner further includes, before S140:

S210, acquiring a target file from a file server, wherein the target file corresponds to the fragment data in the source report file corresponding to each index one by one.

It can be understood that, for report data of the same type, the report data of the current period may differ significantly from that of the previous period. The embodiment of the present application provides a method for automatically analyzing and extracting such differences in the report data of the current period. To this end, the report to be analyzed in the current period is taken as the source report file, and report data of the same type from the previous period is used as a reference file for comparison and judgment; in the comparison and analysis of the source report file, this reference file is referred to as the target file.

S220, determining the comparison index information corresponding to the source report file based on the data type of the source report file.

In the embodiment of the present application, it may be understood that data reports of different data types, such as an order data report, a product data report, a billing data report, and a settlement data report, differ considerably. Data reports of different data types require different report configurations; the required data is sampled from the corresponding data report, and different service requirements and different difference capturing scenarios are abstracted and extracted. Accordingly, when difference analysis is performed on source report files, the corresponding comparison index information needs to be determined for source report files of different data types, and data sampling is performed according to that comparison index information.

In one example, the system can templatize the source report files to be tested according to different business requirements and different anomaly capturing scenarios. Based on the comparison index information of the reports, the system abstracts and extracts report data with the same attributes and builds templates from it, which reduces the coupling of the system and makes file comparison more flexible. When processing a source report file, the system determines the type of the source report file and then quickly standardizes it based on the corresponding template.

S230, screening target data corresponding to the comparison index information from the data of the target file.

In the embodiment of the present application, it may be understood that, in order to analyze the difference between the source report file and the target file in the current period, the source report file and the target file may be compared based on some key parameters and key attributes to be studied, where the key parameters and key attributes are the foregoing comparison index information. Based on the comparison index information, the system can automatically sample data from the target file and obtain target data corresponding to the comparison index information.

In some embodiments, the comparison index information includes at least one of a comparison difference template, a comparison primary key, and a list keyword.

In the embodiment of the present application, it can be understood that source report files of different data types correspond to different comparison difference templates, different comparison primary keys and different list keywords. When analyzing a source report file, the system can obtain the corresponding comparison difference template and identify the keywords to be compared.
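Steps S220 and S230 can be sketched as a lookup of comparison index information by data type followed by a projection of the target rows onto the fields that information names. Everything concrete below is hypothetical: the `ComparisonIndexInfo` class, the `TEMPLATES` table, the data type names, and the column names are illustrative assumptions rather than structures defined in the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ComparisonIndexInfo:
    primary_key: str                 # comparison primary key
    list_keywords: Tuple[str, ...]   # list keywords (columns to compare)


# Assumed per-data-type templates.
TEMPLATES: Dict[str, ComparisonIndexInfo] = {
    "order": ComparisonIndexInfo("order_id", ("product", "amount")),
    "settlement": ComparisonIndexInfo("account_id", ("period", "settled_amount")),
}


def screen_target_data(data_type: str, target_rows: List[dict]) -> List[dict]:
    """Keep only the primary key and list-keyword columns for this data type."""
    info = TEMPLATES[data_type]
    wanted = (info.primary_key, *info.list_keywords)
    return [{column: row[column] for column in wanted} for row in target_rows]


target_rows = [
    {"order_id": "o1", "product": "cloud-host", "amount": 120.0, "remark": "renewal"},
    {"order_id": "o2", "product": "object-storage", "amount": 35.5, "remark": ""},
]
target_data = screen_target_data("order", target_rows)
```

Keeping the templates in a table keyed by data type means a new report type only needs a new table entry, which reflects the reduced coupling the templating is meant to provide.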

In some embodiments, as shown in fig. 3, S140 (comparing, by the target cluster node, the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm to obtain a difference data array) may include the following steps:

S310, arranging, by the target cluster node, the fragment data and the target data in a matrix by using the matrix graph algorithm to obtain a data arrangement matrix diagram.

It can be understood that the matrix diagram analysis algorithm is a method of finding pairs of factors from the event of a multidimensional problem and arranging the pairs into a matrix diagram, and then analyzing the problem and determining key points according to the matrix diagram. The matrix diagram analysis algorithm can acquire the interrelationship among the data factors in a matrix form, find out the problem in the link of comparing the data and obtain the idea of solving the problem.

In the embodiment of the application, the shortest distance between the same data and the difference data in the fragmented data and the target data can be found by using a matrix diagram analysis algorithm, and the comparison analysis is performed on the two groups of data based on the shortest path, so that the comparison time and complexity of the file data are shortened. Based on the method, the matrix graph algorithm is used for calculating and analyzing the data to obtain the required result, so that the operation efficiency can be improved, the time consumption is reduced, the software maintenance is easier, and the application structure is more reliable.

S320, determining data position information in the fragment data and the target data based on the data arrangement matrix diagram; the data position information comprises the intersection point position of corresponding data and the position of non-corresponding data in the fragment data and the target data.

In the embodiment of the present application, it may be understood that when a matrix graph algorithm is used to find the differences between the fragment data and the target data, the fragment data and the target data are arranged along the rows and the columns, respectively, to obtain a data arrangement matrix diagram. In this diagram, the positions of corresponding data and the positions of non-corresponding data in the fragment data and the target data can be determined.

On this basis, the position of corresponding data in the data arrangement matrix diagram gives the position of that data in both the fragment data and the target data, while the position of non-corresponding data gives its position in either the fragment data or the target data. In other words, the data position information in the fragment data and the target data can be obtained from the data arrangement matrix diagram, and the fragment data can be located relative to the target data.
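A small sketch may help make the position information concrete. It lays the fragment data along the x axis and the target data along the y axis, records every grid point where the two elements are equal, and lists the elements that match nothing on the other side. The function names and the reading of "non-corresponding data" as elements with no match at all are assumptions for illustration.

```python
def arrange_matrix(fragment, target):
    """Rows follow the target data (y); columns follow the fragment data (x)."""
    return [[a == b for a in fragment] for b in target]


def position_info(fragment, target):
    """Return intersection points and unmatched positions, all 1-based."""
    grid = arrange_matrix(fragment, target)
    intersections = [
        (x + 1, y + 1)
        for y, row in enumerate(grid)
        for x, hit in enumerate(row)
        if hit
    ]
    matched_x = {x for x, _ in intersections}
    matched_y = {y for _, y in intersections}
    unmatched_fragment = [x for x in range(1, len(fragment) + 1) if x not in matched_x]
    unmatched_target = [y for y in range(1, len(target) + 1) if y not in matched_y]
    return intersections, unmatched_fragment, unmatched_target


intersections, unmatched_fragment, unmatched_target = position_info(
    ["A", "B", "D"], ["B", "A", "C"]
)
```

Here the intersection points are (2, 1) and (1, 2), while the third element of each sequence has no counterpart. The intersection points are the places where a comparison path can take a diagonal step.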

S330, determining the shortest comparison path of the corresponding elements in the fragment data and the target data according to the data position information.

In the embodiment of the present application, it may be understood that, in order to determine the difference data array of the fragment data relative to the target data, the overall matching degree of the fragment data and the target data needs to be compared. On this basis, in the data arrangement matrix diagram, the shortest comparison path of the corresponding elements can be drawn according to the position of each element in the fragment data and the target data, and the shortest comparison path can correspond to each element in the fragment data and each element in the target data.

S340, comparing the fragment data with the target data based on the shortest comparison path to obtain a difference data array.

In the embodiment of the present application, it may be understood that the complexity of the matrix diagram algorithm is a linear complexity, and in the data arrangement matrix diagram, the shortest alignment path is determined based on the corresponding element, the position of the corresponding element, and the position of the non-corresponding element in the slice data and the target data. Thus, each node in the shortest comparison path contains element information in the sliced data and the target data, and based on the shortest comparison path, the position of the sliced data in the data arrangement matrix diagram and the position of the target data in the data arrangement matrix diagram, the difference element of the sliced data relative to the target data can be positioned, so that a difference data array can be obtained.

In one example, fig. 4 shows a data arrangement matrix diagram constructed from two data slice sequences. When the two data slice sequences are compared using the matrix graph algorithm, they are denoted S1 and S2: S1 consists of a1, a2, ..., aL and has length L, and S2 consists of b1, b2, ..., bN and has length N, where L and N are both positive integers. The coordinates of a point of the data arrangement matrix diagram for S1 and S2 in the two-dimensional coordinate system may be expressed as (x, y), where x ∈ [0, L] and y ∈ [0, N]. The aggregation points of the data arrangement matrix diagram form a directed acyclic graph. A move from one aggregation point to the next along the horizontal axis can be expressed as (x-1, y) → (x, y), where x ∈ [1, L] and y ∈ [0, N]; a move along the vertical axis can be expressed as (x, y-1) → (x, y), where x ∈ [0, L] and y ∈ [1, N]; and a diagonal move can be expressed as (x-1, y-1) → (x, y), where x ∈ [1, L] and y ∈ [1, N].

In the case where L is 7 and N is 6, the data slice sequence S1 is ABCABBA and the data slice sequence S2 is CBABAC, and the abscissa and the ordinate in fig. 4 represent the data slice sequences S1 and S2, respectively. The shortest comparison coordinate path of the difference data can be found from fig. 4: where P is the number of diagonal moves, the length of the shortest path from (0, 0) to (L, N) in the coordinate system is D = L + N - 2P, that is, the number of horizontal and vertical moves on the path.

Referring to fig. 4, the shortest comparison path is drawn as arrow lines, and 9 arrow lines are shown in fig. 4. The abscissa of the data arrangement matrix diagram is the data slice sequence S1, and the ordinate is the data slice sequence S2. The first arrow line extends from the origin of coordinates to the first element in S1, namely element A; the second arrow line extends from the first element to the second element in S1; the third arrow line extends diagonally, so as to reach the target position along the shortest path, and covers both the third element (element C) in S1 and the first element (element C) in S2; the fourth arrow line covers the second element (element B) in S2; the fifth and sixth arrow lines cover, respectively, the element A and the element B that appear in both S1 and S2; the seventh arrow line covers an element B in S1; the eighth arrow line covers an element A in both S1 and S2; and the ninth arrow line covers the last element (element C) in S2.

Based on the above, in the shortest comparison coordinate path, the position points pointed to by the diagonal arrow lines are elements common to the data slice sequences S1 and S2, the position points pointed to by the horizontal arrow lines are elements of S1 that differ from S2, and the position points pointed to by the vertical arrow lines are elements of S2 that differ from S1. The related information of the two data slice sequences can therefore be acquired from the shortest comparison coordinate path in order to analyze their difference data.
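The relationship between arrow types and difference data can be illustrated with a minimal Python sketch. It does not reproduce the matrix graph algorithm of the application: it assumes an ordinary longest-common-subsequence dynamic program (quadratic rather than linear in cost) as a stand-in for finding one shortest path, and the function name `shortest_comparison_path` is hypothetical. The example sequences ABCABBA and CBABAC are the ones implied by the arrow-by-arrow walk of fig. 4.

```python
from typing import List, Tuple


def shortest_comparison_path(s1: str, s2: str) -> List[Tuple[str, str]]:
    """Return the moves of one shortest path from (0, 0) to (len(s1), len(s2)).

    Each move is (kind, element), where kind is "diagonal" (element common to
    both sequences), "horizontal" (element only in s1) or "vertical" (element
    only in s2). Assumption: a plain LCS table replaces the matrix graph algorithm.
    """
    L, N = len(s1), len(s2)
    # lcs[x][y] = length of the longest common subsequence of s1[x:] and s2[y:]
    lcs = [[0] * (N + 1) for _ in range(L + 1)]
    for x in range(L - 1, -1, -1):
        for y in range(N - 1, -1, -1):
            if s1[x] == s2[y]:
                lcs[x][y] = lcs[x + 1][y + 1] + 1
            else:
                lcs[x][y] = max(lcs[x + 1][y], lcs[x][y + 1])

    moves: List[Tuple[str, str]] = []
    x = y = 0
    while x < L or y < N:
        if x < L and y < N and s1[x] == s2[y]:
            moves.append(("diagonal", s1[x]))    # (x, y) -> (x + 1, y + 1)
            x, y = x + 1, y + 1
        elif y == N or (x < L and lcs[x + 1][y] >= lcs[x][y + 1]):
            moves.append(("horizontal", s1[x]))  # (x, y) -> (x + 1, y)
            x += 1
        else:
            moves.append(("vertical", s2[y]))    # (x, y) -> (x, y + 1)
            y += 1
    return moves


moves = shortest_comparison_path("ABCABBA", "CBABAC")
P = sum(1 for kind, _ in moves if kind == "diagonal")
D = len(moves) - P  # horizontal and vertical moves only
```

For these sequences the sketch finds P = 4 diagonal moves and D = 7 + 6 - 2 × 4 = 5 horizontal or vertical moves, 9 moves in total, which agrees with the 9 arrow lines described for fig. 4. The individual moves may differ from the path drawn there, since several shortest paths can exist.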

In some embodiments, the difference data array includes at least one of an anomaly percentage, an anomaly data stream, and an anomaly file total amount.

In the embodiment of the present application, it may be understood that, after each server engine determines the difference elements of the fragment data relative to the target data based on the matrix graph algorithm, an anomaly data stream, an anomaly percentage of the fragment data relative to the target data, and an anomaly total amount can be generated from the difference elements.
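As a rough illustration of how these three quantities could be derived from the difference elements, the following Python sketch takes a list of path moves and produces a small summary. The class name, the field names, and in particular the definition of the percentage (difference elements divided by the combined length of both sequences) are assumptions made for the example; the application does not state a formula.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DifferenceDataArray:
    anomaly_stream: List[Tuple[str, str]]  # (origin, element) for each difference
    anomaly_total: int                     # number of difference elements
    anomaly_percentage: float              # assumed: share of all compared elements


def build_difference_array(
    moves: List[Tuple[str, str]], fragment_len: int, target_len: int
) -> DifferenceDataArray:
    """Collect non-diagonal moves into a difference summary (hypothetical layout)."""
    stream: List[Tuple[str, str]] = []
    for kind, element in moves:
        if kind == "horizontal":    # element present only in the fragment data
            stream.append(("fragment_only", element))
        elif kind == "vertical":    # element present only in the target data
            stream.append(("target_only", element))
        # diagonal moves are common elements and contribute no difference
    compared = fragment_len + target_len
    percentage = 100.0 * len(stream) / compared if compared else 0.0
    return DifferenceDataArray(stream, len(stream), percentage)


example = build_difference_array(
    [("horizontal", "A"), ("diagonal", "C"), ("vertical", "B")],
    fragment_len=2,
    target_len=2,
)
```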

In some embodiments, based on the foregoing method for determining report difference data, the present application further provides an implementation manner of another method for determining report difference data, where the implementation manner further includes, after S140:

S150, in the case where the quantity of data in the difference data array meets a target threshold, sending a target data packet including the difference data array to the graphic monitoring module, so that the graphic monitoring module constructs a data analysis graph based on the target data packet.

In the embodiment of the present application, it may be understood that, after each server engine determines the difference elements of the fragment data relative to the target data based on the matrix graph algorithm and obtains the difference data array, the system may package the difference data array to obtain the target data packet. On this basis, the system can send the target data packet obtained through the analysis to the graphic monitoring module, and the graphic monitoring module can parse the target data packet and generate a graphic that is displayed intuitively to the user.

In some embodiments, the source report file includes a data type. Based on the foregoing method for determining report difference data, the present application further provides an implementation manner of another method for determining report difference data, where the implementation manner further includes between S140 and S150:

S141, determining a target threshold corresponding to the data type of the source report file according to the correspondence between data types and thresholds, wherein the target threshold represents the proportion of the difference data array relative to the fragment data and the target data.

In the embodiment of the application, it can be understood that the target threshold can be set in the system. Different types of report data tolerate differences to different degrees, depending on the nature of their data types. Based on this, different target thresholds may be set in the system according to the data type of the source report file.

In one example, where the source report file is a data report, the target threshold may be that the proportion of the difference data array relative to the target data exceeds 30%.

In another example, where the source report file is an account clearing report, the target threshold may be that the proportion of the difference data array relative to the source report file exceeds 0%.
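The two examples can be sketched as a lookup from data type to threshold followed by a proportion check. This is only an illustration under stated assumptions: the dictionary keys, the function names, and the choice of denominator passed in as `base_count` are hypothetical, while the 30% and 0% values come from the examples above.

```python
# Hypothetical correspondence between data type and target threshold (a proportion).
TARGET_THRESHOLDS = {
    "data_report": 0.30,     # report only when differences exceed 30%
    "clearing_report": 0.0,  # report as soon as any difference exists
}


def target_threshold(data_type: str) -> float:
    """S141 as a sketch: look up the threshold for the report's data type."""
    return TARGET_THRESHOLDS[data_type]


def meets_target_threshold(data_type: str, difference_count: int, base_count: int) -> bool:
    """Return True when the difference proportion exceeds the type's threshold.

    base_count is whatever the proportion is measured against (assumed here to
    be supplied by the caller, for example the size of the target data).
    """
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    proportion = difference_count / base_count
    return proportion > target_threshold(data_type)


send_data_report = meets_target_threshold("data_report", 31, 100)   # 31% > 30%
send_clearing = meets_target_threshold("clearing_report", 1, 1000)  # any difference
```

A strict "greater than" comparison is used because both examples speak of the proportion exceeding the threshold.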

The embodiment of the application also provides a device 400 for determining report difference data, as shown in fig. 5, the device 400 may include the following modules:

The acquisition module 410 is configured to acquire the source report file to be detected in the case where an analysis instruction input by the user is received.

The calculation module 420 is configured to perform hash calculation on each piece of data in the source report file, so as to obtain an index of the data of the source report file.

The determining module 430 is configured to determine, according to the correspondence between the index and the cluster node, a target cluster node corresponding to the index.

The comparison module 440 is configured to compare, through the target cluster node, the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm, so as to obtain a difference data array.
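The division of work between the calculation module and the determining module can be pictured with a short Python sketch. The application does not specify the hash function or the correspondence between indexes and cluster nodes; SHA-256, the slot count of 1024, the modulo mapping, and the node names below are all assumptions chosen only to make the example concrete.

```python
import hashlib
from typing import Dict, List


def record_index(record: str, num_slots: int = 1024) -> int:
    """Hash one piece of report data to an index (assumed: SHA-256 mod num_slots)."""
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def target_node(index: int, nodes: List[str]) -> str:
    """Map an index to a cluster node (assumed: simple modulo correspondence)."""
    return nodes[index % len(nodes)]


def assign_to_nodes(records: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Group records into per-node fragment data according to their index."""
    fragments: Dict[str, List[str]] = {node: [] for node in nodes}
    for record in records:
        fragments[target_node(record_index(record), nodes)].append(record)
    return fragments


# Three hypothetical server engines, each receiving its share of the rows.
nodes = ["engine-1", "engine-2", "engine-3"]
fragments = assign_to_nodes([f"row-{i}" for i in range(100)], nodes)
```

Because the index depends only on the record content, the same record is always routed to the same node, which is what allows each node to compare its fragment independently.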

According to embodiments of the present application, any of the acquisition module 410, the calculation module 420, the determining module 430, and the comparison module 440 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.

In some embodiments, referring to fig. 6, the apparatus 400 for determining report difference data may further include a filtering module 450, where the filtering module 450 may be specifically configured to:

Obtaining target files from the file server, wherein the target files correspond one-to-one to the fragment data in the source report file corresponding to each index.

Determining comparison index information corresponding to the source report file based on the data type of the source report file; and screening target data corresponding to the comparison index information from the data of the target file.
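One way to picture the screening step is the Python sketch below, which looks up comparison index information by data type and keeps only the keyed columns of each target row. The table contents, the column names, and the function name are hypothetical; the application only states that the comparison index information is determined from the data type and used to screen the target data.

```python
from typing import Any, Dict, List

# Hypothetical comparison index information, keyed by data type.
COMPARISON_INDEX_INFO: Dict[str, Dict[str, Any]] = {
    "settlement": {"primary_key": "order_id", "list_keys": ["amount", "status"]},
}


def screen_target_data(
    target_rows: List[Dict[str, Any]], data_type: str
) -> List[Dict[str, Any]]:
    """Keep only the columns named by the comparison index information."""
    info = COMPARISON_INDEX_INFO[data_type]
    primary_key = info["primary_key"]
    wanted = [primary_key] + info["list_keys"]
    screened = []
    for row in target_rows:
        if primary_key not in row:
            continue  # a row without the comparison key cannot be aligned
        screened.append({column: row.get(column) for column in wanted})
    return screened


target_data = screen_target_data(
    [
        {"order_id": "1", "amount": 10, "status": "ok", "note": "ignored"},
        {"amount": 5},  # dropped: no primary key
    ],
    "settlement",
)
```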

In some embodiments, referring to fig. 6, the apparatus 400 for determining report difference data may further include a sending module 460, where the sending module 460 may be specifically configured to:

In the case where the quantity of data in the difference data array meets the target threshold, sending a target data packet including the difference data array to the graphic monitoring module, so that the graphic monitoring module constructs a data analysis graph based on the target data packet.

In some embodiments, referring to fig. 6, the apparatus 400 for determining report difference data may further include a target determining module 470, where the target determining module 470 may specifically be configured to:

Determining a target threshold corresponding to the data type of the source report file according to the correspondence between data types and thresholds, wherein the target threshold represents the proportion of the difference data array relative to the fragment data and the target data.

In some embodiments, the comparison module 440 may be specifically configured to:

Performing, through the target cluster node, matrix arrangement on the fragment data and the target data by adopting a matrix graph algorithm to obtain a data arrangement matrix diagram.

Determining data position information in the fragment data and the target data based on the data arrangement matrix diagram; the data position information comprises the intersection point position of corresponding data and the position of non-corresponding data in the fragment data and the target data.

And determining the shortest comparison path of the corresponding elements in the fragment data and the target data according to the data position information.

And comparing the fragment data with the target data based on the shortest comparison path to obtain a difference data array.

Each module in the apparatus shown in fig. 6 has a function of implementing each step in the method for determining report difference data, and can achieve the corresponding technical effects thereof, which is not described herein for brevity.

The embodiment of the application also provides a system for determining report difference data. Referring to the structural schematic diagram in fig. 7, the system comprises a task console, a system cluster service module, a file service module, a graphic monitoring module, a file server, and a report anomaly capture engine module. The file service module comprises a file classification module, a file processing module, and a file data flow module, and the report anomaly capture engine module consists of three server engines.

Specifically, the task console is configured to receive a service function request from a user and send it to the system cluster service module. After receiving the report service function request, the system cluster service module performs service processing according to the operation type and samples the required data according to the different data reports, such as order data, product data, order data, charging data, and settlement data. For the different service requirements and anomaly capture scenarios, it abstracts and extracts the core content, such as the file comparison keyword, comparison period, file type, anomaly capture type, and anomaly capture alarm proportion, and makes a template for each kind of service anomaly capture. Therefore, the system can determine the comparison index information corresponding to the source report file based on the data type of the source report file and, when analyzing the source report file, can obtain the corresponding comparison difference template and split out the identification keywords.

The system cluster service module comprises a new rule configuration unit, a modification rule configuration unit, a deletion rule configuration unit, a report structure configuration unit, a file path configuration unit, and a configuration information query unit. The new rule configuration unit can configure different types of data reports, as shown in fig. 8, wherein the configuration items comprise the report type, report business code, report business activity code, channel code, report operation type, anomaly capture period, whether to generate an anomaly data file, the file name rule, and the like. The report types are divided into order data, product data, order data, billing data, and settlement data. The report business codes and the report business activity codes are used for distinguishing report anomaly capture types, and the anomaly capture types are used for distinguishing manual anomaly capture from automatic anomaly capture. The anomaly capture period is the configured period of report anomaly capture events and can be set to a daily, weekly, monthly, or annual period; when automatic anomaly capture is performed periodically, the system compares the previous period with the current period and obtains the result. The file name rule is used for distinguishing files of different report types.
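The configuration items listed for a new rule could be held in a record such as the following Python sketch. The class and field names, the string values, and the use of a format-string file name rule are assumptions for illustration; fig. 8 and the text above name the items but not their representation.

```python
from dataclasses import dataclass
from enum import Enum


class CapturePeriod(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    ANNUAL = "annual"


@dataclass(frozen=True)
class ReportRule:
    report_type: str             # e.g. "settlement" (hypothetical value)
    business_code: str
    business_activity_code: str
    channel_code: str
    operation_type: str          # assumed values: "manual" or "automatic"
    capture_period: CapturePeriod
    generate_anomaly_file: bool
    file_name_rule: str          # assumed: a str.format template

    def file_name(self, period_label: str) -> str:
        """Build a file name that distinguishes report types and periods."""
        return self.file_name_rule.format(
            report_type=self.report_type, period=period_label
        )


rule = ReportRule(
    report_type="settlement",
    business_code="B001",
    business_activity_code="A01",
    channel_code="C9",
    operation_type="automatic",
    capture_period=CapturePeriod.MONTHLY,
    generate_anomaly_file=True,
    file_name_rule="{report_type}_{period}.csv",
)
name = rule.file_name("2024-01")
```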

The modification rule configuration unit is used for modifying the data report operation rules to meet business requirements; the modifiable configuration items comprise the report operation type, the anomaly capture period, whether to generate an anomaly data file, and the file name generation rule. The deletion rule configuration unit is used for deleting report configuration rules, including report configuration rules that are no longer valid. The report structure configuration unit is used for configuring the report anomaly data items to be captured and the data column information to be monitored, wherein at most 128 data columns and at least 1 data column can be added. The report structure configuration unit can designate a column as the unique identifier for file data comparison, and the report data can be modified and deleted after the structure is configured. As shown in fig. 8, the file path configuration unit is used for allocating independent paths to different operation files, wherein the configuration items comprise the anomaly file generation path, the source report file uploading path, and the target file report path. The configuration information query unit is used for querying report information, wherein the report information comprises source report file uploading information, target file information, anomaly capture file information, and the like.
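The column limits and the unique identification column lend themselves to a simple validation check, sketched below in Python. The function name, the returned dictionary, and the rejection of duplicate column names are assumptions; only the bounds of 1 and 128 columns and the existence of a unique identification column come from the text.

```python
from typing import Dict, List, Union

MIN_COLUMNS = 1
MAX_COLUMNS = 128


def validate_report_structure(
    columns: List[str], unique_id_column: str
) -> Dict[str, Union[List[str], str]]:
    """Check a report structure configuration before it is stored (sketch)."""
    if not MIN_COLUMNS <= len(columns) <= MAX_COLUMNS:
        raise ValueError(
            f"expected between {MIN_COLUMNS} and {MAX_COLUMNS} columns, got {len(columns)}"
        )
    if len(set(columns)) != len(columns):
        raise ValueError("column names must be distinct")  # assumed rule
    if unique_id_column not in columns:
        raise ValueError("the unique identification column must be one of the columns")
    return {"columns": list(columns), "unique_id_column": unique_id_column}


structure = validate_report_structure(["order_id", "amount", "status"], "order_id")
```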

In order to store and manage report data, the file service module provides storage and management interface functions. After receiving a file processing request, the file service module performs the file-related service processing; its main service is file processing, which implements functions such as report file generation, file uploading, file downloading, and file deletion. When analyzing report data, the file service module classifies the files and converts the classified subfiles into data streams for data analysis, and, after the analysis result is obtained, compresses and optimizes the report data to reduce the storage space occupied.

Specifically, the file service module comprises a report file generation unit, a report file uploading unit, a report anomaly capture unit, and a report file management unit. The report file generation unit is used for generating report file data according to the operation type configured for the report type, and report file data can be generated manually or automatically. The report file uploading unit is used for uploading files for different types of operations according to the requirements of the user; if the uploading type is manual uploading, the user needs to upload the report data files to be compared. As shown in fig. 9, the report anomaly capture unit is used for manually capturing anomalies between the uploaded source report file and target file according to the user requirements, and for generating an anomaly data report and an anomaly data analysis chart.

The functions of the report file management unit are shown in fig. 10. The report file management unit may include a report file query subunit, a report file download subunit, and a report file deletion subunit. The report file query subunit is used for querying file information by comparison time, file type, and file name, wherein the file information comprises uploaded source report file information, target file information, and anomaly capture file information. It will be appreciated that each source report file corresponds to a report name, report type, file creation time, file path, anomaly capture type, and capture period. The report file download subunit is used for downloading report data files according to the queried data report information. The report file deletion subunit is used for deleting report data files that the user no longer needs; files can be deleted automatically on a configured schedule or deleted manually.
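A query by time, file type, and file name can be sketched as a filter over file records, as in the following Python example. The record layout, the field names, and the matching rules (exact date, exact type, substring of the name) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ReportFileInfo:
    name: str
    report_type: str
    created: date
    path: str
    kind: str  # assumed values: "source", "target" or "anomaly"


def query_report_files(
    files: Iterable[ReportFileInfo],
    created_on: Optional[date] = None,
    report_type: Optional[str] = None,
    name_part: Optional[str] = None,
) -> List[ReportFileInfo]:
    """Return the files matching every criterion that was supplied."""
    result = []
    for info in files:
        if created_on is not None and info.created != created_on:
            continue
        if report_type is not None and info.report_type != report_type:
            continue
        if name_part is not None and name_part not in info.name:
            continue
        result.append(info)
    return result


catalog = [
    ReportFileInfo("settle_2024-01.csv", "settlement", date(2024, 1, 8), "/src", "source"),
    ReportFileInfo("settle_2024-01_diff.csv", "settlement", date(2024, 1, 9), "/out", "anomaly"),
    ReportFileInfo("orders_2024-01.csv", "order", date(2024, 1, 8), "/src", "source"),
]
matches = query_report_files(catalog, report_type="settlement", name_part="diff")
```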

It is emphasized that the file service module can adopt a distributed storage technology based on the Hadoop big data architecture, combined with the HDFS distributed storage framework, so that the file service module has high fault tolerance: data can be automatically stored as multiple copies and automatically recovered after a copy is lost. During file operations, files are accessed in a streaming mode and follow a write-once, read-many model, which keeps file data consistent and avoids loss of file data. This completes the backup and recovery mechanism for report data, ensures that normal operation can be quickly restored when an abnormal condition occurs, and improves reliability through multiple copies. In addition, the file processing module can also support real-time sharing of report data, allowing a plurality of users to view and edit the same report data at the same time, which improves the shareability of report data and the efficiency of collaboration. Therefore, when an abnormal condition is handled, a plurality of users can work cooperatively to analyze and solve the problem together, improving overall working efficiency.

The report anomaly capture engine module implements report file difference comparison and difference data analysis. It obtains the shortest comparison coordinate path through the matrix graph algorithm and compares the fragment data with the target data to obtain the difference data array. After the report anomaly capture engine module obtains the difference data array, the system can package it to obtain a target data packet, which includes data such as the report generation rate, report business processing rate, dynamic report capture, anomaly percentage, anomaly data stream, and anomaly file total amount. The packaged target data packet is sent to the graphic monitoring module, an anomaly data report file is generated, and the data is displayed graphically in the graphic monitoring module. The data is used to produce the anomaly results captured by the latest data report analysis within one report comparison period, forming a dynamic return data packet.

The graphic monitoring module has a data graphic generation function. After receiving the target data packet returned by the data analysis service, the graphic monitoring module parses the data in the target data packet, classifies the data, generates graphic data, and fills in the values of the data graphic, thereby generating a graphic of the report difference data and displaying it to the client.

According to the embodiment of the application, report data flow analysis is implemented by the report anomaly capture engine: the data in the source report file to be detected is divided into groups, classified and counted by type, and copied in fragments to the cluster service, so that the integrity of the data can be ensured when different clusters are requested, and the report data analysis result is returned after the data analysis.

In some embodiments, the present application provides an electronic device, and a schematic structural diagram of the electronic device is shown in fig. 11.

The electronic device may include a processor 510 and a memory 520 storing computer program instructions.

In particular, the processor 510 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.

Memory 520 may include mass storage for data or instructions. By way of example, and not limitation, memory 520 may comprise a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of the foregoing. Memory 520 may include removable or non-removable (or fixed) media, where appropriate. Memory 520 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 520 is a non-volatile solid state memory.

Memory 520 may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. Thus, in general, memory 520 includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions which, when executed (e.g., by one or more processors), perform the operations described by the method of determining report difference data of any of the above-described embodiments.

The processor 510 implements any of the methods of determining report discrepancy data described in the embodiments above by reading and executing computer program instructions stored in the memory 520.

In one example, the electronic device may also include a communication interface 530 and a bus 500. As shown in fig. 11, the processor 510, the memory 520, and the communication interface 530 are connected to each other by the bus 500 and communicate with each other.

The communication interface 530 is mainly used to implement communication between each module, apparatus, unit and/or device in the embodiments of the present application.

Bus 500 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of the above. Bus 500 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.

In addition, in combination with the method for determining report difference data in the above embodiments, an embodiment of the application may provide a computer storage medium for implementation. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method of determining report difference data in any of the above embodiments.

According to the embodiment of the application, reports from different periods can be extracted and compared, and report data anomalies across periods can be found quickly and conveniently without manual checking of report data differences. Intelligent analysis and capture of report data anomalies is achieved, together with functions such as graphical report anomaly capture, so that clients can view report data analysis results conveniently, intuitively, and accurately, and waste of resources and staff time is avoided. In addition, using the matrix graph algorithm improves the calculation accuracy of the report difference data: by calculating the minimum distance between identical data during the comparison, high-dimensional data and complex relationships are processed rapidly, the comparison time is optimized, the system operation speed is improved, and file comparison anomaly information is obtained quickly.

It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.

The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.

It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be different from the order in the embodiments, or several steps may be performed simultaneously.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, which are intended to be covered by the scope of the present application.

Claims

1. A method of determining report difference data, comprising:

under the condition of receiving an analysis instruction input by a user, acquiring a source report file to be detected;

carrying out hash calculation on each piece of data in the source report file to obtain an index of the data of the source report file;

determining a target cluster node corresponding to the index according to the corresponding relation between the index and the cluster node;

and comparing the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm through the target cluster node to obtain a difference data array.

2. The method of determining report difference data of claim 1, wherein the source report file comprises a data type; and before the comparing, through the target cluster node, of the fragment data in the source report file corresponding to the index with the target data by using a matrix graph algorithm to obtain a difference data array, the method further comprises:

obtaining a target file from a file server, wherein the target file corresponds to the fragment data in the source report file corresponding to each index one by one;

determining comparison index information corresponding to the source report file based on the data type of the source report file;

and screening target data corresponding to the comparison index information from the data of the target file.

3. The method of claim 2, wherein the comparison index information includes at least one of a comparison difference template, a comparison primary key, and a list key.
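
As a purely illustrative sketch of claims 2 and 3, the screening step could be read as looking up comparison index information by data type and keeping only the corresponding fields of each target row. The table `COMPARISON_INDEX`, the data type `"billing"`, the field names, and the interpretation of the list key as a list of compared columns are all hypothetical and are not taken from the application.

```python
# Hypothetical mapping from data type to comparison index information.
COMPARISON_INDEX = {
    "billing": {"primary_key": "account_id", "list_keys": ["amount", "status"]},
}


def screen_target_data(data_type: str, target_rows: list[dict]) -> list[dict]:
    """Keep only the fields named by the comparison index information."""
    info = COMPARISON_INDEX[data_type]
    fields = [info["primary_key"], *info["list_keys"]]
    # Fields absent from a row are skipped rather than filled in.
    return [{f: row[f] for f in fields if f in row} for row in target_rows]
```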

4. The method of determining report difference data of claim 1, wherein the comparing, by the target cluster node and using a matrix graph algorithm, the fragment data in the source report file corresponding to the index with the target data to obtain a difference data array comprises:

performing, by the target cluster node and using the matrix graph algorithm, matrix arrangement on the fragment data and the target data to obtain a data arrangement matrix graph;

determining data position information in the fragment data and the target data based on the data arrangement matrix graph, wherein the data position information comprises the intersection point positions of corresponding data and the positions of non-corresponding data in the fragment data and the target data;

determining the shortest comparison path of the corresponding elements in the fragment data and the target data according to the data position information;
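
The application does not define the matrix graph algorithm in detail. One possible reading, offered only as an assumption, is the classic edit-graph formulation in which the fragment data and the target data are laid out along the two axes of a matrix, matching elements form diagonal intersection points, and the shortest comparison path is the path with the most diagonals (a longest-common-subsequence walk). The function name `compare_fragments` and the return format are hypothetical.

```python
def compare_fragments(fragment: list, target: list):
    """Walk an LCS table to find matching positions and unmatched positions."""
    n, m = len(fragment), len(target)
    # lcs[i][j] = length of the longest common subsequence of
    # fragment[i:] and target[j:]; this is the "arrangement matrix".
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if fragment[i] == target[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    matches, only_fragment, only_target = [], [], []
    i = j = 0
    while i < n and j < m:
        if fragment[i] == target[j]:
            matches.append((i, j))      # intersection point of corresponding data
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            only_fragment.append(i)     # present only in the fragment data
            i += 1
        else:
            only_target.append(j)       # present only in the target data
            j += 1
    only_fragment.extend(range(i, n))
    only_target.extend(range(j, m))
    return matches, only_fragment, only_target
```

For example, comparing `["a", "b", "c", "d"]` with `["a", "c", "d", "e"]` yields the matches `(0, 0)`, `(2, 1)` and `(3, 2)`, with position 1 unmatched in the fragment data and position 3 unmatched in the target data. The table costs O(n·m) time and memory; a production system handling large fragments would likely need a more economical variant.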

5. The method of claim 1 or 4, wherein the difference data array includes at least one of an anomaly percentage, an anomaly data stream, and an anomaly file total amount.

6. The method of determining report difference data of claim 1, further comprising:

in a case where the amount of data in the difference data array meets a target threshold, sending a target data packet comprising the difference data array to a graph monitoring module, so that the graph monitoring module constructs a data analysis graph based on the target data packet.

7. The method of determining report difference data of claim 6, wherein the source report file comprises a data type; and before sending the target data packet comprising the difference data array to the graph monitoring module in a case where the amount of data in the difference data array meets the target threshold, the method further comprises:

determining a target threshold corresponding to the data type of the source report file according to a correspondence between data types and thresholds, wherein the target threshold represents the proportion of the difference data array relative to the fragment data and the target data.
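
As a minimal sketch of the threshold check in claims 6 and 7, the target threshold can be looked up by data type and compared against the proportion of difference data. The threshold values, the data type names, the default value, the choice of denominator (fragment size plus target size), and the reading of "meets" as "greater than or equal to" are assumptions made for this example only.

```python
# Hypothetical correspondence between data types and thresholds.
THRESHOLDS = {"billing": 0.01, "inventory": 0.05}
DEFAULT_THRESHOLD = 0.02  # assumed fallback for unlisted data types


def difference_ratio(num_differences: int, fragment_size: int, target_size: int) -> float:
    """Proportion of difference data relative to fragment plus target data."""
    total = fragment_size + target_size
    return num_differences / total if total else 0.0


def should_notify(data_type: str, num_differences: int,
                  fragment_size: int, target_size: int) -> bool:
    """Return True when the difference ratio meets the type-specific threshold."""
    threshold = THRESHOLDS.get(data_type, DEFAULT_THRESHOLD)
    return difference_ratio(num_differences, fragment_size, target_size) >= threshold
```

When `should_notify` returns `True`, the caller would package the difference data array and send it to the graph monitoring module; that transport step is outside this sketch.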

8. An apparatus for determining report difference data, comprising:

an acquisition module, configured to acquire a source report file to be detected in a case where an analysis instruction input by a user is received;

a computing module, configured to perform a hash calculation on each piece of data in the source report file to obtain an index of the data of the source report file;

a determining module, configured to determine a target cluster node corresponding to the index according to a correspondence between indexes and cluster nodes;

and a comparison module, configured to compare, by the target cluster node and using a matrix graph algorithm, the fragment data in the source report file corresponding to the index with target data to obtain a difference data array.

9. An electronic device, comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the method of determining report difference data as claimed in any one of claims 1 to 7.

10. A computer readable storage medium, wherein a program or instructions are stored on the computer readable storage medium, and the program or instructions, when executed by a processor, implement the method of determining report difference data as claimed in any one of claims 1 to 7.