CN115544092A - Data detection method, device, equipment and storage medium - Google Patents

Data detection method, device, equipment and storage medium

Publication number
CN115544092A
Authority
CN
China
Prior art keywords
data
time
sequence
aggregation
model object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211281984.6A
Other languages
Chinese (zh)
Inventor
钟志明
张�浩
韩森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202211281984.6A
Publication of CN115544092A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data detection method, device, equipment and storage medium. The method comprises: querying data of distributed application server nodes, and formatting the queried data to obtain data model objects that conform to a predefined data model; distributing the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and collecting the data model objects associated with the aggregated data sequence pairs into a data stream; setting a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream; and processing the aggregated data sequence pairs in the data stream based on the set dynamic data window range to generate a detection result for the queried data, wherein the detection result includes whether the queried data is consistent with expected data.

Description

Data detection method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to the technical field of data processing in financial technology (Fintech), and relate to, but are not limited to, a data detection method, device, equipment and storage medium.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, the financial industry's requirements for security and real-time performance also place higher demands on these technologies.
In the field of financial technology, most projects are currently deployed under a distributed architecture: applications and database instances are distributed across multi-node areas, and the nodes communicate with each other through corresponding services. Current data consistency detection schemes are applied to relational database management systems with a master-slave architecture, such as MySQL. Such a scheme relies on MySQL log synchronization (for example, the binary log, Binlog) being enabled: a statement that generates checksums of data blocks is executed on the master library, the statement is transmitted to the slave library through Binlog synchronization and executed there, and the checksum of the same data blocks is then calculated on the slave library. This scheme therefore depends on the Binlog synchronization of the MySQL master-slave architecture. When multiple database types coexist in a distributed cross-domain scenario, for example when MySQL, TiDB, and Oracle all exist in one distributed architecture design, the different database types cannot form a master-slave architecture, so detection under a distributed architecture with multiple database types cannot be supported. Consequently, the related art cannot support data consistency detection across domains and across database types.
Disclosure of Invention
The embodiments of the application provide a data detection method, device, equipment and storage medium, which solve the problem in the prior art that data consistency detection cannot be supported across domains and across database types.
The technical solution of the embodiments of the application is implemented as follows:
An embodiment of the application provides a data detection method, which comprises the following steps:
querying data of distributed application server nodes, and formatting the queried data to obtain data model objects that conform to a predefined data model;
distributing the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and collecting the data model objects associated with the aggregated data sequence pairs into a data stream;
setting a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream;
processing the aggregated data sequence pairs in the data stream based on the set dynamic data window range to generate a detection result for the queried data, wherein the detection result includes whether the queried data is consistent with expected data.
A data detection apparatus, the apparatus comprising:
a preprocessing module, configured to query data of distributed application server nodes and format the queried data to obtain data model objects that conform to a predefined data model;
an aggregation module, configured to distribute the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and to collect the data model objects associated with the aggregated data sequence pairs into a data stream;
a window setting module, configured to set a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream;
a detection result matching module, configured to process the aggregated data sequence pairs in the data stream based on the set dynamic data window range and generate a detection result for the queried data, wherein the detection result includes whether the queried data is consistent with expected data.
A data detection apparatus comprising:
a memory for storing executable instructions; and a processor for implementing the above method when executing the executable instructions stored in the memory.
A computer readable storage medium having stored thereon executable instructions for causing a processor to perform the method described above when executed.
The embodiment of the application has the following beneficial effects:
Data of distributed application server nodes is queried, and the queried data is formatted to obtain data model objects that conform to a predefined data model. That is, the data detection method provided by the application can process data from distributed application server nodes, i.e., data from different data sources, and format it uniformly, thereby supporting cross-domain and cross-database data detection. Further, the data model objects are distributed to corresponding channels for aggregation to obtain aggregated data sequence pairs, and the data model objects associated with the aggregated data sequence pairs are collected into a data stream; a dynamic data window range is set based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream; and the aggregated data sequence pairs in the data stream are processed based on the set dynamic data window range to generate a detection result for the queried data, the detection result including whether the queried data is consistent with expected data. The method thus supports data consistency detection across domains and across database types while improving detection efficiency.
Drawings
Fig. 1 is an alternative architecture diagram of a terminal provided in an embodiment of the present application;
Fig. 2 is a first schematic flowchart of a data detection method provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of a data detection method provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of a data flow provided in an embodiment of the present application;
Fig. 5 is a schematic flowchart of a data detection method provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of a time range defined by two time dimensions provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of determining a maximum value within a query interval provided in an embodiment of the present application;
Fig. 8 is a schematic diagram of streaming window calculation provided in an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which the examples of this application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
An exemplary application of the data detection device provided in the embodiment of the present application is described below, and the data detection device provided in the embodiment of the present application may be implemented as a notebook computer, a tablet computer, a desktop computer, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), an intelligent robot, or any other terminal with an on-screen display function, and may also be implemented as a server. Next, an exemplary application when the data detection apparatus is implemented as a terminal will be explained.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a terminal 100 according to an embodiment of the present application, where the terminal 100 shown in fig. 1 includes: at least one processor 110, at least one network interface 120, a user interface 130, and memory 150. The various components in terminal 100 are coupled together by a bus system 140. It is understood that the bus system 140 is used to enable connected communication between these components. The bus system 140 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 140 in fig. 1.
The Processor 110 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc., wherein the general purpose Processor may be a microprocessor or any conventional Processor, etc.
The user interface 130 includes one or more output devices 131, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 130 also includes one or more input devices 132 including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 150 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 150 optionally includes one or more storage devices physically located remotely from processor 110. The memory 150 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 150 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 150 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 151 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 152 for reaching other computing devices via one or more (wired or wireless) network interfaces 120, exemplary network interfaces 120 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
an input processing module 153 for detecting one or more user inputs or interactions from one of the one or more input devices 132 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software, and fig. 1 illustrates a data detection apparatus 154 stored in the memory 150, where the data detection apparatus 154 may be a data detection apparatus in the terminal 100, which may be software in the form of programs and plug-ins, and includes the following software modules: the preprocessing module 1541, the aggregation module 1542, the window setting module 1543, and the detection result matching module 1544 are logical, and thus may be arbitrarily combined or further divided according to the implemented functions. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present application may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor that is programmed to perform the data detection method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
The current data consistency detection scheme is now explained further. In a scenario where a transaction link requires strongly consistent data, the data consistency of the transaction link can generally be ensured through transactions. However, some abnormal scenarios cannot be guarded against by the program, such as application service downtime, manual intervention and pre-repair, Data Migration (DM) data synchronization conflicts, and master-slave synchronization inconsistency. These cause data inconsistency among Data Center Nodes (DCNs) in a production environment, which in turn causes data conflicts and data splitting in the production environment. Currently, a set of advanced command-line tools, Percona Toolkit (PT), is commonly used in the industry to detect MySQL master-slave database consistency.
As described above, when multiple database types coexist in a distributed cross-domain scenario, for example when MySQL, TiDB, and Oracle exist in one distributed architecture design, detection under a distributed architecture with multiple database types cannot be supported, because the different database types cannot form a master-slave architecture.
In addition, during detection under the MySQL master-slave architecture, Binlog synchronization must be enabled, the structures of the tables being checked must be consistent, and only one table can be processed at a time. The reason is that, with Binlog-based synchronization, master-slave synchronization delay must be considered, and such delay is likely when the volume of synchronized data is large. The current scheme therefore has to control the synchronization rate and the amount of synchronized data, so only a single table can be checked at a time: the table is divided into blocks of rows, and the master and slave data of each block is checked. This directly reduces efficiency and leaves the scheme coupled to the underlying features of the database.
Therefore, the present application provides a data detection method that realizes data consistency detection across domains and across database types and improves detection efficiency.
The data detection method provided by the embodiment of the present application will be described below in conjunction with an exemplary application and implementation of the terminal 100 provided by the embodiment of the present application. Referring to fig. 2, fig. 2 is an alternative flowchart of the data detection method provided in the embodiment of the present application, which is described below in conjunction with the steps shown in fig. 2.
Step S201: query data of the distributed application server nodes, and format the queried data to obtain data model objects that conform to a predefined data model.
Applicable scenarios of the data detection method provided by the present application include, but are not limited to, one or a combination of the following: heterogeneous database scenarios in which multiple data sources coexist, and complex distributed cross-database architecture scenarios.
In the embodiment of the application, a unified data model (MapMode) protocol is defined. It is used to uniformly map and encapsulate the data in a data set that is to undergo consistency detection, which realizes the formatting process. The parameters involved in defining MapMode include some or all of the following: data node, data type, data entity, data identity information, and data cursor. These parameters are also called attributes, so the corresponding parameter values are also called attribute values. The attribute value of the data node stores the DCN node information of the data source segment, such as the DCN region to which the data belongs; the attribute value of the data type stores the type information of the data, such as the order type; the attribute value of the data entity stores the queried entity data, for example {field 1: data 1, field 2: data 2}; the attribute value of the data identity information stores the primary key information of the queried data; and the attribute value of the data cursor stores the position information of the data cursor.
In some embodiments of the present application, taking a MapMode defined with a data node, a data type, a data entity, data identity information, and a data cursor as an example, the queried data of the distributed application server nodes is assembled into a MapMode data packet, whose parameters are shown in Table 1:
Attribute      Attribute name              Attribute value
dataNode       Data node                   DCN region
dataType       Data type                   Order type
dataEntity     Data entity                 {field 1: data 1, field 2: data 2}
dataIdentity   Data identity information   Primary key information
dataVernier    Data cursor                 1
Table 1. Parameters included in a MapMode packet
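As a minimal sketch of the packet in Table 1, the five attributes could be held in a small immutable record. The class name, the snake_case field names, and the helper to_map_mode are illustrative assumptions and are not defined by the application; only the five attributes themselves come from the table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class MapMode:
    """Illustrative container for the five attributes listed in Table 1."""
    data_node: str               # dataNode: DCN region the data was read from
    data_type: str               # dataType: type of the data, e.g. an order type
    data_entity: Dict[str, Any]  # dataEntity: {field: value} of the queried row
    data_identity: str           # dataIdentity: primary key information
    data_vernier: int            # dataVernier: position of the data cursor


def to_map_mode(row: Mapping[str, Any], node: str, data_type: str,
                key_field: str, cursor: int) -> MapMode:
    """Format one queried row into a MapMode object (hypothetical helper)."""
    return MapMode(
        data_node=node,
        data_type=data_type,
        data_entity=dict(row),
        data_identity=str(row[key_field]),
        data_vernier=cursor,
    )
```

For example, to_map_mode({"id": 7, "amount": "10.00"}, "DCN-A", "order", "id", 1) yields an object whose data_identity is "7" regardless of which database type produced the row, which is the point of formatting all sources into one model.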
In one possible data preprocessing scenario, when a client on the terminal queries data of a distributed application server node, it may query the data sequentially and incrementally in segments by means of the data cursor (dataVernier). The queried data is formatted, mapped, and encapsulated into the MapMode object defined above, and the MapMode object is transmitted in a structured data storage format, for example using the structured data serialization method Protocol Buffers (Protobuf). The client may be a data sentry (WatchDog) client deployed in an integrated manner on the terminal. That is, in the present application, a WatchDog client may be deployed on each distributed application server node, and data detection processing may be performed on top of an open-source framework, for example using the Input/Output (IO) thread model of Netty, to improve the concurrency and speed of data transmission.
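The segmented, cursor-driven query can be sketched as follows. The application does not state how the cursor advances, so this sketch assumes the cursor is the last primary key read (keyset pagination); the table layout, the function names, and the use of an in-memory SQLite database are assumptions made only so that the example is self-contained.

```python
import sqlite3
from typing import Iterator, List, Tuple


def query_in_segments(conn: sqlite3.Connection,
                      batch_size: int) -> Iterator[Tuple[int, List[tuple]]]:
    """Yield (cursor_position, rows) segment by segment.

    Assumption: the cursor is the largest primary key already read, so each
    segment resumes strictly after the previous one.
    """
    cursor_pos = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (cursor_pos, batch_size),
        ).fetchall()
        if not rows:
            return
        cursor_pos = rows[-1][0]  # advance the cursor to the last key read
        yield cursor_pos, rows


def demo() -> List[Tuple[int, int]]:
    """Read five rows in segments of two; return (cursor, segment size) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, f"p{i}") for i in range(1, 6)])
    return [(pos, len(rows)) for pos, rows in query_in_segments(conn, 2)]
```

Resuming from the last key rather than from a row offset keeps each segment query cheap even on large tables, which is one plausible reason for carrying a cursor in every packet.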
Illustratively, after collection and processing by the WatchDog client, the data is mapped to MapMode as follows:
dataNode stores the DCN node information of the data source segment;
dataType stores the type information of the data;
dataEntity stores the queried entity data information;
dataIdentity stores the primary key information of the queried data;
dataVernier stores the data cursor position information.
After collection and processing by the WatchDog client, MapMode data mapping yields the following MapMode objects:
[Figure: example MapMode objects (image not available)]
In the embodiment of the application, the client may be an application program running on the terminal, or a web application loaded in a web page.
Step S202: distribute the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and collect the data model objects associated with the aggregated data sequence pairs into a data stream.
In the embodiment of the application, the collected data set is serialized into the MapMode data model, the data model objects are distributed to corresponding channels and aggregated to obtain aggregated data sequence pairs, and the data model objects associated with the aggregated data sequence pairs are collected into a data stream. This allows the subsequent stream computation based on the dynamic data window range to handle unbounded input data streams, raises the volume of data that can be admitted, enables large data flows to be processed, and improves detection efficiency.
Step S203: set a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream.
Here, the data window is the time span of the window over the data in the data stream.
In the embodiment of the application, the dynamic data window range is a data range determined from at least two time dimensions: the start time at which each data model object enters the data stream and the end time of each data model object in the data stream. Defining the data range with two time dimensions avoids the inaccuracy of a range based on a single time dimension and makes the data range more accurate.
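One way to picture a window range built from the two time dimensions is sketched below. This passage does not give the exact rule for combining the times, so the sketch assumes the window spans from the earliest start time to the latest end time among the objects currently in the stream; the type TimedObject and the function window_range are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TimedObject:
    key: str           # identifies the data model object
    start_time: float  # time at which the object entered the data stream
    end_time: float    # end time of the object in the data stream


def window_range(objects: Iterable[TimedObject]) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the window.

    Assumption: lower bound = earliest start time, upper bound = latest end
    time. The window therefore widens or narrows as objects enter and leave.
    """
    items = list(objects)
    if not items:
        raise ValueError("cannot derive a window from an empty stream")
    lower = min(o.start_time for o in items)
    upper = max(o.end_time for o in items)
    return lower, upper
```

With objects spanning (1.0, 4.0) and (2.0, 3.0), this rule gives the range (1.0, 4.0); a window keyed only to start times would end at 2.0 and miss data that is still in flight.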
Step S204: process the aggregated data sequence pairs in the data stream based on the set dynamic data window range, and generate a detection result for the queried data.
The detection result includes whether the queried data is consistent with expected data.
In the embodiment of the application, the aggregated data sequence pairs in the data stream are processed based on the set dynamic data window range, and a detection result indicating whether the queried data is consistent with the expected data is generated. Further, depending on the detection result, event pattern matching (EventMode) can be triggered to send a notification of the data detection result.
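A rough sketch of window-based detection is given below. It assumes that "consistent with expected data" means every expected node reported the same entity feature value for a given primary key within the window; that interpretation, the Record type, and the detect function are assumptions for illustration rather than the procedure claimed by the application.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Record(NamedTuple):
    identity: str      # primary key of the queried data
    node: str          # DCN node the record came from
    feature: int       # feature value of the data entity
    start_time: float  # time at which the record entered the stream


def detect(records: Iterable[Record], window: Tuple[float, float],
           expected_nodes: Set[str]) -> Dict[str, bool]:
    """Map each primary key seen inside the window to True (consistent) or False."""
    lower, upper = window
    seen: Dict[str, Dict[str, int]] = defaultdict(dict)  # identity -> {node: feature}
    for r in records:
        if lower <= r.start_time <= upper:
            seen[r.identity][r.node] = r.feature
    return {
        identity: set(by_node) == set(expected_nodes)
        and len(set(by_node.values())) == 1
        for identity, by_node in seen.items()
    }
```

A False entry is the kind of outcome that could trigger the notification step mentioned above; records outside the window are simply not judged in this pass.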
According to the data detection method provided by the application, data of distributed application server nodes is queried, and the queried data is formatted to obtain data model objects that conform to a predefined data model. That is, the method can process data from distributed application server nodes, i.e., data from different data sources, and format it uniformly, thereby supporting cross-domain and cross-database data detection. Further, the data model objects are distributed to corresponding channels for aggregation to obtain aggregated data sequence pairs, and the data model objects associated with the aggregated data sequence pairs are collected into a data stream; a dynamic data window range is set based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream; and the aggregated data sequence pairs in the data stream are processed based on the set dynamic data window range to generate a detection result for the queried data, the detection result including whether the queried data is consistent with expected data. The method thus supports data consistency detection across domains and across database types and improves detection efficiency.
In some embodiments of the present application, distributing the data model objects to corresponding channels for aggregation in step S202 to obtain the aggregated data sequence pairs may be implemented by the steps shown in fig. 3:
Step S2021: obtain the data sequence pair of each data model object.
The data sequence pair includes a data entity feature sequence and a data information feature sequence.
Here, the data entity feature sequence contains the features of the data entity obtained after the queried data is formatted; the data information feature sequence contains features of the acquisition time of the queried data and of the data node, data table, and so on obtained after formatting.
Step S2022: determine the queue corresponding to each feature value contained in the data entity feature sequence.
In the embodiment of the present application, identical feature values indicate identical data entities. Feature values and queues are in one-to-one correspondence; that is, data with the same feature value is always transmitted through the same queue, which improves the concurrency and speed of data transmission.
Step S2023: using the at least one determined queue, distribute the data model objects to the corresponding channels for aggregation to obtain the aggregated data sequence pairs.
In the embodiment of the application, the collected data set is serialized into the MapMode data model, and the data then undergoes time-sequence processing and feature sequence pair extraction, which preserves the time order of the data and ensures that the same data set flows to the same queue channel. A channel selector then polls the data queues to decide which queue channel each data model object flows into for aggregation.
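The routing in steps S2022 and S2023 can be sketched as follows. The application states that feature values and queues correspond one to one but does not say how a queue is chosen; this sketch assumes a fixed number of channels and picks one by taking the feature value modulo the channel count, which still guarantees that equal feature values land in the same queue. The names select_channel and distribute are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def select_channel(feature_value: int, num_channels: int) -> int:
    """Choose a channel index; equal feature values always get the same index."""
    return feature_value % num_channels


def distribute(objects: Iterable[Tuple[int, str]],
               num_channels: int) -> List[Deque[Tuple[int, str]]]:
    """Append each (feature_value, payload) to its channel, preserving arrival order."""
    channels: List[Deque[Tuple[int, str]]] = [deque() for _ in range(num_channels)]
    for feature_value, payload in objects:
        channels[select_channel(feature_value, num_channels)].append(
            (feature_value, payload))
    return channels
```

Because copies of the same entity from different nodes share a feature value, they meet in one queue in arrival order, where they can be aggregated into a pair without coordinating across queues.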
In some embodiments of the present application, the step S2021 of obtaining the data sequence pair of the data model object may be implemented by the following steps:
A11, obtaining the data length, the multiplication coefficient, the modulus coefficient and the encoding standard value of the character string mapping contained in the data model object.
In the embodiment of the application, two character strings are judged equal by comparing the substrings to which they are mapped rather than by comparing the strings directly, which improves comparison accuracy. Here, the mapped substring takes the form of a feature value, and the encoding standard value, the multiplication coefficient, the modulus coefficient and the data length of the character string mapping are used to calculate that feature value.
A12, determining the feature value of the character string based on the multiplication coefficient, the modulus coefficient, the data length and the encoding standard value of the character string mapping, and taking the feature values of all the character strings contained in the data model object as the data entity feature sequence of the data model object.
The data entity feature sequence h_i is used to judge differences between data entity features.
In some embodiments, determining the feature value of the character string in A12 based on the multiplication coefficient, the modulus coefficient, the data length and the encoding standard value of the character string mapping may be implemented by substituting these four quantities into a hash function that hash-maps the character string to its feature value. Hash-mapping a character string means using a string hash function to map different character strings to different numbers.
In one implementation scenario, the feature value of the character string can be determined by the hash function shown in formula (1), which yields the data entity feature sequence h_i. That is, the calculation of h_i in A11 to A12 can be realized by the following formula (1):
[Formula (1): equation image BDA0003898560260000091]
where m[j] is a character of the data entity dataEntity in MapMode; p^j is the multiplication coefficient, and p can be defined as a small value; mod is the modulus coefficient and can be defined as a large value; n is the current data length; i and j are positive integers; and idx(m[j]) is the encoding standard value of the mapped character, for example its ASCII code value. That is, each character of the data entity dataEntity, idx(m[j]), corresponds to its decimal value in the ASCII table. The ASCII table here may be any ASCII code reference table of the related art, which is not specifically limited in the present application.
In one implementable data consistency check scenario, referring to fig. 4, assume that the data of four distributed application server nodes A, B, C and D (that is, cross-domain, cross-database DCN node data) is normalized into MapMode objects.
In the following, A, B, C and D each denote the MapMode object obtained by processing the data of a different DCN node:
A:{"dataNode":"AA0","dataType":"order","dataEntity":"{'table':'order_info','record':{'user_name':'zhangsan','seal_type':'2','trans_status':'SUCCESS'}}","dataIdentity":"2102240QD022000A96UN7M0LI0CXUDC0","dataVernier":"1"}
B:{"dataNode":"AJ0","dataType":"order","dataEntity":"{'table':'order_info','record':{'user_name':'zhangsan','seal_type':'2','trans_status':'SUCCESS'}}","dataIdentity":"2102240QD022000A96UN7M0LI0CXUDC0","dataVernier":"1"}
C:{"dataNode":"AK0","dataType":"order","dataEntity":"{'table':'order_info','record':{'user_name':'zhangsan','seal_type':'2','trans_status':'FAIL'}}","dataIdentity":"2102240QD022000A96UN7M0LI0CXUDC0","dataVernier":"1"}
D:{"dataNode":"AI0","dataType":"order","dataEntity":"{'table':'order_info','record':{'user_name':'zhangsan','seal_type':'2','trans_status':'SUCCESS'}}","dataIdentity":"2102240QD022000A96UN7M0LI0CXUDC0","dataVernier":"1"}
The data entities of the MapMode objects obtained by normalizing A, B and D are consistent across their nodes, whereas in the MapMode object obtained by normalizing C, whose dataNode is AK0, a field has changed (trans_status is FAIL rather than SUCCESS). The data entities of A, B, C and D are then converted into uniform substrings, namely data entity feature values.
Each character of the data entity dataEntity, idx(m[j]), corresponds to its decimal value in the ASCII table. Assume p = 3, mod = 81001 and h[0] = 0; the value calculated for each character of the string is accumulated, and the modulus is then taken.
Taking a string in A as an example, "{'SUCCESS'}" is calculated as: a = ((0 × 3^1 + 123) + (123 × 3^2 + 39) + (1146 × 3^3 + 83) + (31025 × 3^4 + 85) + … + (h[j-1] × 3^j + idx(m[j]))) % 81001.
Taking a string in C as an example, "{'FAIL'}" is calculated as: c = ((0 × 3^1 + 123) + (123 × 3^2 + 39) + (1146 × 3^3 + 70) + (31012 × 3^4 + 65) + … + (h[j-1] × 3^j + idx(m[j]))) % 81001.
Applying the hash operation h_i() to the MapMode objects of A, B, C and D yields the index sequence [a, b, c, d], where a = b = d; that is, the data entities of A, B and D correspond to the same feature value.
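The accumulation pattern of the worked example can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: formula (1) is available only as an image, so the recurrence here (each term is h[j-1] × p^j + idx(m[j]) with h[0] = 0, and the modulus is taken once after summing all terms) is read off the worked example. The names `char_terms` and `feature_value` are illustrative, and `ord()` stands in for idx() on the assumption that the input is ASCII.

```python
def char_terms(text: str, p: int = 3) -> list[int]:
    """Per-character terms h[j] = h[j-1] * p**j + idx(m[j]), with h[0] = 0 (assumed form)."""
    terms = []
    prev = 0  # h[0]
    for j, ch in enumerate(text, start=1):
        prev = prev * p ** j + ord(ch)  # ord() used as idx(); assumes ASCII input
        terms.append(prev)
    return terms


def feature_value(text: str, p: int = 3, mod: int = 81001) -> int:
    """Accumulate every term, then take the modulus once at the end."""
    return sum(char_terms(text, p)) % mod


entity_a = ("{'table':'order_info','record':{'user_name':'zhangsan',"
            "'seal_type':'2','trans_status':'SUCCESS'}}")
entity_b = entity_a                             # node AJ0 holds the same entity
entity_c = entity_a.replace("SUCCESS", "FAIL")  # node AK0 differs in trans_status

print(char_terms("{'SUCCESS'}")[:3])  # [123, 1146, 31025]
print(char_terms("{'FAIL'}")[:3])     # [123, 1146, 31012]
print(feature_value(entity_a) == feature_value(entity_b))  # True
```

Identical entities always produce the same feature value, which is what allows A, B and D to be grouped; different entities usually produce different values, although a modular hash cannot rule out collisions.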
A13, acquiring the source database region, the data source table and the data query time of the data model object, and taking them as the data information feature sequence of the data model object.
Here, d_c is the source database DCN region, t_a is the data source table (Table), and w_t is the data query time, also called the data acquisition time WatchTime (wt).
The data information feature sequence h_k is obtained by calculating h(d_c + t_a + w_t), which can be realized as follows: d_c + t_a + w_t is treated as a character string, and h_k is calculated from it by formula (1) above.
A14, forming a data sequence pair of the data model object based on the data entity feature sequence and the data information feature sequence.
Here, the data sequence pair w_h is obtained from h_i and h_k: w_h = <h_i, h_k>.
Still taking as an example the MapMode objects A, B, C and D obtained from the data of different DCN nodes, the set of data sequence pairs w_h calculated for A, B, C and D is: [<a,a′>,<b,b′>,<c,c′>,<d,d′>].
Here a = b = d; assume this value is 1001 and that c is 1002.
The final data sequence pairs w_h are as follows:
<a{1001},a′{v:<d_c:AA0,t_a:order,w_t:20220401080100>}>,
<b{1001},b′{v:<d_c:AJ0,t_a:order,w_t:20220401080200>}>,
<c{1002},c′{v:<d_c:AK0,t_a:order,w_t:20220401080300>}>,
<d{1001},d′{v:<d_c:AI0,t_a:order,w_t:20220401080400>}>
Further, as shown in fig. 4, the client uses a Selector to choose a queue according to the data entity feature sequence h_i: because the feature values satisfy a = b = d, A, B and D select the same queue Q1, and C selects queue Q2. The data model objects are then distributed, in queue form, to the corresponding channels and sent to the data aggregation center (AggCenter).
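The queue selection described above can be illustrated with a short Python sketch. It assumes the example feature values 1001 and 1002 and represents each data sequence pair as a plain tuple; the function name `route_by_feature` and the queue naming scheme (Q1, Q2, … in order of first appearance) are illustrative choices, not details given in the application.

```python
from collections import defaultdict

# (feature value h_i, source node d_c, table t_a, query time w_t), from the example
pairs = [
    (1001, "AA0", "order", 20220401080100),
    (1001, "AJ0", "order", 20220401080200),
    (1002, "AK0", "order", 20220401080300),
    (1001, "AI0", "order", 20220401080400),
]


def route_by_feature(pairs):
    """Assign one queue per distinct feature value, in order of first appearance."""
    queue_of = {}               # feature value -> queue name (one-to-one)
    queues = defaultdict(list)  # queue name -> data sequence pairs
    for pair in pairs:
        h_i = pair[0]
        if h_i not in queue_of:
            queue_of[h_i] = f"Q{len(queue_of) + 1}"
        queues[queue_of[h_i]].append(pair)
    return dict(queues)


queues = route_by_feature(pairs)
print({name: [p[1] for p in items] for name, items in queues.items()})
# {'Q1': ['AA0', 'AJ0', 'AI0'], 'Q2': ['AK0']}
```

Keeping the mapping one-to-one means every record with a given feature value arrives on the same channel, so the aggregator never has to merge results across queues.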
In some embodiments of the present application, in step S2023, the data model object is shunted to the corresponding channel for aggregation by the determined at least one queue, so as to obtain an aggregated data sequence pair, which may be implemented by the following steps:
B21, creating a new temporary measurement block for each identical data entity feature sequence, and marking the temporary measurement block with the aggregation time sequence and a common block color.
In the embodiment of the application, the AggCenter aggregates the reported MapMode data. Through an aggregator, the AggCenter processes the feature-value dimension of data arriving from the same queue and creates a new temporary measurement block for each identical data entity feature sequence. Each new block is dyed with a single block color, and the dyed block (AggBlock) is an independent granular unit. As shown in fig. 4, AggBlocks of different colors in the aggregation center are represented by different patterns, and fig. 4 shows four example patterns. State operations are executed uniformly per block, which improves the processing efficiency of the source data. Each block is also marked with a tag and stamped with the aggregation time AggTime (a_t), which is used to determine the dual time dimension range.
At this point, the AggBlock aggregated data sequence pair a_h of A, B and D is a_abd:
[<a{1001},a′{v:{<d_c:AA0,t_a:order,w_t:20220401080100>,<tag:red,a_t:20220401080400>}}>,
<b{1001},b′{v:{<d_c:AJ0,t_a:order,w_t:20220401080200>,<tag:red,a_t:20220401080400>}}>,
<d{1001},d′{v:{<d_c:AI0,t_a:order,w_t:20220401080400>,<tag:red,a_t:20220401080400>}}>]
The aggregated data sequence pair a_h of C is a_c:
[<c{1002},c′{v:{<d_c:AK0,t_a:order,w_t:20220401080300>,<tag:blue,a_t:20220401080500>}}>]
After aggregation the data is dyed into AggBlocks, and internal data state changes are handled per block, which improves data processing efficiency. Meanwhile, the data is collected into a data stream and continuously fed into it, forming an unbounded data set.
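The block creation and dyeing step can be sketched as below. This is an illustrative Python sketch only: the `AggBlock` dataclass, its field names and the `aggregate` function are assumptions made for the example, and the tag colors and aggregation times are the ones listed in the example sequences.

```python
from dataclasses import dataclass, field


@dataclass
class AggBlock:
    feature: int   # shared data entity feature value
    tag: str       # block color
    agg_time: int  # aggregation time a_t stamped on the block
    members: list = field(default_factory=list)


def aggregate(queue, tag, agg_time):
    """Create one temporary block for a queue whose items share a feature value."""
    features = {h_i for h_i, *_ in queue}
    if len(features) != 1:
        raise ValueError("a queue must carry exactly one feature value")
    block = AggBlock(features.pop(), tag, agg_time)
    for _, d_c, t_a, w_t in queue:
        block.members.append(
            {"d_c": d_c, "t_a": t_a, "w_t": w_t, "tag": tag, "a_t": agg_time}
        )
    return block


q1 = [(1001, "AA0", "order", 20220401080100),
      (1001, "AJ0", "order", 20220401080200),
      (1001, "AI0", "order", 20220401080400)]
q2 = [(1002, "AK0", "order", 20220401080300)]

a_abd = aggregate(q1, "red", 20220401080400)
a_c = aggregate(q2, "blue", 20220401080500)
print(len(a_abd.members), a_c.members[0]["a_t"])  # 3 20220401080500
```

Each member carries both its own query time w_t and the block's aggregation time a_t, which are the two time dimensions used later to bound the dynamic window.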
B22, extracting the query time sequence, the source database region and the data source table from the data entity feature sequence of the data model object.
B23, obtaining an aggregated data sequence pair based on the feature value, the source database region, the data source table, the query time sequence, the block color and the aggregation time sequence.
In the embodiment of the application, the AggTime values are connected in series to form a time axis. The acquisition time sequence is extracted from the data sequence pairs w_h of A, B, C and D:
w_t = [t1, t2, t3, t4]
and the aggregation time sequence is extracted from a_h:
a_t = [t1′, t2′, t3′, t4′]
where
w_t = [20220401080100, 20220401080200, 20220401080300, 20220401080400];
a_t = [20220401080400, 20220401080400, 20220401080500, 20220401080400].
In some embodiments of the present application, the start time is the data query time of each data model object and the end time is the aggregation time of each data model object. Step S203, setting a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream, may be implemented by the steps shown in fig. 5:
Step S2031, obtaining a preset predetermined window and delay window.
The delay window is the maximum delay time allowed for data in the data stream.
In the embodiments of the present application, the predetermined window WindowTime (w_it) and the delay window DelayTime (d_et) can be set flexibly according to actual requirements. Here w_it guarantees the data of the reference range, and d_et guarantees a window for data that arrives late.
Illustratively, WindowTime (w_it) is set to 1 minute, and DelayTime (d_et), the maximum delay time allowed by the window, is also set to 1 minute.
Step S2032, taking the time range covered from the minimum time node to the maximum time node among the data query times and aggregation times as a dynamic adjustment window.
In the embodiment of the present application, the dynamic window time DynamicTime (d_yt) is the dynamic window range and is bounded by the dual time dimension [w_t, a_t]. The data query time, that is, the data acquisition time w_t, records the start time at which each item of data enters the data stream, and the aggregation time a_t is the end time of each item of data in the data stream. Determining d_yt from two time dimensions avoids the inaccurate data range that a single time dimension can produce, so the dual time dimension bound is more accurate.
With the minimum time node denoted d_min and the maximum time node denoted d_max, d_yt = len[d_min, d_max].
Here, based on the previously determined acquisition time sequence w_t = [t1, t2, t3, t4] and aggregation time sequence a_t = [t1′, t2′, t3′, t4′], and as shown in fig. 6, the span from the minimum time node to the maximum time node on the intersecting time axes can be regarded as the dynamic window range, that is, the time range bounded by the dual time dimension. The acquisition time sequence w_t and the aggregation time sequence a_t are combined into the dynamic window time sequence group d_t = [t1, t2, t3, t4, t1′, t2′, t3′, t4′], and taking the union of the sequences gives d_yt = len[d_min, d_max].
Step S2033, setting a dynamic data window range based on the predetermined window, the delay window, and the dynamic adjustment window.
In the embodiment of the present application, the dynamic data window range WindowScope (w_s) is determined based on the predetermined window, the delay window and the dynamic adjustment window.
Illustratively, w_s can be determined by the following formula (2):
w_s = w_it + d_yt + d_et    formula (2)
In the embodiment of the application, setting w_s dynamically from multiple dimensions improves the accuracy with which the range of streaming data is framed.
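Formula (2) can be checked against the example timestamps with a few lines of Python. This is a sketch under assumptions: the timestamps are taken to be in `YYYYMMDDHHMMSS` form, the predetermined window and delay window both default to the illustrative 1 minute mentioned above, and the helper names `parse` and `window_scope` are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M%S"  # assumed layout of the example timestamps


def parse(ts: int) -> datetime:
    return datetime.strptime(str(ts), FMT)


def window_scope(w_t, a_t, w_it=timedelta(minutes=1), d_et=timedelta(minutes=1)):
    """Return (d_yt, w_s), where d_yt spans min..max over both time sequences."""
    d_t = [parse(t) for t in list(w_t) + list(a_t)]
    d_yt = max(d_t) - min(d_t)      # dynamic adjustment window
    return d_yt, w_it + d_yt + d_et  # formula (2)


w_t = [20220401080100, 20220401080200, 20220401080300, 20220401080400]
a_t = [20220401080400, 20220401080400, 20220401080500, 20220401080400]
d_yt, w_s = window_scope(w_t, a_t)
print(d_yt, w_s)  # 0:04:00 0:06:00
```

Here the built-in `min` and `max` are used for brevity; for repeated range queries over a long stream, a precomputed structure such as a sparse table avoids rescanning the sequence.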
In some embodiments of the present application, before the time range covered by the minimum time node to the maximum time node in the data query time and the aggregation time is used as the dynamic adjustment window in step S2032, the minimum time node to the maximum time node may be determined through the following steps:
Retrieving the minimum time node and the maximum time node among the data query times and aggregation times based on a sparse table.
In the embodiment of the application, once the acquisition time sequence w_t = [t1, t2, t3, t4] and the aggregation time sequence a_t = [t1′, t2′, t3′, t4′] have been obtained, w_t and a_t are combined into the dynamic window time sequence group d_t = [t1, t2, t3, t4, t1′, t2′, t3′, t4′]. While the union of the sequences is taken, the minimum time node d_min and the maximum time node d_max among the data query times and aggregation times can be retrieved based on the sparse table. Here, the sparse table doubling method is used to find the extreme values in the dynamic window time sequence, from which the maximum length of the dual time dimension range is determined.
In one implementable extreme-value scenario, an array t is built in the preprocessing stage. For t[i][j], i denotes the left end point and j denotes a length of 2^j; that is, t[i][j] is the maximum of the 2^j consecutive numbers starting at d_t[i]. Because the number of elements is 2^j, the range can be split in the middle into two parts of 2^(j-1) elements each. Thus t[i, j] represents the maximum value within the interval [i, i + 2^j − 1].
Here, the dynamic window time sequence d_t determined above is used for illustration: d_t = [20220401080100, 20220401080200, 20220401080300, 20220401080400, 20220401080400, 20220401080400, 20220401080500, 20220401080400]
Here, t[1][0] represents the maximum value of the range of length 2^0 = 1 starting at the 1st number, which is the first value 20220401080100.
t[1][1] represents the maximum value of the range of length 2^1 = 2 starting at the 1st number: t[1, 1] = max(20220401080100, 20220401080200) = 20220401080200.
t[1][2] represents the maximum value of the range of length 2^2 = 4 starting at the 1st number: t[1, 2] = max(20220401080100, 20220401080200, 20220401080300, 20220401080400) = 20220401080400.
……
t[i][0] represents the maximum value of 1 consecutive point from i, namely the maximum value in [i, i];
t[i][1] represents the maximum value of 2 consecutive points from i, namely the maximum value in [i, i+1];
t[i][2] represents the maximum value of 4 consecutive points from i, namely the maximum value in [i, i+1, i+2, i+3];
t[i][3] represents the maximum value of 8 consecutive points from i, namely the maximum value in [i, i+1, i+2, i+3, …, i+7].
The state transition equation is expressed as the following formula (3):
t[i, j] = max(t[i, j−1], t[i + 2^(j−1), j−1])    formula (3)
After the preprocessing above, each t[i][j] holds the extreme value of an interval of length 2^j. Referring to fig. 7, assume the interval to be queried is [l, r]. Two sub-intervals of equal length are found whose union contains the whole query interval. For the two sub-intervals to cover the whole interval, the length of each must be at least half the length of the query interval. In addition, because the sub-interval length is a power of 2, a single sub-interval may not cover the query interval, but two of them, which may overlap, do.
The interval maximum is queried as follows: the query interval is [l, r] and its length is r − l + 1. With 2^k ≤ r − l + 1 < 2^(k+1), k can be taken as log2(r − l + 1) rounded down, and the maximum time node d_max is then determined by formula (4):
d_max = max(t[l, k], t[r − 2^k + 1, k])    formula (4)
For example, for d_t = [20220401080100, 20220401080200, 20220401080300, 20220401080400, 20220401080400, 20220401080400, 20220401080500, 20220401080400], taking the maximum value over the interval [1, 8] gives k = log2(8 − 1 + 1) = 3, so max(t[1, 3], t[8 − 2^3 + 1, 3]) yields the maximum value 20220401080500.
Similarly, replacing max() with min() in formula (4) gives the minimum value d_min = 20220401080100, so that d_yt = len[20220401080100, 20220401080500]. The dynamic adjustment window d_yt of the A, B, C, D time sequence is therefore finally calculated to have a length of 4 minutes.
Thus, the size of the dynamic data window range w_s is 1 + 4 + 1 = 6 minutes.
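The sparse table preprocessing of formula (3) and the range query of formula (4) can be sketched in Python as follows. The sketch uses 0-based indices, whereas the text counts from 1, and stores the table as `table[j][i]` rather than t[i][j]; the function names `build_sparse` and `query` are illustrative. Passing `min` instead of `max` gives the minimum time node.

```python
def build_sparse(values, op=max):
    """table[j][i] holds op over values[i : i + 2**j] (0-based indices)."""
    table = [list(values)]
    j = 1
    while (1 << j) <= len(values):
        prev, half = table[j - 1], 1 << (j - 1)
        # formula (3): combine two adjacent blocks of length 2**(j-1)
        table.append([op(prev[i], prev[i + half])
                      for i in range(len(values) - (1 << j) + 1)])
        j += 1
    return table


def query(table, l, r, op=max):
    """op over values[l..r] inclusive, using two overlapping blocks of length 2**k."""
    k = (r - l + 1).bit_length() - 1  # floor(log2(r - l + 1))
    return op(table[k][l], table[k][r - (1 << k) + 1])  # formula (4)


d_t = [20220401080100, 20220401080200, 20220401080300, 20220401080400,
       20220401080400, 20220401080400, 20220401080500, 20220401080400]

max_table = build_sparse(d_t, max)
min_table = build_sparse(d_t, min)
d_max = query(max_table, 0, len(d_t) - 1, max)
d_min = query(min_table, 0, len(d_t) - 1, min)
print(d_min, d_max)  # 20220401080100 20220401080500
```

Building the table costs O(n log n) time and each query is O(1), which works for max and min because overlapping the two blocks does not change the result.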
In some embodiments of the present application, the expected data is a configuration threshold of the distributed application server node, and step S204 processes the aggregated data sequence pair in the data stream based on the set dynamic data window range to generate a detection result for the queried data, which may be implemented by the following steps:
First, the aggregated data sequence pairs within the dynamic data window range are processed to screen out each color block and determine the data threshold within each color block.
Second, the feature quantity value of each color block is determined from the aggregated data sequence pairs within the dynamic data window range.
The feature quantity value indicates the number of data items that share the same feature value.
Finally, a detection result is generated based on the data threshold value in each color block, the characteristic quantity value of each color block and the configuration threshold value.
Fig. 8 shows an implementable streaming window setting scenario in which a data task is processed through a trigger (WindowTrigger): the aggregated data sequence pairs a_h within the dynamic data window range w_s are processed, each dyed block is screened out, and the data threshold within each block is calculated. Because the block data of a_abd and a_c has already been identified as sharing common features, it is only necessary to count the feature quantity value of each block in the aggregated data sequence pairs a_h and, according to that value, dynamically query the associated number of DCN nodes. It is then judged whether the expected value is met, and depending on the outcome an event mode (EventMode) is triggered to match and report the data detection result.
In one implementable data detection scenario, A, B and D above are in the a_abd sequence, where the data threshold within the block is 3, the corresponding table type t_a is order, and the DCN node list is [AA0, AJ0, AI0]. C is in the a_c sequence, where the data threshold within the block is 1, the corresponding table type t_a is order, and the DCN node list is [AK0].
Further, given that the obtained DCN node configuration threshold is 4, the data threshold 1 of C in the a_c sequence is smaller than the data threshold 3 of A, B and D in the a_abd sequence, so it is judged that a data difference has occurred at node C. Therefore, within the A, B, C, D data set, the data of A, B and D at DCN nodes AA0, AJ0 and AI0 can be considered to share the same features with no data difference, while the data at node AK0 of C has changed, which triggers event mode processing.
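One way to express this comparison in code is sketched below in Python. The decision rule is an assumption inferred from the example rather than a rule stated in the application: the data is treated as consistent only when the largest color block contains as many records as the configured node threshold, and nodes in any smaller block are reported as deviating. The function name `detect` and the result keys are hypothetical.

```python
from collections import defaultdict


def detect(members, configured_nodes):
    """Group window members by block color and compare block sizes with the threshold."""
    blocks = defaultdict(list)
    for m in members:
        blocks[m["tag"]].append(m["d_c"])
    counts = {tag: len(nodes) for tag, nodes in blocks.items()}
    largest = max(counts.values())
    # Assumed rule: consistent only if one block covers every configured node.
    consistent = largest == configured_nodes
    deviating = [node for tag, nodes in blocks.items()
                 if counts[tag] < largest for node in nodes]
    return {"consistent": consistent, "counts": counts, "deviating_nodes": deviating}


members = [
    {"d_c": "AA0", "t_a": "order", "tag": "red"},
    {"d_c": "AJ0", "t_a": "order", "tag": "red"},
    {"d_c": "AK0", "t_a": "order", "tag": "blue"},
    {"d_c": "AI0", "t_a": "order", "tag": "red"},
]

result = detect(members, configured_nodes=4)
print(result)
# {'consistent': False, 'counts': {'red': 3, 'blue': 1}, 'deviating_nodes': ['AK0']}
```

A caller could raise the event-mode notification whenever `consistent` is false, passing along `deviating_nodes` to identify where the data differs.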
The embodiment of the application detects the state of data sets through a real-time data sentinel and processes data sets from multiple data sources simultaneously. Data aggregation guarantees the timeliness of streaming computation. The dynamic window calculation range is determined by the dual time dimension to improve data accuracy. If a data sequence pair calculated in the streaming window is inconsistent with expectations, an event trigger mechanism is notified, so that abnormal online data is found, the affected scope of the service is reduced and the fault tolerance of the system is improved. With streaming computation, large-scale flowing data is analyzed in real time while it is continuously changing, potentially useful information is aggregated, and the result can also be sent to the next computing node.
Continuing with the exemplary structure of the data detection device 154 provided by the embodiments of the present application implemented as a software module, in some embodiments, as shown in fig. 1, the software module stored in the data detection device 154 of the memory 150 may be a data detection device in the terminal 100, including:
the preprocessing module 1541 is configured to query data of the distributed application server node, and format the queried data to obtain a data model object that conforms to a predefined data model;
the aggregation module 1542 is configured to distribute the data model objects to corresponding channels for aggregation, to obtain an aggregated data sequence pair, and collect each data model object associated with the aggregated data sequence pair into a data stream;
a window setting module 1543, configured to set a dynamic data window range based on a start time of each data model object entering the data stream and a tail time of each data model object in the data stream;
a detection result matching module 1544, configured to process the aggregated data sequence pair in the data stream based on the set dynamic data window range, and generate a detection result for the queried data; wherein, the detection result comprises whether the inquired data is consistent with the expected data.
In some embodiments of the present application, the preprocessing module 1541 is configured to obtain a data sequence pair of a data model object; wherein, the data sequence pair comprises a data entity characteristic sequence and a data information characteristic sequence;
a preprocessing module 1541, configured to determine a queue corresponding to a feature value included in a data entity feature sequence;
and an aggregation module 1542, configured to distribute the data model object to the corresponding channel for aggregation by using the determined at least one queue, so as to obtain an aggregated data sequence pair.
In some embodiments of the present application, the preprocessing module 1541 is configured to obtain a data length, a multiplication coefficient, a modulus coefficient, and a coding standard value mapped by a character string included in the data model object; determining the characteristic value of the character string based on the product coefficient, the modulus coefficient, the data length and the coding standard value mapped by the character string, and taking the characteristic values of all the character strings contained in the data model object as the data entity characteristic sequence of the data model object; acquiring a source database region, a data source data table and data query time of the data model object, and taking the source database region, the data source data table and the data query time as a data information characteristic sequence of the data model object; and forming a data sequence pair of the data model object based on the data entity characteristic sequence and the data information characteristic sequence.
In some embodiments of the present application, the same feature value corresponds to a queue, and the aggregating module 1542 is configured to create a new temporary metric block for the same data entity feature sequence, and mark the aggregation time sequence and the same block color for the temporary metric block;
the aggregation module 1542 is configured to extract a query time sequence, a source database region, and a data source data table from the data entity feature sequence of the data model object;
the aggregation module 1542 is configured to obtain an aggregated data sequence pair based on the feature value, the source database region, the data source data table, the query time sequence, the block color, and the aggregation time sequence.
In some embodiments of the present application, the start time is a data query time of each data model object, the end time is an aggregation time of each data model object, and the detection result matching module 1544 is configured to obtain a preset window and a delay window; taking the time range covered from the minimum time node to the maximum time node in the data query time and the aggregation time as a dynamic adjustment window; a dynamic data window range is set based on the predetermined window, the delay window, and the dynamic adjustment window.
In some embodiments of the present application, the detection result matching module 1544 is configured to retrieve a minimum time node and a maximum time node from the data query time and the aggregation time based on the sparse table.
In some embodiments of the present application, the expected data is a configuration threshold of the distributed application server node, and based on the set dynamic data window range, the detection result matching module 1544 is configured to perform an operation on the aggregated data sequence pair in the dynamic data window range to screen out each color block and determine a data threshold in each color block; determining a characteristic quantity value of each color block in an aggregated data sequence within a dynamic data window; wherein the eigenvalue values represent the number of data of the same eigenvalue; and generating a detection result based on the data threshold value in each color block, the characteristic quantity value of each color block and the configuration threshold value.
The data detection device provided by the application queries data of distributed application server nodes and formats the queried data to obtain a data model object which accords with a predefined data model; that is to say, the data detection method provided by the application can process data of distributed application server nodes, namely data of different data sources, format the data of the different data sources, and support cross-domain and cross-library data detection; further, the data model objects are distributed to corresponding channels for aggregation to obtain aggregated data sequence pairs, and the data model objects associated with the aggregated data sequence pairs are collected into a data stream; setting a dynamic data window range based on the starting time of each data model object entering the data stream and the tail end time of each data model object in the data stream; processing the aggregated data sequence pair in the data stream based on the set dynamic data window range to generate a detection result aiming at the inquired data; wherein, the detection result comprises whether the inquired data is consistent with the expected data; the data detection method provided by the application achieves the purposes of supporting cross-domain data consistency detection and improving detection efficiency under the condition of cross-domain and cross-database types.
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as shown in fig. 2.
The computer-readable storage medium provided by the application queries data of distributed application server nodes and formats the queried data to obtain a data model object conforming to a predefined data model; that is to say, the data detection method provided by the application can process data of distributed application server nodes, namely data of different data sources, format the data of the different data sources, and support cross-domain and cross-library data detection; further, the data model objects are distributed to corresponding channels for aggregation to obtain aggregated data sequence pairs, and the data model objects associated with the aggregated data sequence pairs are collected into a data stream; setting a dynamic data window range based on the starting time of each data model object entering the data stream and the tail end time of each data model object in the data stream; processing the aggregated data sequence pair in the data stream based on the set dynamic data window range to generate a detection result aiming at the inquired data; the detection result comprises whether the inquired data is consistent with the expected data or not; the data detection method provided by the application achieves the purposes of supporting cross-domain data consistency detection and improving detection efficiency under the condition of cross-domain and cross-database types.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read Only Memory (CD-ROM), among other memories; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present application falls within the protection scope of the present application.

Claims (10)

1. A data detection method, the method comprising:
querying data of distributed application server nodes, and formatting the queried data to obtain data model objects that conform to a predefined data model;
distributing the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and collecting the data model objects associated with the aggregated data sequence pairs into a data stream;
setting a dynamic data window range based on a start time at which each data model object enters the data stream and an end time of each data model object in the data stream;
processing the aggregated data sequence pairs in the data stream based on the set dynamic data window range to generate a detection result for the queried data; wherein the detection result comprises whether the queried data is consistent with expected data.
2. The method according to claim 1, wherein distributing the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs comprises:
obtaining a data sequence pair of the data model object; wherein the data sequence pair comprises a data entity feature sequence and a data information feature sequence;
determining a queue corresponding to each feature value contained in the data entity feature sequence;
and distributing the data model objects, by means of the determined at least one queue, to corresponding channels for aggregation to obtain the aggregated data sequence pairs.
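The queue selection in claim 2 can be illustrated with a minimal sketch, assuming one in-memory queue per distinct characteristic (feature) value. The function name `route_to_queues`, the tuple layout, and the sample region and table names are hypothetical and are not taken from the application.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Assumed layout of a data information feature sequence:
# (source database region, source data table, data query time).
InfoSeq = Tuple[str, str, int]
# Assumed layout of a data sequence pair:
# (data entity feature sequence, data information feature sequence).
SequencePair = Tuple[List[int], InfoSeq]


def route_to_queues(pairs: Iterable[SequencePair]) -> Dict[int, Deque[InfoSeq]]:
    """Place each object's information sequence on the queue of every
    feature value found in its entity feature sequence."""
    queues: Dict[int, Deque[InfoSeq]] = defaultdict(deque)
    for entity_seq, info_seq in pairs:
        for feature_value in entity_seq:
            # Identical feature values always land on the same queue.
            queues[feature_value].append(info_seq)
    return queues


# Hypothetical usage: two sources report the same feature value 11.
pairs = [
    ([11, 22], ("cn-south", "orders", 100)),
    ([11], ("cn-north", "orders", 105)),
]
queues = route_to_queues(pairs)
```

Keying the queues by feature value means entries that describe the same data entity meet in one place regardless of which database they came from, which is what makes the later aggregation step possible.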
3. The method of claim 2, wherein obtaining the data sequence pair of the data model object comprises:
obtaining, for each character string contained in the data model object, the data length, the multiplication coefficient, the modulus coefficient, and the encoding standard value to which the character string maps;
determining the feature value of the character string based on the multiplication coefficient, the modulus coefficient, the data length, and the encoding standard value to which the character string maps, and taking the feature values of all the character strings contained in the data model object as the data entity feature sequence of the data model object;
obtaining a source database region, a data source data table, and a data query time of the data model object, and taking the source database region, the data source data table, and the data query time as the data information feature sequence of the data model object;
and forming the data sequence pair of the data model object based on the data entity feature sequence and the data information feature sequence.
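Claim 3 names the inputs of the characteristic (feature) value computation but not the formula. One plausible reading is a polynomial string hash; the sketch below assumes that reading. The constants `MULTIPLIER` and `MODULUS`, the use of Unicode code points as the encoding standard value, and both function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

MULTIPLIER = 31           # assumed multiplication coefficient
MODULUS = 1_000_000_007   # assumed modulus coefficient


def string_feature_value(text: str,
                         multiplier: int = MULTIPLIER,
                         modulus: int = MODULUS) -> int:
    """Polynomial hash over the code points of `text`.

    The loop runs once per character, so the data length enters the
    result through the number of multiply-and-add steps.
    """
    value = 0
    for ch in text:
        # ord(ch) stands in for the encoding standard value of the character.
        value = (value * multiplier + ord(ch)) % modulus
    return value


def build_sequence_pair(strings: Sequence[str], region: str, table: str,
                        query_time: int) -> Tuple[List[int], Tuple[str, str, int]]:
    """Return (data entity feature sequence, data information feature sequence)."""
    entity_seq = [string_feature_value(s) for s in strings]
    info_seq = (region, table, query_time)
    return entity_seq, info_seq


# Hypothetical usage with made-up field values.
pair = build_sequence_pair(["ab", "ab"], "cn-south", "orders", 100)
```

Equal strings always produce equal feature values, so records that should match across databases can be compared by integer rather than by full string content. Unequal strings can collide, so a hash of this kind is a filter rather than a proof of equality.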
4. The method according to claim 2, wherein each distinct feature value corresponds to one queue, and distributing the data model objects, by means of the determined at least one queue, to corresponding channels for aggregation to obtain the aggregated data sequence pairs comprises:
creating a new temporary measurement block for data model objects having the same data entity feature sequence, and marking the temporary measurement block with an aggregation time sequence and the same block color;
extracting a query time sequence, a source database region, and a data source data table from the data entity feature sequence of the data model object;
and obtaining the aggregated data sequence pairs based on the feature value, the source database region, the data source data table, the query time sequence, the block color, and the aggregation time sequence.
5. The method of any one of claims 1 to 4, wherein the start time is the data query time of each data model object, the end time is the aggregation time of each data model object, and setting the dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream comprises:
obtaining a preset window and a delay window;
taking the time range from the minimum time node to the maximum time node among the data query times and the aggregation times as a dynamic adjustment window;
setting the dynamic data window range based on the preset window, the delay window, and the dynamic adjustment window.
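Claim 5 lists three inputs to the dynamic data window range but does not state how they are combined. The sketch below assumes one possible rule: the window starts at the earliest observed time, spans at least the preset window, and is extended by the delay window to tolerate late arrivals. The function name `dynamic_window` and this combination rule are assumptions, not the method of the application.

```python
from typing import Iterable, Tuple


def dynamic_window(query_times: Iterable[int],
                   aggregation_times: Iterable[int],
                   preset: int,
                   delay: int) -> Tuple[int, int]:
    """Return (start, end) of an assumed dynamic data window range."""
    times = list(query_times) + list(aggregation_times)
    if not times:
        raise ValueError("at least one time node is required")
    # Dynamic adjustment window: minimum to maximum observed time node.
    start, latest = min(times), max(times)
    # Assumed rule: never shrink below the preset window, then add the delay.
    span = max(latest - start, preset)
    return start, start + span + delay


# Hypothetical usage with integer timestamps.
wide = dynamic_window([10, 12], [15, 30], preset=5, delay=2)
narrow = dynamic_window([10], [11], preset=5, delay=2)
```

Under this assumed rule a burst of widely spaced records stretches the window, while a sparse stream falls back to the preset width.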
6. The method of claim 5, wherein before taking the time range from the minimum time node to the maximum time node among the data query times and the aggregation times as the dynamic adjustment window, the method further comprises:
retrieving the minimum time node and the maximum time node among the data query times and the aggregation times based on a sparse table.
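A sparse table answers range minimum and range maximum queries over a fixed array in constant time after O(n log n) preprocessing. The sketch below is a generic textbook version, assuming integer timestamps held in a list; the class name `SparseTable` and the sample values are illustrative, and the application does not specify how its table is laid out.

```python
from typing import Callable, List, Sequence


class SparseTable:
    """Range query structure for idempotent operations such as min and max."""

    def __init__(self, values: Sequence[int], op: Callable[[int, int], int]) -> None:
        if not values:
            raise ValueError("values must be non-empty")
        self._op = op
        # Level k stores op over every block of length 2**k.
        self._levels: List[List[int]] = [list(values)]
        n = len(values)
        span = 1
        while 2 * span <= n:
            prev = self._levels[-1]
            self._levels.append(
                [op(prev[i], prev[i + span]) for i in range(n - 2 * span + 1)]
            )
            span *= 2

    def query(self, lo: int, hi: int) -> int:
        """Return op over values[lo..hi], both ends inclusive."""
        if not 0 <= lo <= hi < len(self._levels[0]):
            raise IndexError("invalid range")
        k = (hi - lo + 1).bit_length() - 1
        row = self._levels[k]
        # Two overlapping blocks of length 2**k cover the whole range.
        return self._op(row[lo], row[hi - (1 << k) + 1])


# Hypothetical usage: merged query and aggregation timestamps.
times = [7, 3, 9, 1, 5]
min_table = SparseTable(times, min)
max_table = SparseTable(times, max)
```

The overlap trick in `query` only works because taking the min or max of a value twice does not change the result. The table is static, so it suits a batch of time nodes that is fixed when the window is computed.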
7. The method according to any one of claims 1 to 4, wherein the expected data is a configuration threshold of the distributed application server node, and processing the aggregated data sequence pairs in the data stream based on the set dynamic data window range to generate a detection result for the queried data comprises:
performing computation on the aggregated data sequence pairs within the dynamic data window range to screen out each color block and determine a data threshold value within each color block;
determining a feature quantity value of each color block in the aggregated data sequence within the dynamic data window range; wherein the feature quantity value characterizes the number of data items having the same feature value;
and generating the detection result based on the data threshold value in each color block, the feature quantity value of each color block, and the configuration threshold.
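The counting step of claim 7 can be sketched as follows, assuming each color block is identified by its characteristic (feature) value and that a block is consistent when the number of entries inside the window equals a configured expected count. The function name `detect`, the equality rule, and the entry layout are assumptions. The per-block data threshold value mentioned in the claim is not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def detect(entries: Iterable[Tuple[int, int]],
           window: Tuple[int, int],
           expected_count: int) -> Dict[int, bool]:
    """Map each feature value seen inside the window to a consistency flag.

    `entries` holds (feature_value, timestamp) tuples; `window` is an
    inclusive (start, end) range; `expected_count` is the assumed
    configuration threshold.
    """
    start, end = window
    # Feature quantity value: how many entries share a feature value.
    counts = Counter(fv for fv, ts in entries if start <= ts <= end)
    return {fv: n == expected_count for fv, n in counts.items()}


# Hypothetical usage: feature value 1 appears twice in the window,
# feature value 2 once, and the last entry falls outside the window.
result = detect([(1, 10), (1, 12), (2, 11), (1, 40)], (10, 20), expected_count=2)
```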
8. A data detection apparatus, characterized in that the apparatus comprises:
a preprocessing module, configured to query data of distributed application server nodes and format the queried data to obtain data model objects that conform to a predefined data model;
an aggregation module, configured to distribute the data model objects to corresponding channels for aggregation to obtain aggregated data sequence pairs, and to collect the data model objects associated with the aggregated data sequence pairs into a data stream;
a window setting module, configured to set a dynamic data window range based on the start time at which each data model object enters the data stream and the end time of each data model object in the data stream;
a detection result matching module, configured to process the aggregated data sequence pairs in the data stream based on the set dynamic data window range and generate a detection result for the queried data; wherein the detection result comprises whether the queried data is consistent with expected data.
9. A data detection apparatus, comprising:
a memory for storing executable instructions; a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in the memory.
10. A computer-readable storage medium having stored thereon executable instructions which, when executed, cause a processor to implement the method of any one of claims 1 to 7.
CN202211281984.6A 2022-10-19 2022-10-19 Data detection method, device, equipment and storage medium Pending CN115544092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211281984.6A CN115544092A (en) 2022-10-19 2022-10-19 Data detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115544092A true CN115544092A (en) 2022-12-30

Family

ID=84736266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211281984.6A Pending CN115544092A (en) 2022-10-19 2022-10-19 Data detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115544092A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination