CN114428711A - Data detection method, device, equipment and storage medium - Google Patents

Publication number
CN114428711A
CN114428711A CN202210087939.0A CN202210087939A CN114428711A CN 114428711 A CN114428711 A CN 114428711A CN 202210087939 A CN202210087939 A CN 202210087939A CN 114428711 A CN114428711 A CN 114428711A
Authority
CN
China
Prior art keywords
data
information
target
determining
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210087939.0A
Other languages
Chinese (zh)
Inventor
宁智贤
刘桐仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210087939.0A
Publication of CN114428711A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3051: Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3476: Data logging
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a data detection method, apparatus, device, and storage medium, and relates to the field of computer technology, in particular to the field of data processing. The technical scheme includes: acquiring record information of target data in a target system, where the record information includes first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system; determining a data state of the target data based on the record information; and determining effective state information of the target data in the target system based on the data state. The technical scheme reduces intrusion into the target system, enables detection of single pieces of data, and improves detection granularity.

Description

Data detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data detection method, apparatus, device, and storage medium.
Background
Content validation systems such as search engines and recommendation engines place high demands on the reliability and timeliness of data validation, so it is important to build an observation system for the data stream (also called the data validation stream) of a content validation system. Existing data stream observation systems are highly intrusive to the content validation system, costly to integrate, and coarse in observation granularity, so they struggle to meet practical needs.
Disclosure of Invention
The disclosure provides a data detection method, a device, equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a data detection method, including:
acquiring the recording information of target data in a target system; the record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system;
determining a data state of the target data based on the recording information;
and determining effective state information of the target data in the target system based on the data state.
According to a second aspect of the present disclosure, there is provided a data detection apparatus comprising:
the first information acquisition module is used for acquiring the recording information of the target data in the target system; the record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system;
the data state determining module is used for determining the data state of the target data based on the recording information;
and the first effective state determining module is used for determining effective state information of the target data in the target system based on the data state.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the data detection method provided by any embodiment of the disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute the data detection method provided by any of the embodiments of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the data detection method provided by any of the embodiments of the present disclosure.
The technical scheme of the present disclosure can at least realize the following beneficial effects:
The data state of the target data is determined based on the record information of the target data in the target system. The whole process can run asynchronously, which reduces both the impact on the performance of the target system and the intrusion into it. At the same time, single pieces of data can be detected, which improves detection granularity.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of the structural framework of a content validation system and of the principle by which a data detection device acquires information from the content validation system, according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a data detection method provided by an embodiment of the present disclosure;
FIG. 3 is a partial flow diagram of another data detection method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a data detection method provided by an embodiment of the present disclosure;
FIG. 5 is a partial flow chart of another data detection method provided by the embodiments of the present disclosure;
FIG. 6 is a schematic diagram illustrating an inversion of a topological relationship between processing nodes in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a calculation of a data aging indicator in an embodiment of the disclosure;
FIG. 8 is a schematic structural framework diagram of a data detection apparatus provided in an embodiment of the present disclosure;
FIG. 9 is a block diagram of a data detection system according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural framework diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
In the description of the embodiments of the present disclosure, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicit indication of the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present disclosure, "a plurality" means two or more unless otherwise specified.
It should be further understood that the term "and/or" as used in connection with embodiments of the present disclosure includes all or any and all combinations of one or more of the associated listed items.
It will be understood by those of ordinary skill in the art that, unless otherwise defined, all terms (including technical and scientific terms) used in the embodiments of the present disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
First, the structural framework of the content validation system (also called a data validation system or validation system) involved in the present disclosure is described. A content validation system generally includes a plurality of processing modules for processing data input into the system. Adjacent processing modules are connected by a storage structure capable of data transmission, such as a message queue (more specifically, an Apache Kafka message queue), or by other means. The structural framework is shown in the dashed box in FIG. 1, where module a, module b, module c, module d, module e, and module f are all processing modules, and the connecting arrows between the processing modules represent message queues. Each processing module may include a plurality of processing instances for processing data input into the processing module. Adjacent processing instances are usually connected by a storage structure capable of data transmission, and the type of that storage structure may be the same as or different from that of the storage structure between processing modules.
The inventors of the present disclosure found in their research that, with the rapid growth of content business demand, the timeliness and reliability requirements for data validation in content validation systems keep increasing. On the one hand, most processing modules of a content validation system communicate asynchronously through message queues, so the reliability of content validation needs to be observed in real time. On the other hand, most content validation systems allow users to write customized offline processing operators and their topological relations, and the customized topology also needs to be analyzed in real time. Under these conditions, conventional observation schemes have difficulty observing the data stream adequately. Specifically, the conventional observation schemes mainly include the following:
Scheme one: the data stream is traced by recording the span (call interval) information of a call chain, and observed with a component that supports the OpenTracing protocol (an open distributed tracing specification), such as SkyWalking (a distributed tracing system). Scheme two: the message queues between modules are consumed a second time, and the meta information of the data is written into a log storage system, such as Elasticsearch, for analysis and query. Scheme three: metric information such as queries per second (QPS), latency, and message queue backlog between processing modules is collected, and the health of the data stream is observed indirectly.
The above observation schemes have the following drawbacks:
Scheme one supports content validation systems poorly: its observation system is highly intrusive to the content validation system and costly to integrate. Scheme two cannot observe how a processing module handles the data and therefore cannot locate a bottleneck module. Observation is possible only once the data has been output to the downstream message queue; if the data is not output to the downstream message queue because of some abnormality, the abnormality is found and the bottleneck module located only when a user happens to run a query. In scenarios with very large data traffic, most data stream tracing uses append-only writes rather than in-place updates in order to save log storage resources, which makes observation harder still. Scheme three observes the data stream only at module-level granularity and cannot be refined to a single piece of data. In addition, the observation metrics it uses, such as queries per second, latency, and message queue backlog, cannot accurately detect abnormal data streams in some special but important scenarios. For example, in a data validation scenario with small data traffic but strict timeliness requirements, the backlog of the message queues between processing modules does not accurately reflect the health (reliability, timeliness, and so on) of the data stream.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in detail with specific embodiments.
According to an embodiment of the present disclosure, the present disclosure provides a data detection method, which may be applied to a data detection apparatus, as shown in fig. 2, the method including:
S201, acquiring the record information of the target data in the target system.
The record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system.
S202, the data state of the target data is determined based on the recording information.
S203, determining effective state information of the target data in the target system based on the data state.
According to the data detection method provided by the disclosure, the data state of the target data can be determined based on the record information of the target data in the target system. The whole process can run asynchronously, which reduces both the impact on the performance of the target system and the intrusion into it. At the same time, single pieces of data can be detected, which improves detection granularity.
The target system may be the content validation system shown in FIG. 1. The starting node may be the first processing module (for example, module a in FIG. 1) or the first processing instance in the content validation system, and the terminating node may be the last processing module (for example, module f in FIG. 1) or the last processing instance. The first record information corresponding to the starting node may be record information in the storage structure between the first and second processing modules or between the first and second processing instances. The second record information corresponding to the terminating node may be record information in the storage structure between the penultimate and last processing modules or between the penultimate and last processing instances. The storage structure may be a message queue, such as an Apache Kafka message queue, but is not limited thereto. Referring to the example of FIG. 1, the data detection apparatus may acquire the first record information corresponding to the starting node and/or the second record information corresponding to the terminating node in the content validation system.
Referring to the example of FIG. 1, for the same data stream in the target system, when data in the data stream is stored at the location corresponding to the starting node, the data may be taken as target data and its first record information obtained; when that target data is transmitted to the location corresponding to the terminating node, its second record information may be obtained.
The record information may include at least one kind of meta information, such as identification information of the target data (for example, an ID), time information (for example, the time at which the target data was processed in the target system), and tag information (also called label information). The meta information occupies 2 to 4 orders of magnitude less space than the complete data packet, which greatly reduces resource usage; in an actual test, the meta information occupied one thousandth of the space of the complete data packet.
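As a minimal sketch of what such record information could look like, the following Python dataclass holds only the meta information named above. The class name, field names, and the "start"/"end" node labels are illustrative assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecordInfo:
    """Meta information about one piece of target data at one node (hypothetical layout)."""
    data_id: str                # identification information
    node: str                   # "start" or "end" (assumed labels for the two nodes)
    processed_at: float         # time information, in seconds
    tags: Tuple[str, ...] = ()  # tag information


first = RecordInfo("doc-1", "start", 100.0, ("news",))
second = RecordInfo("doc-1", "end", 101.5)

# Both records carry the same ID, so they can be paired without the payload itself.
elapsed = second.processed_at - first.processed_at
```

Because only the ID, a timestamp, and a few tags are kept, the two records can be matched and timed without touching the complete data packet.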
In the embodiments of the present disclosure, the target data may be module-level data or a single piece of data, so that detection can be performed at different granularities.
In an alternative embodiment, as shown in FIG. 3, in step S202, determining the data state of the target data based on the record information includes the following steps S301 to S303:
S301, allocating a target arithmetic unit to the target data based on the identification information of the target data.
Optionally, as shown in FIG. 4, the allocation of the target arithmetic unit to the target data may be implemented through a distributed storage structure, for example a distributed message queue, or through another storage structure.
In one example, the record information corresponding to each data stream in the content validation system is stored in one partition of a distributed message queue, and each partition transmits data in first in, first out (FIFO) order. One target arithmetic unit may be allocated to one partition or to a specified number of partitions, and the record information in each partition is then distributed to the corresponding target arithmetic unit, so that the target arithmetic unit processes the record information in that partition or those partitions. One target arithmetic unit may be one processing thread.
In one example, the target arithmetic unit may automatically acquire the record information in its corresponding partition or partitions of the distributed storage structure through an automatic subscription mechanism, thereby performing data consumption, status query, and other processing on the specified partitions.
In this way, the record information of the same data stream is processed by the same target arithmetic unit, which avoids processing errors that could result from several data streams being processed across units and improves processing accuracy. In addition, the distributed storage structure is highly scalable; allocating target arithmetic units according to partitions and transmitting and processing data accordingly improves the scalability of the target arithmetic units.
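The allocation described above can be sketched as follows. The disclosure does not say how a partition is chosen from the identification information, so the CRC32 hash and the round-robin partition-to-unit mapping below are assumptions made for illustration, and all function names are hypothetical.

```python
import zlib
from typing import Dict


def partition_for(data_id: str, num_partitions: int) -> int:
    """Pick a partition from the ID with a stable hash (an assumed policy)."""
    return zlib.crc32(data_id.encode("utf-8")) % num_partitions


def assign_units(num_partitions: int, num_units: int) -> Dict[int, int]:
    """Give every partition exactly one operation unit, round-robin (assumed policy)."""
    return {p: p % num_units for p in range(num_partitions)}


def unit_for(data_id: str, num_partitions: int, assignment: Dict[int, int]) -> int:
    """Resolve the operation unit that handles all records carrying this ID."""
    return assignment[partition_for(data_id, num_partitions)]


# Six partitions shared by three operation units: each unit serves two partitions.
assignment = assign_units(num_partitions=6, num_units=3)
```

Since the hash is deterministic, every record with the same ID reaches the same partition and therefore the same unit, which is the property the text relies on to avoid cross-processing.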
S302, storing the record information in a first storage structure of the target arithmetic unit based on the identification information.
Storing the record information based on the identification information makes subsequent queries more convenient, since the corresponding record information can be looked up by the identification information.
Optionally, the target arithmetic unit may include a first storage structure, for example a map structure (a set of key-value pairs). When the record information is distributed to the target arithmetic unit through the distributed message queue, for target data in the same data stream, the identification information of the target data (which may be included in the record information) may be used as the key, and the record information stored in the same partition may be used as the value, forming a key-value pair stored in the map structure. Storing key-value pairs speeds up subsequent queries and supports high-concurrency usage scenarios.
S303, determining, by the target arithmetic unit, the data state of the target data based on the record information stored in the first storage structure.
The record information reflects the state of the data at the starting node and the terminating node of the same data stream, and thus reflects the state of the data stream as a whole.
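The first storage structure of steps S302 and S303 can be sketched as a plain dictionary keyed by the identification information. The `RecordStore` class, its methods, and the "start"/"end" labels are hypothetical names chosen for this sketch; the disclosure only specifies a map of key-value pairs.

```python
from typing import Dict


class RecordStore:
    """First storage structure: ID -> record information seen so far (a sketch)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Dict[str, float]] = {}

    def put(self, data_id: str, node: str, processed_at: float) -> None:
        # node is "start" or "end" (assumed labels); the value is the processing time.
        self._by_id.setdefault(data_id, {})[node] = processed_at

    def get(self, data_id: str) -> Dict[str, float]:
        # Return a copy so callers cannot mutate the stored records.
        return dict(self._by_id.get(data_id, {}))


store = RecordStore()
store.put("doc-1", "start", 100.0)
store.put("doc-1", "end", 101.5)
store.put("doc-2", "start", 100.2)
```

Here `store.get("doc-1")` holds both records, while `store.get("doc-2")` holds only the starting-node record, which is exactly the distinction the later state determination needs.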
In another alternative embodiment, as shown in FIG. 5, determining the data state of the target data based on the record information includes the following steps S501 to S507:
S501, allocating a target operation unit to the target data based on the identification information of the target data.
The specific implementation of step S501 may refer to the related content of step S301, and is not described herein again.
S502, storing the record information in a first storage structure of the target operation unit based on the identification information.
The detailed implementation of step S502 can refer to the related content of step S302, and is not described herein again.
S503, acquiring the attribute information of the target data in the target system.
The attribute information may include at least one kind of meta information of the target data, such as identification information (for example, an ID), time information, and tag information.
S504, storing the attribute information in a second storage structure in the target operation unit based on the time information in the attribute information.
Optionally, the target operation unit may further include a second storage structure. The second storage structure may be a message queue, for example an Apache Kafka message queue, in which each element may store the attribute information of one piece of target data; it may also be another storage structure.
In one example, the attribute information is stored in a distributed storage structure (for example, a distributed message queue having a plurality of partitions) and distributed from it to the second storage structure. Specifically, for each piece of target data in the same data stream, the storage timing of its attribute information may be determined from the time information in that attribute information, and the attribute information stored in the same partition may then be distributed to the second storage structure according to the storage timing.
Steps S503 and S504 may be executed in parallel with steps S501 and S502 (see FIG. 5), before or after them, or in another order according to actual requirements; the embodiments of the present disclosure do not limit this.
S505, reading, by the target operation unit, the attribute information from the second storage structure based on the reading timing of the second storage structure.
The reading timing of the second storage structure is consistent with the storage timing: attribute information stored in the second storage structure first is read first.
S506, the target arithmetic unit reads the corresponding record information from the first storage structure based on the read attribute information.
When the attribute information and the record information both include the identification information, the record information of the same target data can be read quickly from the first storage structure based on the attribute information read from the second storage structure.
The first storage structure may also store pointers representing the reference relationship between the identification information and the elements in the second storage structure, so that the attribute information stored in those elements can be referenced when needed.
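Steps S504 to S506 can be sketched with an in-memory FIFO queue standing in for the second storage structure and a dictionary standing in for the first. The use of `collections.deque`, the dictionary layout, and the function names are assumptions for illustration only; the disclosure mentions a message queue such as Apache Kafka.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Attr = Dict[str, object]


def build_queue(attributes: List[Attr]) -> Deque[Attr]:
    """Second storage structure: attribute information ordered by its time field."""
    return deque(sorted(attributes, key=lambda attr: attr["time"]))


def drain(queue: Deque[Attr],
          records: Dict[str, Dict[str, float]]) -> List[Tuple[str, Dict[str, float]]]:
    """Read attributes first in, first out and look up the matching records by ID."""
    matched = []
    while queue:
        attr = queue.popleft()
        data_id = str(attr["id"])
        matched.append((data_id, records.get(data_id, {})))
    return matched


# First storage structure (ID -> record information) with hypothetical contents.
records = {"a": {"start": 1.0, "end": 2.0}, "b": {"start": 1.5}}
queue = build_queue([{"id": "b", "time": 1.5}, {"id": "a", "time": 1.0}])
result = drain(queue, records)
```

The oldest attribute entry is handled first, so a unit examines data in the order it entered the system and can stop early once it reaches entries that are too recent to judge.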
S507, based on the recording information, determines the data state of the target data.
In an alternative embodiment, the determining the data state of the target data based on the recorded information includes at least one of the following four ways:
in a first mode, under the condition that the first recording information and the second recording information of the target data are determined to be acquired based on the recording information, the data state is determined to include that the data stream corresponding to the target data is a normal data stream.
In this way, both the first recording information corresponding to the start node and the second recording information corresponding to the end node are acquired, which are complete recording information from head to tail, so that it is determined that the corresponding data stream is completed normally.
In the second mode, under the condition that the second recording information of the target data is determined to be collected and the first recording information is not collected based on the recording information, the data stream corresponding to the target data in the data state is determined to be a normal data stream.
In this way, since the data corresponding to the terminating node is derived from the data corresponding to the originating node for the same data stream, under the condition that only the second recording information corresponding to the terminating node is acquired, the first recording information corresponding to the originating node can be considered to exist in the target system, and then the corresponding data stream can be considered to be completed normally.
And determining that the data stream corresponding to the target data is an abnormal data stream in the data state under the condition that the first record information is determined to be acquired, the second record information is not acquired, and the time length from the processing time of the target data corresponding to the first record information to the current time is greater than the preset time length on the basis of the record information. The processing time of the target data corresponding to the first record information may be a time when the target data is input into the content validation system.
In this manner, when only the first record information corresponding to the start node is acquired, the completion condition of the corresponding data stream cannot be directly determined due to the absence of the second record information corresponding to the end node, and therefore, the completion condition of the data stream is further determined based on the timeout condition of the acquired first record information, and when the timeout condition is reached, the data stream is considered to be not normally completed.
In the fourth mode, when it is determined based on the record information that the first record information has been collected, the second record information has not been collected, and the duration from the processing time of the target data corresponding to the first record information to the current time is less than or equal to the preset duration, the data state is determined to include that the data stream corresponding to the target data is a normal data stream. For the specific meaning of the processing time of the target data corresponding to the first record information, refer to the third mode; it is not described again here.
In this manner, when only the first record information corresponding to the start node is acquired, the completion of the corresponding data stream cannot be directly determined due to the absence of the second record information corresponding to the end node, and therefore the completion of the data stream is further determined based on the timeout condition of the acquired first record information.
In the first to fourth modes, whether the first record information and the second record information have been collected, and whether the duration from the processing time corresponding to the collected record information to the current time exceeds the preset duration, can be determined from the specific content of the record information. Combining the collection status with the timeout status of the record information allows the data state of the target data to be determined quickly and accurately.
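The four modes can be summarized as a small decision function. The following is a minimal sketch only: the names `StreamState` and `classify_stream`, and the representation of times as plain numbers in seconds, are assumptions made for illustration and do not come from the disclosure.

```python
from enum import Enum
from typing import Optional


class StreamState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def classify_stream(has_first: bool, has_second: bool,
                    first_time: Optional[float],
                    now: float, preset_duration: float) -> StreamState:
    """Classify a data stream from which record information was collected.

    has_first / has_second: whether the record information of the starting /
    terminating node was collected. first_time is the processing time tied to
    the first record information (hypothetical representation: seconds).
    """
    if has_second:
        # Modes 1 and 2: the terminating-node record exists, so the stream
        # is treated as completed normally whether or not the first exists.
        return StreamState.NORMAL
    if has_first:
        if first_time is None:
            raise ValueError("first_time is required when only the first record exists")
        elapsed = now - first_time
        # Mode 3: timed out -> abnormal. Mode 4: still within the preset duration -> normal.
        return StreamState.ABNORMAL if elapsed > preset_duration else StreamState.NORMAL
    raise ValueError("no record information was collected for this data")
```

Under these assumptions, a stream whose first record is 4000 seconds old with a preset duration of 3600 seconds and no second record would be classified as abnormal.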
In an optional implementation manner, the data detection method provided by the present disclosure may further include: based on the record information, it is determined whether to store the consumption progress in the distributed storage structure.
Subsequent consumption requirements can be determined based on the record information, and the consumption progress can be stored or not stored accordingly so as to meet those requirements. After the consumption progress is stored, the record information and/or attribute information before that progress in each fragment of the distributed storage structure is not consumed again, which improves overall consumption efficiency and fault tolerance.
Optionally, determining whether to store the consumption progress in the distributed storage structure based on the record information includes: under the conditions of the first mode and the third mode, the consumption progress of the corresponding fragments in the distributed storage structure is stored; in the second mode, when the duration from the processing time of the target data corresponding to the second recording information to the current time is greater than the preset duration, the consumption progress of the corresponding fragments in the distributed storage structure is stored, and when the duration from the processing time of the target data corresponding to the second recording information to the current time is less than or equal to the preset duration, the consumption progress of the corresponding fragments in the distributed storage structure is not stored; under the condition of the fourth mode, the consumption progress of the corresponding fragment in the distributed storage structure is not stored. Wherein, the processing time of the target data corresponding to the second record information may be the time when the target data is output to the target system.
In the first mode, both the first record information corresponding to the starting node and the second record information corresponding to the terminating node have been collected. The record information is complete from beginning to end and the data state is determined, so the consumed data (record information and/or attribute information) does not need to be consumed again. The consumption progress is therefore stored, so that the consumed data before that progress in the fragment is not consumed repeatedly.
In the second mode, the situation in which only the second record information corresponding to the terminating node is collected is usually caused by out-of-order data, so the subsequent consumption requirement must be further determined according to whether the processing time of the target data corresponding to the second record information has timed out. In the timeout case, because for the same data stream the data corresponding to the terminating node of the target system is derived from the data corresponding to the starting node, collecting the second record information implies that the corresponding first record information exists in the target system. The timed-out state is therefore treated as a determined state, the consumed data does not need to be consumed again, and the consumption progress is stored so that the consumed data before that progress in the fragment is not consumed repeatedly. In the non-timeout case, the missing first record information may still be stored in the fragment with a delay; if the consumption progress were stored before the first record information is stored in the fragment, the data state of the target data might be erroneously identified as an abnormal state, producing a misjudgment, so the consumption progress is not stored.
In the third mode, only the first record information corresponding to the starting node has been collected and the processing time of the target data corresponding to the first record information has timed out; that is, the second record information corresponding to the terminating node is still missing when the timeout is reached. To limit the resources occupied by record information, the missing second record information is no longer waited for after the timeout, and the consumed data does not need to be consumed again. The consumption progress is therefore stored, so that the consumed data before that progress in the fragment is not consumed repeatedly.
In the fourth mode, only the first record information corresponding to the starting node has been collected; the second record information corresponding to the terminating node is missing, and the processing time of the target data corresponding to the first record information has not timed out. The missing second record information may still be stored in the fragment with a delay. If the consumption progress were stored before the second record information is stored in the fragment, the data state of the target data might be erroneously identified as an abnormal state, producing a misjudgment; the consumption progress is therefore not stored before the timeout, so as to avoid the misjudgment. In addition, if the data detection device crashes and restarts, the data in the fragment needs to be consumed again; because the consumption progress has not been stored, that data can be consumed again, which avoids consumption anomalies after a crash and restart.
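The storage decision described for the four modes can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the function name `should_store_progress` and the use of numeric timestamps in seconds are assumptions.

```python
from typing import Optional


def should_store_progress(has_first: bool, has_second: bool,
                          first_time: Optional[float],
                          second_time: Optional[float],
                          now: float, preset_duration: float) -> bool:
    """Decide whether to store the consumption progress of a fragment.

    first_time / second_time are the processing times tied to the first and
    second record information (hypothetical representation: seconds).
    """
    if has_first and has_second:
        # Mode 1: complete record information, state is determined.
        return True
    if has_second:
        # Mode 2: store only once the second record has timed out; before
        # that, the first record may still arrive late (out-of-order data).
        return (now - second_time) > preset_duration
    if has_first:
        # Mode 3 (timed out): store. Mode 4 (not timed out): do not store,
        # because the second record may still arrive.
        return (now - first_time) > preset_duration
    raise ValueError("no record information was collected for this data")
```

Keeping the decision in one pure function makes it easy to call periodically, since the result for modes 2 and 4 changes only as `now` advances.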
Optionally, in the first to fourth modes, the operations of determining whether the first record information and the second record information have been collected, determining whether the processing time of the target data corresponding to the collected record information has timed out, and determining, according to these results, the data state and whether to store the consumption progress in the distributed storage structure may be performed multiple times. In one example, these operations may be performed periodically.
Optionally, in a case of storing the consumption progress of the distributed storage structure, the data stored in the first storage structure and the second storage structure in the target operation unit may be cleared, so as to release the resources of the first storage structure and the second storage structure.
Optionally, in the case of storing the consumption progress in the distributed storage structure, delay information of the corresponding data stream may also be stored and displayed, where the delay information of the data stream includes a difference between a processing time of the target data corresponding to the terminating node and a processing time of the target data corresponding to the starting node in the data stream.
In an optional implementation manner, in step S203, determining effective state information of the target data in the target system based on the data state includes:
determining that the effective state information comprises that the target data is in an abnormal effective state under the condition that the data state comprises that the data stream corresponding to the target data is an abnormal data stream; and under the condition that the data state includes that the data stream corresponding to the target data is a normal data stream, determining that the effective state information includes that the target data is in a normal effective state.
Based on the data state, effective state information which accurately reflects the effective state of the target data in the target system can be obtained.
In another optional implementation manner, referring to fig. 4, in step S203, determining effective state information of the target data in the target system based on the data state includes determining a bottleneck node based on an exception log, which is specifically as follows:
acquiring an exception log of each processing node (or processing operator) in the target system under the condition that the data state includes that the data stream corresponding to the target data is an exception data stream (the storage form of the exception log is shown in fig. 4); determining a bottleneck node in each processing node according to the abnormal log; determining effective state information based on the bottleneck node; the effective state information includes identification information of a bottleneck node, and the bottleneck node is a processing node which causes the data flow to be an abnormal data flow.
And under the condition that the data state comprises that the data stream corresponding to the target data is an abnormal data stream, further determining a bottleneck node causing the abnormality according to the abnormal log so as to take follow-up measures.
A processing node may be a processing module or a processing instance within a processing module. In one example, a bottleneck module can be determined in each processing module based on the exception log, and a bottleneck instance can be determined in each instance of the bottleneck module based on the exception log.
The exception log records exception information of each processing node. According to the topological relation among the processing nodes, the last processing node in which an exception occurs generally has a large influence on the data stream, so the last processing node recorded in the exception log can be taken as the bottleneck node.
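One possible reading of this rule is to pick the most downstream processing node that appears in the exception log. The sketch below follows that reading, which is an assumption: the function name `find_bottleneck`, the predecessor-map input format, and the interpretation of "last" as "latest in topological order" are all illustrative. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, Optional, Set


def find_bottleneck(predecessors: Dict[str, Set[str]],
                    exception_nodes: Set[str]) -> Optional[str]:
    """Return the most downstream node that appears in the exception log.

    predecessors maps each node to the set of nodes directly upstream of it.
    exception_nodes is the set of node identifiers found in the exception log.
    """
    order = list(TopologicalSorter(predecessors).static_order())
    # Walk from the most downstream node back toward the source.
    for node in reversed(order):
        if node in exception_nodes:
            return node
    return None  # no node in the topology reported an exception


# Example: a linear pipeline a -> b -> c -> d with exceptions logged at b and c.
pipeline = {"b": {"a"}, "c": {"b"}, "d": {"c"}}
bottleneck = find_bottleneck(pipeline, {"b", "c"})
```

For topologies with parallel branches the topological order is not unique, so a real system would need an additional tie-breaking rule.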
In an optional implementation manner, the data detection method provided by the present disclosure further includes:
acquiring data processing information of target data in a target system, and determining data aging indexes of processing nodes in the target system; determining a bottleneck node among the processing nodes according to the data processing information and the data aging indexes; determining effective state information based on the bottleneck node; the effective state information comprises identification information of the bottleneck node, and the bottleneck node is a processing node that causes the data flow to be an abnormal data flow.
The data processing information may include hardware identification information of a processing node and a processing time for processing the target data, and the hardware identification information of the processing node and the processing time for processing the target data may be added to the data processing information each time the target data is processed by a new processing node.
The specific meaning of the bottleneck node and the specific example of determining the bottleneck node can refer to the related contents, and the details are not described herein.
The data processing information can reflect the characteristics of the target system for processing the data, the data aging index can reflect the characteristics of the delay of the data of the processing node, the mode of determining the bottleneck node according to the data processing information and the data aging index has strong pertinence to the personalized processing characteristics of the target system, and meanwhile, the bottleneck node can be quickly positioned according to the delay characteristics of the data.
Optionally, determining the data aging index of each processing node in the target system includes:
determining the topological relation among the processing nodes according to the data processing information; determining an upstream storage structure of each processing node according to the topological relation; for each processing node, acquiring time information of first target data and last target data in an upstream storage structure of the processing node; and determining the data aging index of the processing node according to the time information of the first target data and the last target data. The upstream storage structure is used for storing the data sent to the processing node by the last processing node.
In some embodiments, determining a topological relationship between processing nodes from data processing information includes: analyzing and clustering hardware identification information in the data processing information through a topology analysis module to obtain data topology information, and determining the topology relation among the processing nodes according to the data topology information. The data topology information may be presented in the form of a Directed Acyclic Graph (DAG).
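As a rough illustration of how a topological relation could be derived from data processing information, the sketch below treats each piece of target data as carrying a list of (hardware identifier, processing time) pairs and links consecutive hops into directed edges. The function name `build_topology` and this input format are assumptions; the disclosure's topology analysis module and its clustering step are not specified in detail here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Hop = Tuple[str, float]  # (hardware identification, processing time)


def build_topology(traces: Iterable[List[Hop]]) -> Dict[str, Set[str]]:
    """Derive directed edges (node -> downstream nodes) from per-data traces."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        # Order the hops of one piece of data by processing time.
        ordered = sorted(trace, key=lambda hop: hop[1])
        for (src, _), (dst, _) in zip(ordered, ordered[1:]):
            if src != dst:
                edges[src].add(dst)
    return {node: set(downstream) for node, downstream in edges.items()}


# Two pieces of data: one passes a -> b -> c, the other goes a -> c directly.
topology = build_topology([
    [("a", 1.0), ("b", 2.0), ("c", 3.0)],
    [("c", 2.0), ("a", 1.0)],
])
```

The resulting adjacency map can be rendered as a directed graph; checking that it is acyclic would be a separate step.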
In the process of determining the topological relation between the processing nodes according to the data topology information, the actually determined topological relation may change on the basis of the set or user-defined original topological relation, in one example, a topology inversion as shown in fig. 6 may occur, and the direction of the arrow is entirely inverted.
The topology analysis module may also determine and output tag information (tag) based on the hardware identification information and determine query rate per second and delay information between processing nodes based on processing time in the data processing information.
Target data can be input and output to an upstream storage structure according to a set time sequence, the first target data in the upstream storage structure can be the latest data input into the upstream storage structure, and the last target data can be the oldest data input into the upstream storage structure, namely the data about to be output out of the upstream storage structure. The time information of the target data may be a time when the target data is input into the target system (i.e., select time, hereinafter referred to as system time).
Fig. 7 shows the calculation principle of the data aging indicator of the processing module (specifically, module b), and referring to fig. 7, the absolute value of the difference (Δ system time shown in fig. 7) between the system time corresponding to the first target data and the system time corresponding to the last target data in the upstream storage structure of module b is determined as the data aging indicator of module b.
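The calculation described for module b amounts to an absolute difference between two system times. A minimal sketch follows, assuming the upstream storage structure is represented as a list of system times ordered from the oldest entry to the newest; the function name `data_aging_index` and the choice to return 0.0 for an empty structure are assumptions.

```python
from typing import Sequence


def data_aging_index(system_times: Sequence[float]) -> float:
    """Absolute difference between the system times of the first (newest)
    and last (oldest) target data in an upstream storage structure.

    system_times is assumed to be ordered from oldest to newest.
    """
    if not system_times:
        return 0.0  # assumption: an empty structure has no accumulated delay
    newest = system_times[-1]
    oldest = system_times[0]
    return abs(newest - oldest)


# Entries entered the target system at t=100 s, 130 s and 190 s.
index_b = data_aging_index([100.0, 130.0, 190.0])
```

A larger value indicates that data has been waiting longer in front of the node, independent of how many entries are queued.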
Optionally, the data aging index may be calculated based on a streaming computation framework (e.g., Flink). The streaming computation framework may compute high/low water level information for each processing node, where the high and low water levels correspond to the system time of the first target data and the system time of the last target data, respectively; the data aging index can then be calculated in the manner shown in fig. 7.
Based on the above, the embodiments of the present disclosure can analyze and determine the topological relation among the processing nodes in real time and provide good support for customized topological relations. Second, the data aging condition of each node can be determined quickly and accurately based on the data aging index and an aging threshold, which improves the efficiency and accuracy of locating bottleneck nodes that affect the operator topology (AOV). Meanwhile, a data aging index determined from time information reflects the time accumulation of the upstream storage structure; compared with the traditional quantity accumulation, time accumulation better reflects the timeliness of the data flow. In addition, existing approaches to locating bottleneck nodes mainly detect the most downstream node causing congestion through backpressure. Because the size of the storage structures between nodes and the traffic differ, such approaches cannot reflect the influence of a bottleneck node on data timeliness, and by the time backpressure appears the data flow is often already severely congested. Locating bottleneck nodes based on the data aging index can find the node that most affects the timeliness of the data flow before backpressure occurs, thereby avoiding these problems.
In another optional implementation, the data detection method provided in the present disclosure further includes:
acquiring data processing information of target data in a target system; determining the topological relation among processing nodes in the target system according to the data processing information; determining, according to the topological relation, whether the storage space of the upstream storage structure of each processing node is completely occupied; for each processing node, determining the processing node as a bottleneck node when the storage space of the upstream storage structure of the processing node is completely occupied; when the storage space of the upstream storage structure of the processing node is not completely occupied, whether the processing node is a bottleneck node may be determined based on the data aging index of the processing node; determining effective state information based on the bottleneck node; the effective state information comprises identification information of the bottleneck node, and the bottleneck node is a processing node that causes the data flow to be an abnormal data flow.
The data processing information of the target data in the target system is acquired, and the topological relation between the processing nodes in the target system is determined according to the data processing information, and the specific implementation manner of the method can refer to the related contents, which is not described herein again.
The specific implementation manner of determining whether the processing node is a bottleneck node based on the data aging indicator of the processing node may refer to the foregoing related contents, and details are not described here.
Based on the mode, the occupation condition of the storage space of the upstream storage structure of the processing node can be observed, the accumulation condition of the upstream storage structure can be observed, and the bottleneck node can be quickly positioned based on the accumulation condition.
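The two-stage check (storage occupancy first, data aging index as a fallback) can be sketched as below. The function name `is_bottleneck`, the slot-count representation of storage space, and the rule that an index above the threshold marks a bottleneck are assumptions for illustration.

```python
from typing import Optional


def is_bottleneck(used_slots: int, capacity: int,
                  aging_index: Optional[float] = None,
                  aging_threshold: Optional[float] = None) -> bool:
    """Decide whether a processing node is a bottleneck node.

    used_slots / capacity describe the upstream storage structure of the node.
    aging_index and aging_threshold are optional and only consulted when the
    storage space is not completely occupied.
    """
    if capacity > 0 and used_slots >= capacity:
        # Upstream storage is completely occupied: treat as a bottleneck.
        return True
    if aging_index is not None and aging_threshold is not None:
        # Fallback: compare the data aging index with the aging threshold.
        return aging_index > aging_threshold
    return False
```

For example, `is_bottleneck(10, 10)` flags a full upstream queue immediately, while `is_bottleneck(3, 10, aging_index=120.0, aging_threshold=60.0)` flags a node whose queue is short but whose data is stale.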
Optionally, after the bottleneck node is determined, capacity expansion or repair can be performed on the bottleneck node according to actual requirements, so as to overcome adverse effects of the bottleneck node on the data flow.
In an optional implementation manner, the data detection method provided by the present disclosure may further include: and after the abnormal data flow is determined, displaying the abnormal data flow.
In one example, the abnormal data flow can be sent to a client device of the business party, and the abnormal data flow is displayed through the client device of the business party, so that the business party can alarm and intervene on the abnormal data flow.
The inventors of the present disclosure performed a performance analysis of the data detection method provided by the present disclosure and found that it occupies few resources and has low space complexity. In one example, assume that the size of a single piece of data of a data stream is 1 kilobyte (including the space amplification caused by the map structure), the preset duration for the data stream to take effect is 1 hour, the average flow rate of the data stream is characterized by a query rate per second of 10,000, and all map structures are stored in memory; the data detection method provided by the present disclosure then occupies only about 30 gigabytes (GB) of resources in total. In one example, the write complexity for a single piece of data is between O(1) and O(log N), and the complexity of processing timeout records and storing the consumption progress (compact) for a single piece of data can reach O(log N) after optimization of the message queue. For the processing of timeout records and the storage of the consumption progress, refer to the related content above, for example, the operation of determining whether the duration from the processing time of the target data corresponding to the collected record information to the current time is greater than the preset duration, and the operation of determining, based on the record information, whether to store the consumption progress in the distributed storage structure.
The data detection method provided by the disclosure can be applied to an overall vertical search content validation system (hereinafter referred to as a vertical search content validation system); in that scenario, only about 100 gigabytes (GB) of storage space and part of the message queue space are required for buffering.
According to an embodiment of the present disclosure, there is also provided a data detection apparatus, as shown in fig. 8, the apparatus including: a first information obtaining module 801, a data state determining module 802, and a first effective state determining module 803.
A first information obtaining module 801, configured to obtain record information of target data in a target system; the record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system.
A data state determining module 802, configured to determine a data state of the target data based on the recording information.
A first effective state determining module 803, configured to determine effective state information of the target data in the target system based on the data state.
Optionally, the data state determining module 802 includes: the device comprises an allocation submodule, a first storage submodule and a data state determining submodule.
The distribution submodule is used for distributing a target operation unit for the target data based on the identification information of the target data; the first storage submodule is used for storing the record information into a first storage structure of the target operation unit based on the identification information; and a data state determination submodule for determining a data state of the target data based on the record information stored in the first storage structure by using the target arithmetic unit.
Optionally, the data state determining module 802 may further include: the information acquisition submodule and the second storage submodule.
The information acquisition submodule is used for acquiring the attribute information of the target data in the target system; and the second storage submodule is used for storing the attribute information into a second storage structure in the target operation unit based on the time information in the attribute information.
The data state determination submodule is specifically configured to: reading attribute information from the second storage structure based on the read timing of the second storage structure; reading corresponding record information from the first storage structure based on the read attribute information; based on the recording information, a data state of the target data is determined.
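The interplay of the allocation submodule and the two storage structures might look like the sketch below. Everything here is illustrative: the class name `OperationUnit`, the use of a dictionary for the first storage structure and a time-ordered heap for the second, and the CRC32-based allocation in `allocate_unit` are assumptions, not details given in the disclosure.

```python
import heapq
import zlib
from typing import Dict, List, Optional, Tuple


def allocate_unit(data_id: str, unit_count: int) -> int:
    """Map identification information to an operation unit index (assumed hashing)."""
    return zlib.crc32(data_id.encode("utf-8")) % unit_count


class OperationUnit:
    def __init__(self) -> None:
        # First storage structure: record information keyed by identification.
        self.records: Dict[str, Dict[str, float]] = {}
        # Second storage structure: attribute information ordered by time.
        self.attributes: List[Tuple[float, str]] = []

    def store_record(self, data_id: str, kind: str, processing_time: float) -> None:
        """kind is 'first' (starting node) or 'second' (terminating node)."""
        self.records.setdefault(data_id, {})[kind] = processing_time

    def store_attribute(self, data_id: str, time_info: float) -> None:
        heapq.heappush(self.attributes, (time_info, data_id))

    def read_next(self) -> Optional[Tuple[str, Dict[str, float]]]:
        """Read the oldest attribute entry, then look up its record information."""
        if not self.attributes:
            return None
        _, data_id = heapq.heappop(self.attributes)
        return data_id, self.records.get(data_id, {})


unit = OperationUnit()
unit.store_record("doc-1", "first", 10.0)
unit.store_attribute("doc-1", 10.0)
unit.store_record("doc-2", "first", 5.0)
unit.store_record("doc-2", "second", 7.0)
unit.store_attribute("doc-2", 5.0)
oldest = unit.read_next()
```

Reading in time order means the entries most likely to have timed out are examined first.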
In an optional implementation manner, the data state determining module 802 is specifically configured to: and under the condition that the first recording information and the second recording information of the target data are determined to be acquired based on the recording information, determining that the data state comprises that the data stream corresponding to the target data is a normal data stream.
In another optional implementation, the data state determining module 802 is specifically configured to: and under the condition that the second recording information of the target data is determined to be acquired and the first recording information is not acquired based on the recording information, determining that the data state comprises that the data stream corresponding to the target data is a normal data stream.
In another alternative embodiment, the data state determining module 802 is specifically configured to: and under the condition that the first record information is determined to be acquired and the second record information is not acquired based on the record information, and the time length from the processing time of the target data corresponding to the first record information to the current time is greater than the preset time length, determining that the data state comprises that the data stream corresponding to the target data is an abnormal data stream.
In yet another optional implementation, the data state determining module 802 is specifically configured to: and under the condition that the first record information is determined to be collected and the second record information is not collected based on the record information, and the time length from the processing time of the target data corresponding to the first record information to the current time is less than or equal to the preset time length, determining that the data state comprises that the data stream corresponding to the target data is a normal data stream.
In an optional implementation, the first effective state determining module 803 is specifically configured to: determining that the effective state information comprises that the target data is in an abnormal effective state under the condition that the data state comprises that the data stream corresponding to the target data is an abnormal data stream; and under the condition that the data state includes that the data stream corresponding to the target data is a normal data stream, determining that the effective state information includes that the target data is in a normal effective state.
In another alternative embodiment, the first effective state determining module 803 includes: a log obtaining submodule, a bottleneck determining submodule, and an effective state determining submodule.
The log obtaining sub-module is used for obtaining abnormal logs of each processing node in the target system under the condition that the data state includes that the data stream corresponding to the target data is an abnormal data stream; the bottleneck determining submodule is used for determining a bottleneck node in each processing node according to the abnormal log; the bottleneck node is a processing node which causes the data flow to be abnormal; and the effective state determining submodule is used for determining that the effective state information comprises an abnormal effective state of the target data based on the bottleneck node.
In an optional implementation manner, the data detection apparatus provided by the present disclosure may further include: the system comprises an index determining module, a first bottleneck determining module and a second effective state determining module.
The index determining module is used for acquiring data processing information of the target data in the target system and determining data aging indexes of each processing node in the target system; the first bottleneck determining module is used for determining a bottleneck node in each processing node according to the data aging index; the bottleneck node is a processing node that causes the data flow to be abnormal; and the second effective state determining module is used for determining, based on the bottleneck node, that the effective state information comprises that the target data is in an abnormal effective state.
Optionally, the index determining module is specifically configured to: determining the topological relation among the processing nodes according to the data processing information; determining an upstream storage structure of each processing node according to the topological relation; for each processing node, acquiring time information of first target data and last target data in an upstream storage structure of the processing node; determining a data aging index of the processing node according to the time information of the first target data and the last target data; the upstream storage structure is used for storing the data sent to the processing node by the last processing node.
In another optional embodiment, the data detection apparatus provided in the present disclosure may further include: a second information acquisition module, a topology determining module, a space determining module, a second bottleneck determining module, and a third effective state determining module.
The second information acquisition module is used for acquiring data processing information of the target data in the target system; the topology determining module is used for determining the topological relation among the processing nodes in the target system according to the data processing information; the space determining module is used for determining whether the storage space of the upstream storage structure of each processing node is completely occupied or not according to the topological relation; the second bottleneck determining module is used for determining, for each processing node, that the processing node is a bottleneck node when the storage space of the upstream storage structure of the processing node is completely occupied; the bottleneck node is a processing node that causes the data flow to be abnormal; and the third effective state determining module is used for determining, based on the bottleneck node, that the effective state information comprises that the target data is in an abnormal effective state.
According to an embodiment of the present disclosure, the present disclosure also provides a data detection system as shown in fig. 9, including a framework layer (the core module of the data detection system), an aggregation layer, a presentation layer, and a storage layer. The data detection apparatus provided by the present disclosure can be applied to the framework layer and can specifically implement the steps of determining a data state, determining a bottleneck node, collecting data topology information, collecting exception logs, and so on; fig. 9 shows only some of these steps. To reduce storage costs, the data detection system may perform streaming aggregation of the record information, attribute information, data topology information, and the like at the aggregation layer through open source software (e.g., statsd, skywalk, etc.). The data detection system can display the record information, attribute information, and data topology information after streaming aggregation in the presentation layer, and can also display the exception logs to support log queries. The data detection system can store its related data through storage units of the storage layer such as Elasticsearch, redis (Remote Dictionary Server), logo, and prometheus (a system and service monitoring system). The redis can be used for caching key indexes in the data detection system, such as normal data and/or abnormal data in each data stream; the logo can be used for storing meta information of the data streams, such as label information used for marking the data streams and other related information; and the prometheus can be used for storing the aggregated information at each moment as time-series data, providing a data basis for querying and presentation along the time dimension.
The functions of the modules, sub-modules, and units in the apparatuses in the embodiments of the present disclosure may refer to the corresponding descriptions in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a non-transitory computer readable storage medium, and a computer program product according to embodiments of the present disclosure.
The present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the data detection method provided by any embodiment of the disclosure.
The present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute a data detection method provided in any one of the embodiments of the present disclosure.
The present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the data detection method provided by any of the embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the respective methods and processes described above. For example, in some embodiments, the above-described methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (e.g., by means of firmware) to perform the above-described method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (25)

1. A method of data detection, comprising:
acquiring record information of target data in a target system; the record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system;
determining a data state of the target data based on the record information;
determining, based on the data state, effective state information of the target data in the target system.
2. The data detection method of claim 1, wherein the determining a data state of the target data based on the record information comprises:
allocating a target operation unit to the target data based on identification information of the target data;
storing the record information in a first storage structure of the target operation unit based on the identification information;
determining, with the target operation unit, the data state of the target data based on the record information stored in the first storage structure.
3. The data detection method of claim 2, wherein the determining a data state of the target data based on the record information further comprises:
acquiring attribute information of the target data in the target system;
storing the attribute information in a second storage structure of the target operation unit based on time information in the attribute information;
the determining the data state of the target data based on the record information stored in the first storage structure comprises:
reading the attribute information from the second storage structure based on a read timing of the second storage structure;
reading corresponding record information from the first storage structure based on the read attribute information;
determining the data state of the target data based on the record information.
4. The data detection method of any one of claims 1 to 3, wherein the determining a data state of the target data based on the record information comprises:
determining that the data state comprises that the data stream corresponding to the target data is a normal data stream, in a case where it is determined based on the record information that the second record information has been collected.
5. The data detection method of any one of claims 1 to 3, wherein the determining a data state of the target data based on the record information comprises:
determining that the data state comprises that the data stream corresponding to the target data is an abnormal data stream, in a case where it is determined based on the record information that the first record information has been collected, the second record information has not been collected, and the time length from the processing time of the target data corresponding to the first record information to the current time is greater than a preset time length.
6. The data detection method of any one of claims 1 to 3, wherein the determining a data state of the target data based on the record information comprises:
determining that the data state comprises that the data stream corresponding to the target data is a normal data stream, in a case where it is determined based on the record information that the first record information has been collected, the second record information has not been collected, and the time length from the processing time of the target data corresponding to the first record information to the current time is less than or equal to the preset time length.
7. The data detection method of any one of claims 1 to 3, wherein the determining, based on the data state, effective state information of the target data in the target system comprises:
determining that the effective state information includes that the target data is in an abnormal effective state, in a case where the data state includes that the data stream corresponding to the target data is an abnormal data stream;
determining that the effective state information includes that the target data is in a normal effective state, in a case where the data state includes that the data stream corresponding to the target data is a normal data stream.
8. The data detection method of any one of claims 1 to 3, wherein the determining, based on the data state, effective state information of the target data in the target system comprises:
acquiring abnormal logs of each processing node in the target system, in a case where the data state includes that the data stream corresponding to the target data is an abnormal data stream;
determining a bottleneck node among the processing nodes according to the abnormal logs; the bottleneck node is a processing node which causes the data stream to be an abnormal data stream;
determining the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
9. The data detection method of any of claims 1 to 3, further comprising:
acquiring data processing information of the target data in the target system, and determining data aging indexes of the processing nodes in the target system;
determining a bottleneck node among the processing nodes according to the data aging indexes; the bottleneck node is a processing node at which the data stream is an abnormal data stream;
determining the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
10. The data detection method of claim 9, wherein the determining data aging indexes of the processing nodes in the target system comprises:
determining the topological relation among the processing nodes according to the data processing information;
determining an upstream storage structure of each processing node according to the topological relation;
for each processing node, acquiring time information of first target data and last target data in an upstream storage structure of the processing node; the upstream storage structure is used for storing data sent to the processing node by a previous processing node;
and determining the data aging index of the processing node according to the time information of the first target data and the last target data.
11. The data detection method of any of claims 1 to 3, further comprising:
acquiring data processing information of the target data in the target system;
determining the topological relation among the processing nodes in the target system according to the data processing information;
determining whether the storage space of the upstream storage structure of each processing node is completely occupied or not according to the topological relation;
for each processing node, determining the processing node as a bottleneck node when the storage space of the upstream storage structure of the processing node is completely occupied; the bottleneck node is a processing node at which the data stream is an abnormal data stream;
determining the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
12. A data detection apparatus comprising:
a first information acquisition module, configured to acquire record information of target data in a target system; the record information comprises first record information corresponding to the target data at a starting node of the target system and/or second record information corresponding to the target data at a terminating node of the target system;
a data state determination module, configured to determine a data state of the target data based on the record information;
and a first effective state determining module, configured to determine effective state information of the target data in the target system based on the data state.
13. The data detection apparatus of claim 12, wherein the data state determination module comprises:
an allocation submodule, configured to allocate a target operation unit to the target data based on identification information of the target data;
a first storage submodule, configured to store the record information in a first storage structure of the target operation unit based on the identification information;
and a data state determination submodule, configured to determine, by the target operation unit, the data state of the target data based on the record information stored in the first storage structure.
14. The data detection apparatus of claim 13, wherein the data state determination module further comprises:
the information acquisition submodule is used for acquiring the attribute information of the target data in the target system;
the second storage submodule is used for storing the attribute information into a second storage structure in the target operation unit based on the time information in the attribute information;
the data state determination submodule is specifically configured to: read the attribute information from the second storage structure based on a read timing of the second storage structure; read corresponding record information from the first storage structure based on the read attribute information; and determine the data state of the target data based on the record information.
15. The data detection apparatus according to any one of claims 12 to 14, wherein the data state determination module is specifically configured to:
determine that the data state comprises that the data stream corresponding to the target data is a normal data stream, in a case where it is determined based on the record information that the second record information has been collected.
16. The data detection apparatus according to any one of claims 12 to 14, wherein the data state determination module is specifically configured to:
determine that the data state comprises that the data stream corresponding to the target data is an abnormal data stream, in a case where it is determined based on the record information that the first record information has been collected, the second record information has not been collected, and the time length from the processing time of the target data corresponding to the first record information to the current time is greater than a preset time length.
17. The data detection apparatus according to any one of claims 12 to 14, wherein the data state determination module is specifically configured to:
determine that the data state comprises that the data stream corresponding to the target data is a normal data stream, in a case where it is determined based on the record information that the first record information has been collected, the second record information has not been collected, and the time length from the processing time of the target data corresponding to the first record information to the current time is less than or equal to the preset time length.
18. The data detection apparatus according to any one of claims 12 to 14, wherein the first effective state determining module is specifically configured to:
determining that the effective state information includes that the target data is in an abnormal effective state under the condition that the data state includes that the data stream corresponding to the target data is an abnormal data stream;
and determining that the effective state information includes that the target data is in a normal effective state under the condition that the data state includes that the data stream corresponding to the target data is a normal data stream.
19. The data detection apparatus according to any one of claims 12 to 14, wherein the first effective state determining module comprises:
a log obtaining submodule, configured to obtain abnormal logs of each processing node in the target system, in a case where the data state includes that the data stream corresponding to the target data is an abnormal data stream;
a bottleneck determining submodule, configured to determine a bottleneck node among the processing nodes according to the abnormal logs; the bottleneck node is a processing node which causes the data stream to be an abnormal data stream;
an effective state determining submodule, configured to determine the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
20. The data detection apparatus of any one of claims 12 to 14, further comprising:
the index determining module is used for acquiring data processing information of the target data in the target system and determining data aging indexes of each processing node in the target system;
a first bottleneck determining module, configured to determine a bottleneck node among the processing nodes according to the data aging indexes; the bottleneck node is a processing node at which the data stream is an abnormal data stream;
a second effective state determining module, configured to determine the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
21. The data detection apparatus as claimed in claim 20, wherein the indicator determination module is specifically configured to:
acquire data processing information of the target data in the target system; determine the topological relation among the processing nodes according to the data processing information; determine an upstream storage structure of each processing node according to the topological relation; for each processing node, acquire time information of first target data and last target data in the upstream storage structure of the processing node; and determine a data aging index of the processing node according to the time information of the first target data and the last target data; wherein the upstream storage structure is used for storing the data sent to the processing node by the previous processing node.
22. The data detection apparatus as claimed in any one of claims 12 to 14, further comprising:
the second information acquisition module is used for acquiring data processing information of the target data in the target system;
the topology determining module is used for determining the topological relation among the processing nodes in the target system according to the data processing information;
a space determining module, configured to determine whether all storage spaces of the upstream storage structures of the processing nodes are occupied according to the topological relation;
a second bottleneck determination module, configured to determine, for each processing node, that the processing node is a bottleneck node when the storage space of the upstream storage structure of the processing node is completely occupied; the bottleneck node is a processing node at which the data stream is an abnormal data stream;
a third effective state determining module, configured to determine the effective state information based on the bottleneck node; wherein the effective state information includes identification information of the bottleneck node.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data detection method of any one of claims 1-11.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the data detection method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements a data detection method according to any one of claims 1-11.
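As a non-authoritative illustration of how the steps recited in claims 2 to 6 might fit together, the sketch below routes each piece of target data to an operation unit by its identifier, keeps record information in a dictionary (standing in for the first storage structure), keeps time-ordered entries in a min-heap (standing in for the second storage structure), and classifies a data stream once its read timing arrives. The class and function names, the 30-second timeout, the CRC32-based routing, and the heap itself are all assumptions of this sketch rather than details taken from the claims.

```python
import heapq
import zlib
from dataclasses import dataclass, field
from typing import Optional

TIMEOUT_SECONDS = 30.0  # assumed value for the "preset time length"


@dataclass
class Record:
    start_time: Optional[float] = None  # first record information (starting node)
    end_time: Optional[float] = None    # second record information (terminating node)


def classify(record, now, timeout=TIMEOUT_SECONDS):
    """Apply the three conditions of claims 4 to 6."""
    if record.end_time is not None:
        return "normal"    # claim 4: second record information was collected
    if record.start_time is not None and now - record.start_time > timeout:
        return "abnormal"  # claim 5: started, not finished, and overdue
    return "normal"        # claim 6: started and still within the time limit


@dataclass
class OperationUnit:
    records: dict = field(default_factory=dict)  # first storage: data id -> Record
    due: list = field(default_factory=list)      # second storage: heap of (check time, data id)

    def on_start(self, data_id, t):
        self.records.setdefault(data_id, Record()).start_time = t
        heapq.heappush(self.due, (t + TIMEOUT_SECONDS, data_id))

    def on_end(self, data_id, t):
        self.records.setdefault(data_id, Record()).end_time = t

    def poll(self, now):
        """Read entries whose check time has passed and classify them."""
        states = {}
        while self.due and self.due[0][0] < now:
            _, data_id = heapq.heappop(self.due)
            record = self.records.pop(data_id, None)
            if record is not None:
                states[data_id] = classify(record, now)
        return states


def pick_unit(data_id, units):
    """Route a data id to a unit with a stable hash (CRC32 is an assumption)."""
    return units[zlib.crc32(data_id.encode("utf-8")) % len(units)]


# Hypothetical run: "a" reaches the terminating node, "b" never does.
units = [OperationUnit() for _ in range(4)]
pick_unit("a", units).on_start("a", 0.0)
pick_unit("a", units).on_end("a", 5.0)
pick_unit("b", units).on_start("b", 0.0)
states = {}
for unit in units:
    states.update(unit.poll(now=31.0))
```

Routing by identifier keeps the start and end records of the same target data in the same unit, so each unit can classify its data without consulting the others.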
CN202210087939.0A 2022-01-25 2022-01-25 Data detection method, device, equipment and storage medium Pending CN114428711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087939.0A CN114428711A (en) 2022-01-25 2022-01-25 Data detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210087939.0A CN114428711A (en) 2022-01-25 2022-01-25 Data detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114428711A true CN114428711A (en) 2022-05-03

Family

ID=81312949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210087939.0A Pending CN114428711A (en) 2022-01-25 2022-01-25 Data detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114428711A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166709A (en) * 2022-11-17 2023-05-26 北京白龙马云行科技有限公司 Time length correction method, device, electronic equipment and storage medium
CN116166709B (en) * 2022-11-17 2023-10-13 北京白龙马云行科技有限公司 Time length correction method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106815254B (en) Data processing method and device
WO2014208139A1 (en) Fault detection device, control method, and program
CN109379305B (en) Data issuing method, device, server and storage medium
CN112615742A (en) Method, device, equipment and storage medium for early warning
CN111897700B (en) Application index monitoring method and device, electronic equipment and readable storage medium
CN114428711A (en) Data detection method, device, equipment and storage medium
CN112800061B (en) Data storage method, device, server and storage medium
CN114461792A (en) Alarm event correlation method, device, electronic equipment, medium and program product
US7890444B2 (en) Visualization of data availability and risk
CN113778644A (en) Task processing method, device, equipment and storage medium
CN113312321A (en) Abnormal monitoring method for traffic and related equipment
CN116545740B (en) Threat behavior analysis method and server based on big data
CN110943887B (en) Probe scheduling method, device, equipment and storage medium
CN113220705A (en) Slow query identification method and device
US20200117640A1 (en) Method, device and computer program product for managing storage system
CN110958137A (en) Traffic management method and device and electronic equipment
CN114885014A (en) Method, device, equipment and medium for monitoring external field equipment state
CN114661562A (en) Data warning method, device, equipment and medium
CN111090646B (en) Electromagnetic data processing platform
CN113485891A (en) Service log monitoring method and device, storage medium and electronic equipment
US20210208998A1 (en) Function analyzer, function analysis method, and function analysis program
CN111858579A (en) Data storage method and device
CN115242799B (en) Data reporting method, device, equipment, storage medium and program product
CN113220230B (en) Data export method and device, electronic equipment and storage medium
CN113326243B (en) Method and device for analyzing log data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination