WO2024000585A1

WO2024000585A1 - Data processing method, apparatus, and system for data tracking and electronic device

Info

Publication number: WO2024000585A1
Application number: PCT/CN2022/103434
Authority: WO
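Inventors: Li Congchao (李聪超); Chen Jiawen (陈嘉雯); Liu Xiaonan (刘晓南); Chen Weiyu (陈维御); Zhou Linfei (周林飞); Wang Ganghua (王刚华)
Original assignee: Siemens AG (西门子股份公司); Siemens Ltd., China (西门子（中国）有限公司)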
Priority date: 2022-07-01
Filing date: 2022-07-01
Publication date: 2024-01-04

Abstract

A data processing method for data tracking, the method comprising: collecting data and cleaning the collected data to obtain cleaned data; storing the cleaned data in an entity, entity attribute and entity attribute value format to obtain data in that format; and generating directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data. Applying the entity, entity attribute and entity attribute value format to tracking makes it possible to track stored data transparently. In addition, the directed acyclic graph format data generated from it helps a client better understand how the data processing works, and because data can be retrieved from any node of the directed acyclic graph, replay of the original data is easier to support.

Description

Data processing methods, devices, systems and electronic equipment for data tracking

Technical field

The invention relates to the field of information processing, in particular to a data processing method, device, system and electronic equipment for data tracking.

Background Art

Modern industrial systems are composed of many different and complex subsystems. Especially as information technology (IT) and operational technology (OT) converge, industrial systems include not only industrial protocol data sources but also components of IT systems. In data analysis, many results are produced through a series of processing steps. Looking only at the results discards much of this information; for example, the root cause of an error cannot be determined.

Existing data processing frameworks focus mainly on batch and stream processing. Existing third-party open-source software does not provide advanced tracking mechanisms for data fusion, because it focuses only on functional features and keeps no maintainable information about processing steps. As in many companies, such frameworks can be extended with many components and much transformation logic in industrial settings, but when users want to analyze key performance indicators or artificial intelligence results, the existing tools cannot handle the resulting complexity. Existing tools are outcome-oriented rather than process-oriented: while everything works normally the process appears fine, but when a problem occurs the investigation steps are not transparent. In stream processing in particular, no tool tracks how the result data is generated module by module. Because existing tools aim to decouple the data processing steps, the final result becomes a black box that cannot be accurately traced.

Summary of the Invention

In view of this, the present invention proposes a data processing method, device, system and electronic equipment for data tracking that can play back entity, entity attribute and entity attribute value format data, thereby achieving transparent tracking of data.

A data processing method for data tracking provided according to an embodiment of the present invention includes:

Collect data, clean the collected data, and obtain the cleaned data;

Store the cleaned data in the format of entities, entity attributes and entity attribute values, and obtain data in the format of entities, entity attributes and entity attribute values;

Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.

A data processing device for data tracking provided according to an embodiment of the present invention includes:

A collection module, used to collect data;

A cleaning module, used to clean the collected data to obtain cleaned data;

An entity, entity attribute and entity attribute value generation module, used to store the cleaned data in the entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data;

A directed acyclic graph generation module, used to generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.

A data processing system for data tracking provided according to an embodiment of the present invention includes:

A sensor, used to collect data;

A data processing device, used to clean the collected data to obtain cleaned data, store the cleaned data in the entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data, and generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.

Data playback for data tracking according to an embodiment of the present invention includes:

Collect data and clean the collected data to obtain cleaned data;

Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data;

Enter an identifier to display the data to be played back;

Perform data playback based on the directed acyclic graph format data to obtain the root cause.

An electronic device for data tracking provided according to an embodiment of the present invention includes a computing module and a storage module that stores a program, wherein the program includes instructions for executing the method in the above embodiment by the computing module.

The data processing method, device, system and electronic equipment of the embodiments of the present invention store data in the entity, entity attribute and entity attribute value format and generate directed acyclic graph format data from it. The entity, entity attribute and entity attribute value format is a unified data storage format that can be used in standard databases, and it can carry transparent content such as icons, graphics or charts; applying it to data tracking in the embodiments therefore enables transparent tracking of stored data. At the same time, the directed acyclic graph format data generated from it is based on a finer-grained process monitoring data format and process design, which helps customers better understand how the data processing works, and data can be retrieved from any node of the directed acyclic graph. This makes it easier to support playback of the original data, so that developers can locate problems by replaying the original data and then adjust and optimize every stage of data processing.

Brief Description of the Drawings

The above-mentioned characteristics, technical features, advantages and implementation methods of the present invention will be further explained in a clear and easy-to-understand manner through the description of the preferred embodiments in conjunction with the accompanying drawings, in which:

Figure 1 is a schematic scene diagram of data tracking in a traditional industrial system;

Figure 2 is a framework diagram of an embodiment of the industrial system of the present invention;

Figure 3 is a schematic flow chart of one embodiment of data processing for data tracking according to the present invention;

Figure 4 is a schematic data structure diagram of one embodiment of entity, entity attribute and entity attribute value format data of the present invention;

Figure 5 is a schematic diagram of the data structure of another embodiment of the present invention;

Figure 6 is a schematic diagram of a specific data structure according to another embodiment of the present invention;

Figure 7 is a schematic flow chart of an embodiment of the present invention for data playback;

Figure 8 is a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present invention.

The reference signs are as follows:

100: Traditional industrial system
101, 201: Data process link
102, 202: First sensor
103, 203: Second sensor
104, 105: Device
106, 206: Database
200: Industrial system
204, 205: Device
207: System
401: Processing pipeline
402: Description
403: Storage format
4011: Collection module
4012: First cleaning module
4013: Second cleaning module
4014: Data analysis module
4021-4024: Function description
4031: Description
4032: Description
4033: Description
501: Data structure of linked list relationship
502: Data structure of tree relationship
503: Data structure of graph relationship

S301: Collect data and clean the collected data to obtain cleaned data.

S302: Store the cleaned data in entity, entity attribute and entity attribute value format, and obtain entity, entity attribute and entity attribute value format data.

S303: Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.

S701: Enter the identifier to display the data to be played back.

S702: Perform data playback based on the directed acyclic graph format data to obtain the root cause.

Detailed Description of the Embodiments

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that various steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performance of illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below. It should be noted that concepts such as "first" and "second" in this disclosure are used only to distinguish different devices, modules or units, and not to limit the order of, or interdependence between, the functions performed by these devices, modules or units.

It should be noted that the modifiers "a/one" and "a plurality of" in this disclosure are illustrative rather than restrictive; those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be read as "one or more". The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only and do not limit the scope of these messages or information.

Figure 1 shows an application diagram of a traditional industrial system for data analysis and tracking. The traditional industrial system 100 mainly includes the data process link 101 of an industrial process, and each link in the data process link 101 includes a test point. Figure 1 shows two test points; each test point has a sensor that collects data and monitors the collected data: the first test point has a first sensor 102 and the second test point has a second sensor 103. The first sensor 102 and the second sensor 103 transmit the collected data to the specific devices running in the data process link; for example, the first sensor 102 transmits its data to the device 104, and the second sensor 103 transmits its data to the device 105. In an industrial system, the devices 104 and 105 can be actual devices such as valves and compressors, that is, devices actually used during data collection and analysis; in a monitoring system, they can be the application equipment that monitors each link during data collection and analysis. The devices 104 and 105 transmit the collected data to the database 106 for storage, and a simulation diagram is generated from the stored data; here the simulation diagram is a two-dimensional simulation diagram generated by existing software when the industrial system starts up. When the data collected at each test point of this traditional system is streamed, no tool tracks the results generated by each module, so the result generation process is a black box and neither the original data nor the root cause of an error can be accurately played back or identified.
The root cause is the initial or key cause in the causal chain that leads to a given outcome or consequence.

FIG. 2 shows an application diagram of an industrial system for data analysis and tracking according to an exemplary embodiment of the present disclosure. The industrial system 200 can be any system that includes multiple modules for data analysis and tracking, and can be applied to various practical industrial systems such as alarm systems, detection systems or tracking systems. In industrial systems, data is the most powerful asset for digital innovation, yet in most traditional industries it becomes visible only at its final stage, in data storage. To increase data transparency, the embodiment of the present invention adds a data collection step in the data process link 201 (explained below with examples). In practical applications, the data process link 201 records not only which device generated the collected data but also upstream and downstream information and other factors or parameters. Each link in the data process link 201 includes a test point. Figure 2 shows two test points; each test point has a sensor that collects data and monitors the collected data: the first test point has a first sensor 202, and the second test point has a second sensor 203. The first sensor 202 and the second sensor 203 transmit the collected data to the specific devices running in the data process link; for example, the first sensor 202 transmits its data to the device 204, and the second sensor 203 transmits its data to the device 205. The devices 204 and 205 transmit the collected data to the database 206 for storage. For example, the collected data can be used to organize the data flow and merged into the database 206 to further generate simulation diagrams, for example with SIMATIC PCS.
The database 206 further stores the data in the system 207, which can be a second database that stores data in directed acyclic graph (DAG) format. The directed acyclic graph format data shows the real data processing logic of the system. A directed acyclic graph is a finite directed graph with no directed cycles: it consists of a finite number of vertices and directed edges, each directed edge points from one vertex to another, and starting from any vertex it is impossible to return to that vertex by following the directed edges. A directed acyclic graph is a graph, which, like arrays, queues and linked lists, is a data structure; a graph consists of vertices and the edges connecting them. In computer science, graphs are among the most flexible data structures, and many problems can be modeled and solved with them, for example social networks between people, analyzing the topology of a computer network to determine whether two computers can communicate, or finding the shortest path between two locations in a logistics system. When data is collected in the data process link 201, data from additional sources or additional information can be linked into the directed acyclic graph format data, so that all of this information is carried over to the industrial system and the working of the data processing process can be clearly understood. The directed acyclic graph format data in the system 207 can be regarded as the skeleton of the data flow, a high-level picture of the data processing.
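As a minimal sketch of the acyclicity property described above, the following Python checks whether a set of directed edges forms a directed acyclic graph using Kahn's algorithm, a standard topological-sort technique chosen here for illustration; it is not stated to be the method used by the system 207. The function names and the stage names in the usage note are hypothetical.

```python
from collections import deque


def topological_order(edges):
    """Return the vertices in topological order, or None if a directed cycle exists.

    `edges` maps each vertex to the list of vertices its directed edges point to.
    """
    # Collect every vertex, including those that only appear as edge targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count the incoming edges of each vertex.
    in_degree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1

    # Repeatedly take vertices that have no remaining incoming edges.
    ready = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in edges.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    # Any vertex never released lies on, or behind, a directed cycle.
    return order if len(order) == len(nodes) else None


def is_dag(edges):
    """True when no path leads from a vertex back to itself."""
    return topological_order(edges) is not None
```

For example, topological_order({"collect": ["clean_1"], "clean_1": ["analyze"]}) yields the stages in processing order, while two edges pointing at each other return None because neither vertex ever reaches an in-degree of zero.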

The data processing system for data tracking shown in Figure 2 can be summarized as follows: a sensor for collecting data; and a data processing device for cleaning the collected data to obtain cleaned data, storing the cleaned data in the entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data, and generating directed acyclic graph format data from that data. Entities, entity attributes and entity attribute values are also called subjects, predicates and objects (Subject-Predicate-Object, SPO). The term comes from database usage and denotes a format for describing terms. Terms can be described in many ways, for example in the standard format of RDF (Resource Description Framework), which matches the definition of entity, entity attribute and entity attribute value. Entities, entity attributes and entity attribute values are used in the field of knowledge graphs, a form of knowledge representation. A knowledge graph focuses on describing the relationships between entities and consists of interconnected entities and their attributes; it is made up of pieces of knowledge, each of which can be represented as an (entity, entity attribute, entity attribute value) triple. The following job from an embodiment of the present invention illustrates entities, entity attributes and entity attribute values:

job:123 link:next log:Transaction, where job:123 is the entity;

log:pJob job:123, where log:pJob is the entity attribute and job:123 is the entity attribute value;

log:nextJob job:124, where log:nextJob is the entity attribute and job:124 is the entity attribute value;

log:input srv:Content, where log:input is the entity attribute and srv:Content is the entity attribute value;

log:processedBy srv:A, where log:processedBy is the entity attribute and srv:A is the entity attribute value;

log:processedAt "2022-02-10T10:22:23", where log:processedAt is the entity attribute and "2022-02-10T10:22:23" is the entity attribute value;

log:statusCode 200, where log:statusCode is the entity attribute and 200 is the entity attribute value.
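The example above can be written as explicit (entity, entity attribute, entity attribute value) triples. The sketch below assumes, as the listing suggests, that every attribute/value line belongs to the entity job:123; the Triple type, the JOB_123 list and the attributes_of helper are hypothetical names, and a production system would more likely use an RDF triple store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    entity: str      # subject, e.g. "job:123"
    attribute: str   # predicate, e.g. "log:processedBy"
    value: object    # object: another entity or a literal value


# The example record for job:123, written out as explicit triples
# (assumption: all attribute/value lines share the subject job:123).
JOB_123 = [
    Triple("job:123", "link:next", "log:Transaction"),
    Triple("job:123", "log:pJob", "job:123"),
    Triple("job:123", "log:nextJob", "job:124"),
    Triple("job:123", "log:input", "srv:Content"),
    Triple("job:123", "log:processedBy", "srv:A"),
    Triple("job:123", "log:processedAt", "2022-02-10T10:22:23"),
    Triple("job:123", "log:statusCode", 200),
]


def attributes_of(triples, entity):
    """Collect the attribute/value pairs stored for one entity.

    Assumes each attribute occurs at most once per entity; a repeated
    attribute would keep only its last value.
    """
    return {t.attribute: t.value for t in triples if t.entity == entity}
```

Because every fact is a uniform triple, the same lookup works for any job identifier without a schema change, which is the property the description relies on for tracking.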

Data cleaning refers to the process of detecting and correcting (or deleting) corrupted or inaccurate records in a record set, data table or database. It identifies incomplete, incorrect, inaccurate or irrelevant parts of the data and then replaces, modifies or deletes this dirty or coarse data. Data cleaning can be performed interactively with data processing tools or as batch processing through scripts. After cleaning, a data set should be consistent with other similar data sets in the system.
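A minimal cleaning pass of the kind described above might look as follows. The record fields (sensor, timestamp, value) and the valid range are assumptions for illustration; the hypothetical clean_records function corrects numeric strings, deletes incomplete or out-of-range records, and keeps each rejected record with a reason so that the cleaning step itself stays traceable.

```python
import math


def clean_records(records, required=("sensor", "timestamp", "value"), valid_range=(-1e6, 1e6)):
    """Split raw records into cleaned records and (record, reason) rejections.

    A record is rejected when a required field is missing, its value is not
    numeric, or the value is not finite or lies outside `valid_range`
    (an assumed threshold). Numeric strings are corrected to floats.
    """
    low, high = valid_range
    cleaned, rejected = [], []
    for record in records:
        if any(record.get(name) is None for name in required):
            rejected.append((record, "incomplete"))
            continue
        try:
            value = float(record["value"])  # correct e.g. "21.5" -> 21.5
        except (TypeError, ValueError):
            rejected.append((record, "not numeric"))
            continue
        if not math.isfinite(value) or not low <= value <= high:
            rejected.append((record, "out of range"))
            continue
        cleaned.append({**record, "value": value})
    return cleaned, rejected
```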

The embodiment of the present invention stores data in the entity, entity attribute and entity attribute value format and generates directed acyclic graph format data from it. Because this format is a unified data storage format that can be used in standard databases and can carry transparent content such as icons, graphics or charts, applying it to data tracking in the methods, devices and systems of the embodiments enables transparent tracking of stored data. Transparent tracking means that every step of a complex, black-box data storage process can be displayed transparently and the original data can be accurately traced. At the same time, the directed acyclic graph format data generated from the entity, entity attribute and entity attribute value format data is based on a finer-grained process monitoring data format and process design, which helps customers better understand how the data processing works, and data can be retrieved from any node of the directed acyclic graph. This makes it easier to support playback of the original data, so that developers can locate problems by replaying it and then adjust and optimize every stage of data processing. Playback traces groups of original data by the identifiers in the stored entity, entity attribute and entity attribute value format data: each time an identifier is entered as needed, the content of the corresponding node in the stored data is displayed.
In this way, the stored entity, entity attribute and entity attribute value format data can be redisplayed from any node during playback, including how each intermediate piece of data was stored, so the data is tracked accurately.
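Playback from an arbitrary node can be sketched as a depth-first walk over the upstream edges of the directed acyclic graph. In this illustration, upstream and details are hypothetical in-memory dictionaries standing in for the stored entity, entity attribute and entity attribute value format data; entering an identifier returns every upstream record, oldest first, ending with the node itself.

```python
def replay(job_id, upstream, details):
    """Return (job id, stored details) for `job_id` and all its ancestors, oldest first.

    upstream: job id -> list of upstream job ids (the DAG edges, reversed)
    details:  job id -> stored attribute/value pairs for that job
    """
    visited, order = set(), []

    def visit(node):
        if node in visited:       # a shared ancestor is shown only once
            return
        visited.add(node)
        for parent in upstream.get(node, []):
            visit(parent)         # show upstream steps before this one
        order.append((node, details.get(node, {})))

    visit(job_id)
    return order
```

Starting the replay from an intermediate job returns only that job and its upstream records, which corresponds to calling data from any node of the graph.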

Figure 3 is a schematic diagram of a method for data analysis and tracking in an exemplary industrial system of the present disclosure. The method specifically includes:

S301: Collect data, and clean the collected data to obtain cleaned data.

Collecting data includes collecting original data and additional data. The original data is the data obtained by the sensors at each test point. Each test point can collect data along different time dimensions, for example every millisecond, or in parallel expansion jobs, where multiple jobs run at the same point in time. In addition, each test point collects various elements as needed, for example values at a first time point and a second time point, which are recorded and stored for the specific time period between the first and second time points on the timeline. During data playback, the original data can be replayed from the data stored for that time period. Additional data consists of extra parameters added as needed for better data analysis and monitoring; for example, it can be supplementary data such as upstream and downstream parameters configured as needed.
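For illustration, the collected data described above can be modeled as samples that carry the original sensor value plus a dictionary of additional data, with playback selecting the samples between a first and a second time point. The Sample class, its field names and the window function are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Sample:
    test_point: str      # which test point / sensor produced the value
    timestamp: datetime  # collection time (e.g. millisecond resolution)
    raw_value: float     # original data as read from the sensor
    extra: dict = field(default_factory=dict)  # additional data, e.g. upstream/downstream parameters


def window(samples, start, end, test_point=None):
    """Return the samples collected in [start, end], in time order, for playback."""
    selected = [
        s for s in samples
        if start <= s.timestamp <= end
        and (test_point is None or s.test_point == test_point)
    ]
    return sorted(selected, key=lambda s: s.timestamp)
```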

The collected data can be cleaned N times according to the needs of different test points, where N is a positive integer. Data cleaning is a data preprocessing step that removes redundant values; in special cases the redundant values can be empty values. For example, in industrial systems data needs to be cleaned to remove noise. The cleaning process can include different steps depending on the needs of each test point.
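Cleaning N times can be sketched as a list of N cleaning functions applied in order, keeping the intermediate result after each one so that every step can later be inspected. The run_cleaning helper and the two example steps (dropping empty values and removing readings above an assumed limit of 100.0) are hypothetical illustrations.

```python
def run_cleaning(data, steps):
    """Apply N cleaning steps in order and keep every intermediate result.

    `steps` is a list of (name, function) pairs; each function takes a list of
    values and returns a cleaned list.
    """
    history = [("input", list(data))]
    for name, step in steps:
        data = step(data)
        history.append((name, list(data)))
    return data, history


def drop_empty(values):
    # Example step: remove redundant empty values.
    return [v for v in values if v is not None]


def remove_spikes(values, limit=100.0):
    # Example step: treat readings whose magnitude exceeds an assumed limit as noise.
    return [v for v in values if abs(v) <= limit]
```

Keeping the history per step is what allows each cleaning stage to be linked into the graph and replayed later.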

S302: Store the cleaned data in the entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data.

The obtained entity, entity attribute and entity attribute value format data is stored in an asynchronous system, that is, a storage system that does not wait for a storage confirmation or return value. To avoid affecting the original storage system, the collected data is written directly to the asynchronous system in parallel with the original storage; because no confirmation or return is awaited, storage is simpler and faster. The data stored in the asynchronous system can be retrieved whenever needed for convenient data playback. All data to be played back is transferred to the asynchronous system in the entity, entity attribute and entity attribute value format, which has minimal impact on the performance of the original storage system.
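The asynchronous system can be illustrated with Python's standard queue and threading modules: the producer only enqueues each triple and continues without waiting for confirmation, while a background worker writes it to storage. AsyncTripleStore is a hypothetical class, and an in-memory list stands in for the separate store.

```python
import queue
import threading


class AsyncTripleStore:
    """Accept triples without waiting for storage confirmation.

    put() only enqueues; a background thread drains the queue into
    `self.triples`, so the main processing pipeline is not blocked.
    """

    def __init__(self):
        self._queue = queue.Queue()
        self.triples = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, triple):
        self._queue.put(triple)  # returns immediately; no acknowledgement awaited

    def _drain(self):
        while True:
            triple = self._queue.get()
            if triple is None:   # sentinel: stop the worker
                self._queue.task_done()
                break
            self.triples.append(triple)
            self._queue.task_done()

    def close(self):
        """Flush pending writes and stop the worker (e.g. before a replay)."""
        self._queue.put(None)
        self._worker.join()
```

A single worker preserves insertion order, which keeps the stored triples in the order the pipeline produced them.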

In the embodiment of the present invention, storing data in the entity, entity attribute and entity attribute value format provides three functions: data transparency, multi-time-dimension data analysis and stream data playback. Figure 4 is a structural schematic diagram of an exemplary embodiment of the present disclosure that uses the entity, entity attribute and entity attribute value format to describe the data processing process and the data storage format. The processing pipeline 401 mainly includes four functional modules: the collection module 4011, the first cleaning module 4012, the second cleaning module 4013 and the data analysis module 4014. The description 402 gives the specific function of each module: function description 4021 is data collection, whose steps are detailed in step S301 above; function description 4022 is cleaning step 1 and function description 4023 is cleaning step 2, whose steps are also described in step S301 above; and function description 4024 is data analysis on the prepared data, in which the cleaned data is input into the data analysis module 4014 for analysis. The data analysis module 4014 can analyze the data collected by the sensor at one test point and then cleaned; optionally, data collected by sensors at multiple test points can be cleaned and combined for analysis. That is, depending on the user scenario and specific needs, the data analysis module 4014 can analyze the data collected by some of the devices or by all of them. The analysis of the cleaned data serves mainly predictive maintenance of equipment or systems, for example life prediction in some systems and scenarios.
The analysis performed by the data analysis module 4014 is closely tied to the values at each test point. Compared with the prior art, this embodiment adds the data analysis module 4014, so maintenance of equipment or systems is no longer a black box: more transparent analysis results can be obtained, and better maintenance of equipment or systems can be planned or other needs met. The storage format 403 uses the entity, entity attribute and entity attribute value format to link the data processed in each step: description 4031 corresponds to the data collection step, description 4032 to cleaning, and description 4033 to the subsequent step. The relationship described by these steps is as follows:

Gathering Job ID 1: JobInfo -> Cleaning Job ID 2: JobInfo
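The link between a gathering job and a cleaning job can be expressed with the job attributes from the earlier job:123 example. The sketch below assumes that log:nextJob points to the downstream job and that log:pJob points to the upstream (parent) job; the stage list mirrors modules 4011 to 4014, but link_pipeline and the job identifiers are hypothetical.

```python
def link_pipeline(stages):
    """Emit triples that chain consecutive pipeline jobs in both directions.

    stages: list of (job_id, module) pairs in processing order.
    """
    triples = []
    for job_id, module in stages:
        triples.append((job_id, "log:processedBy", module))
    # Assumed semantics: nextJob points downstream, pJob points upstream,
    # so a replay can walk the chain from any job in either direction.
    for (up, _), (down, _) in zip(stages, stages[1:]):
        triples.append((up, "log:nextJob", down))
        triples.append((down, "log:pJob", up))
    return triples
```

For example, a list of four (job, module) pairs for collection, cleaning step 1, cleaning step 2 and data analysis yields four processedBy triples plus one nextJob/pJob pair for each of the three links.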

The specific data formats involved in the collection steps and cleaning steps are detailed in Table 1 and Table 2 below.

The entity, entity attributes and entity attribute value format data can be designed as follows. Table 1 is just an example, and the embodiments of the present invention can also be implemented with other technologies. Among them, the entity, entity attribute and entity attribute value format data can be expanded according to the system design requirements. Table 1 illustrates some factors that the system may include.

Table 1 example:

The example parameter descriptions in Table 1 are shown in Table 2:

Table 2

S303: Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.

The solution provided by the embodiments of the present invention is designed to track these steps during processing and to help engineers identify the root cause of a problem. These steps can be converted into directed acyclic graph format data. A directed acyclic graph (DAG) is a finite directed graph without directed cycles. Specifically, a directed acyclic graph consists of a finite number of vertices and directed edges; each directed edge points from one vertex to another, and starting from any vertex, it is impossible to return to that vertex by following the directed edges. A graph is a data structure, like an array, a queue or a linked list. A graph has many nodes, also called vertices, and the connection between two nodes is called an edge. Figure 5 is a schematic diagram of data structures of three progressive complexity levels of directed acyclic graphs in an exemplary embodiment of the present disclosure. The three data structures are a data structure 501 of a linked list relationship, a data structure 502 of a tree relationship, and a data structure 503 of a graph relationship. The data structure 501 of the linked list relationship is a single directed line. The data structure 502 of the tree relationship branches, but there is only one path between any two nodes, that is, no closed shape can be formed. The data structure 503 of the graph relationship can contain closed shapes. In particular, the data structure 502 of the tree relationship is a special case of the data structure 503 of the graph relationship. In the abbreviation DAG, the last letter G stands for graph. The letter D stands for directed, meaning that every edge has a definite direction: node A holds a pointer to node B, but node B holds no pointer back to node A, so the pointer, if drawn, is a one-way arrow from A to B.
In a directed acyclic graph, the direction from one node to another is therefore one-way, which is what directed means. The letter A stands for acyclic, meaning that it is never possible to start from a node, follow the arrows and finally return to the starting node.
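The definition above can be illustrated with a small sketch, assuming nodes are identified by strings. It is not part of the disclosed method, and the class name `DirectedAcyclicGraph` and its methods are hypothetical. The class stores one-way edges and refuses any edge that would allow a path to return to its starting node, which is exactly the acyclic property.

```python
from collections import defaultdict


class DirectedAcyclicGraph:
    """Minimal DAG sketch: edges are one-way, and any edge that would close
    a directed cycle is rejected."""

    def __init__(self) -> None:
        self._children: dict[str, set[str]] = defaultdict(set)

    def _reachable(self, start: str, target: str) -> bool:
        # Iterative depth-first search that follows the arrow direction only.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._children.get(node, ()))
        return False

    def add_edge(self, src: str, dst: str) -> None:
        # Adding src -> dst closes a cycle exactly when src is already
        # reachable from dst (this also rejects self-loops).
        if self._reachable(dst, src):
            raise ValueError(f"edge {src} -> {dst} would create a cycle")
        self._children[src].add(dst)

    def children(self, node: str) -> set[str]:
        """Direct downstream neighbours of a node."""
        return set(self._children.get(node, ()))
```

For example, the edges A to B, B to C and A to C form a closed shape when drawn, as in the graph relationship 503, but the graph is still acyclic because no path follows the arrows back to A. Adding C to A would close a directed cycle, so `add_edge` raises `ValueError`.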

Figure 6 is a schematic diagram of a specific directed acyclic graph data structure in an exemplary embodiment of the present disclosure. Based on the above entity, entity attribute and entity attribute value formats and the parameters and definitions of Table 1 and Figure 2, it shows in detail the upstream and downstream data relationships between entities, entity attributes and entity attribute values in the directed acyclic graph. The namespaces used in Figure 6 are as follows:

job: http://example.org/data/job/
link: http://example.org/data/relation/
srv: http://example.org/data/server/
log: http://example.org/ont/transaction-log/
xsd: http://www.w3.org/2001/XMLSchema#
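The prefixes above follow the convention of RDF serializations such as Turtle, where a compact name such as `job:clean-1` expands to the namespace IRI followed by a local name. Assuming that convention, the sketch below expands compact names in entity, attribute, value triples. The prefix table is copied from the list above; the local names (`clean-1`, `collect-1`, `upstream`, `runsOn`, `node-a`, `recordsPerSecond`) are hypothetical and are not taken from Figure 6.

```python
# Namespace prefixes as listed for Figure 6.
PREFIXES = {
    "job": "http://example.org/data/job/",
    "link": "http://example.org/data/relation/",
    "srv": "http://example.org/data/server/",
    "log": "http://example.org/ont/transaction-log/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name: str) -> str:
    """Expand a compact 'prefix:local' name into a full IRI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {name!r}")
    return PREFIXES[prefix] + local


def expand_triple(entity: str, attribute: str, value: object) -> tuple[str, str, object]:
    """Expand entity and attribute; expand the value only when it is a compact name."""
    if isinstance(value, str):
        value = expand(value)
    return expand(entity), expand(attribute), value


# Hypothetical triples: a link to an upstream job, a link to a server, and a literal.
TRIPLES = [
    ("job:clean-1", "link:upstream", "job:collect-1"),
    ("job:clean-1", "log:runsOn", "srv:node-a"),
    ("job:clean-1", "log:recordsPerSecond", 120),
]
```

A real RDF store would normally attach an `xsd:` datatype to a literal such as 120; the sketch keeps plain Python values for brevity.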

Compared with the simple relationships in traditional industrial scenarios, the directed acyclic graph format data of the embodiment of the present invention shown in Figure 6 displays the parameters and specific data of each link, for example, the relationship between per-second data and the data of operations expanded in parallel.

The directed acyclic graph format data is used to present the processing process with a high degree of transparency. Moreover, based on the embodiment of the present invention, each result can be played back, so the generation of a result is no longer a black box. Records of different granularity are kept upstream and downstream of each step. Based on these records, the embodiment of the present invention not only transparently displays each step of data processing, that is, presents the original data and the data processing process, but also provides, for each result, records tracing back through the tree of its source data. In addition, traditional industrial systems do not use a dedicated entity, entity attribute and entity attribute value format to store and analyze data, that is, traditional industrial systems have no unified standard. The present invention stores data in the entity, entity attribute and entity attribute value format and generates directed acyclic graph format data during the data processing process, thereby introducing an asynchronous system that stores data in this format. The asynchronous system is independent of the traditional industrial system and does not affect it. The entity, entity attribute and entity attribute value format data can be processed with advanced analysis methods and tools, such as directed acyclic graph format data processing, and the processed data can be played back as needed. This achieves the main goal of the embodiments of the present invention: transparent tracking of data throughout data processing.
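The asynchronous system described above can be sketched as a background writer to which the processing pipeline only hands records, so that storage never blocks or alters the main processing. This is a minimal sketch assuming a single in-process worker thread and an in-memory list standing in for the database; `AsyncEavRecorder`, `clean` and the attribute names are hypothetical.

```python
import queue
import threading
from typing import Any


class AsyncEavRecorder:
    """Stores entity-attribute-value records on a background thread so the
    main processing pipeline never waits on storage."""

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self.store: list[tuple[str, str, Any]] = []  # stand-in for a real database
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, entity: str, attribute: str, value: Any) -> None:
        # The caller only enqueues the triple; writing happens elsewhere.
        self._queue.put((entity, attribute, value))

    def _run(self) -> None:
        # Single worker and FIFO queue, so records are stored in submission order.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self.store.append(item)

    def close(self) -> None:
        # Flush everything already queued, then stop the worker.
        self._queue.put(self._STOP)
        self._worker.join()


def clean(values: list[float], recorder: AsyncEavRecorder) -> list[float]:
    """Hypothetical cleaning step: drops negative readings and records what it did."""
    result = [v for v in values if v >= 0]
    recorder.record("job:clean-1", "inputCount", len(values))
    recorder.record("job:clean-1", "outputCount", len(result))
    return result
```

Because the pipeline only enqueues records, a slow store delays the tracking side rather than the industrial data path; a production system would more likely place a message broker or a separate process between the two.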

According to the data processing working principle of the industrial system of the embodiment of the present invention shown in Figures 2 and 4, the processing can be decomposed into data collection, cleaning step 1 and cleaning step 2, after which the prepared data is fed to the data analysis module. The data structure shown in Figure 2 contains only data flow information. In common data analysis projects, the information of each batch or streaming job needs to be tracked, and the output of each module is the upstream of another module. When the directed acyclic graph becomes complex, the root cause cannot be investigated accurately from data flow information alone, so detailed transformation information is added at each step of the data transformation process. Optionally, the detailed transformation information forms an additional time dimension of the matrix in Figure 2. All data can be viewed as a multi-dimensional matrix, which makes operations such as data conversion along the time dimension easier. Each data transformation can be an N-to-N mapping matrix, because the processing nodes scale the job out in parallel.
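One way to read the N-to-N mapping matrix is as a 0/1 matrix whose rows are the input records of a step and whose columns are its output records. Under that assumption (the text does not fix a representation), chaining the matrices of successive steps with a Boolean matrix product gives the end-to-end mapping from raw records to final results, from which the raw records behind any result can be read off. The step matrices and function names below are hypothetical.

```python
Matrix = list[list[int]]


def compose(first: Matrix, second: Matrix) -> Matrix:
    """Boolean matrix product: result[i][k] is 1 if input record i reaches
    output record k through at least one intermediate record j."""
    inner = len(second)
    if any(len(row) != inner for row in first):
        raise ValueError("dimension mismatch between consecutive steps")
    cols = len(second[0])
    return [
        [int(any(row[j] and second[j][k] for j in range(inner))) for k in range(cols)]
        for row in first
    ]


def contributing_inputs(mapping: Matrix, output_index: int) -> list[int]:
    """Indices of the input records that contributed to one output record."""
    return [i for i, row in enumerate(mapping) if row[output_index]]


# Hypothetical cleaning step 1: 3 raw records -> 2 records (raw record 1 is dropped).
STEP_1 = [[1, 0],
          [0, 0],
          [0, 1]]
# Hypothetical cleaning step 2: records 0 and 1 are merged into output 0,
# and record 1 also yields output 1 (a parallel split).
STEP_2 = [[1, 0],
          [1, 1]]

END_TO_END = compose(STEP_1, STEP_2)  # rows: raw records, columns: final outputs
```

With these matrices, final output 0 traces back to raw records 0 and 2, and final output 1 only to raw record 2; raw record 1 contributes to nothing because cleaning step 1 removed it.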

The embodiment of the present invention stores data in the entity, entity attribute and entity attribute value format, which is a unified data storage format that can be used in standard databases. Directed acyclic graph format data is generated from the entity, entity attribute and entity attribute value format data, enabling transparent tracking of traditional industrial systems. At the same time, based on the finer-grained process monitoring data format and process design, customers can better understand the working principles of data processing modules that would otherwise be black boxes, and developers can better tune the data processing modules to support playback of the original data.
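Because the entity, attribute, value layout needs only one generic table, it fits a standard relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `eav`, the attribute names `type` and `upstream`, and the job identifiers are hypothetical.

```python
import sqlite3

# One generic table holds every fact as an (entity, attribute, value) row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT NOT NULL, attribute TEXT NOT NULL, value TEXT)")
conn.execute("CREATE INDEX eav_entity ON eav (entity)")

rows = [
    ("job:collect-1", "type", "collection"),
    ("job:clean-1", "type", "cleaning"),
    ("job:clean-1", "upstream", "job:collect-1"),
    ("job:clean-2", "type", "cleaning"),
    ("job:clean-2", "upstream", "job:clean-1"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
conn.commit()


def attributes_of(entity: str) -> dict[str, list[str]]:
    """Reassemble all stored attributes of one entity from its rows."""
    result: dict[str, list[str]] = {}
    cur = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY rowid", (entity,)
    )
    for attribute, value in cur:
        result.setdefault(attribute, []).append(value)
    return result


def downstream_of(entity: str) -> list[str]:
    """Entities that name the given entity as their upstream."""
    cur = conn.execute(
        "SELECT entity FROM eav WHERE attribute = 'upstream' AND value = ? ORDER BY entity",
        (entity,),
    )
    return [row[0] for row in cur]
```

The price of this flexibility is that every value is stored as text and an entity has to be reassembled from several rows; the index on `entity` keeps that lookup cheap.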

An embodiment of the present invention provides a data processing method for data tracking, which includes S301 to S303 of Figure 3 described above. The specific implementation is consistent with Figure 3 and the description of the corresponding embodiment and is not repeated here. Furthermore, the data processing method further includes S701 to S702, as shown in Figure 7. Figure 7 is a schematic diagram of a data playback method according to an exemplary embodiment of the present disclosure. The method includes:

S701: Enter an identifier to display the data to be played back.

The data to be played back can be all of the original data or the original data of a specific module or node. Playback means displaying the data processing process again: every process applied to the data, such as the data collection and data cleaning disclosed in the embodiments of Figure 2, Figure 3 and Figure 4, is displayed again. The following describes how root cause analysis is performed. When a problem is discovered in the data analysis step, the job ID can be played back, and the entire tree from collection to data analysis can be loaded. Within this tree, playback can be started from any branch to verify the data analysis results.
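The playback described above can be sketched as re-running every job downstream of a chosen node, feeding each one the recorded output of its upstream job, so that the replayed results can be compared with the recorded ones. The job names, step functions and recorded values are hypothetical, and the sketch assumes each job has exactly one upstream job. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to run jobs in dependency order.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical job tree: each job maps to the set of its upstream jobs.
UPSTREAM: dict[str, set[str]] = {
    "collect": set(),
    "clean1": {"collect"},
    "clean2": {"clean1"},
    "analysis": {"clean2"},
}

# Hypothetical processing function of each job; "collect" is never re-run,
# its recorded output serves as the original data.
STEPS: dict[str, Callable[[list[float]], list[float]]] = {
    "clean1": lambda xs: [x for x in xs if x >= 0],            # drop invalid readings
    "clean2": lambda xs: [round(x, 1) for x in xs],            # unify precision
    "analysis": lambda xs: [sum(xs) / len(xs)] if xs else [],  # mean value
}


def downstream_jobs(start: str) -> set[str]:
    """The start job plus every job that depends on it directly or indirectly."""
    result, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for job, parents in UPSTREAM.items():
            if node in parents and job not in result:
                result.add(job)
                frontier.append(job)
    return result


def replay(start: str, recorded: dict[str, list[float]]) -> dict[str, list[float]]:
    """Re-run `start` and everything downstream of it; all other jobs keep
    their recorded outputs."""
    to_run = downstream_jobs(start)
    outputs = dict(recorded)
    for job in TopologicalSorter(UPSTREAM).static_order():
        if job in to_run and job in STEPS:
            (parent,) = UPSTREAM[job]  # assumes exactly one upstream job
            outputs[job] = STEPS[job](outputs[parent])
    return outputs
```

Replaying from successively earlier nodes and comparing against the recorded outputs narrows down the step at which a result first diverges, which is the root cause analysis described above.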

Furthermore, in the embodiments disclosed in Figures 2, 3 and 4 above, the collected data and additional information are stored in the entity, entity attribute and entity attribute value format, and advanced transparency tools, such as directed acyclic graphs, are then used to present a virtual representation. During transparent tracking, the upstream and downstream data relationships and connections of each data flow can be displayed in real time. In these upstream and downstream relationships, each line corresponds to a test point, which carries many elements or parameters, for example, the values at a first time point and a second time point on the timeline. When the data stored in the period between the first time point and the second time point needs to be played back, the original data can be replayed by retrieving the data stored in that period.
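Retrieving the data stored between the first and second time points amounts to a range query on time-stamped samples. The sketch below keeps each test point's samples sorted by timestamp and uses binary search from the standard `bisect` module for the query. The class `PointHistory`, the test point names and the values are hypothetical, and the window is assumed to include both end points.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class PointHistory:
    """Time-stamped samples per test point, kept sorted by timestamp."""

    def __init__(self) -> None:
        self._times: dict[str, list[float]] = defaultdict(list)
        self._values: dict[str, list[float]] = defaultdict(list)

    def append(self, point: str, t: float, value: float) -> None:
        # Insert in timestamp order so range lookups can use binary search.
        i = bisect_right(self._times[point], t)
        self._times[point].insert(i, t)
        self._values[point].insert(i, value)

    def window(self, point: str, t1: float, t2: float) -> list[tuple[float, float]]:
        """All (timestamp, value) samples of one test point with t1 <= timestamp <= t2."""
        times = self._times.get(point, [])
        lo, hi = bisect_left(times, t1), bisect_right(times, t2)
        return list(zip(times[lo:hi], self._values[point][lo:hi]))
```

Samples may arrive out of order; the sorted insert keeps each query at logarithmic cost for locating the window boundaries.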

According to the example of Figure 6, the directed acyclic graph format data is generated at run time; two circles and the connection between them represent an upstream and downstream relationship. As described in the above embodiments, the entity, entity attribute and entity attribute value format data is stored in the database of the asynchronous system. When the data needs to be retrieved, the original data can be obtained automatically from the directed acyclic graph format data, as in the example of Figure 6.
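Obtaining the original data from the stored records can be sketched as rebuilding the graph edges from the entity, attribute, value rows and then walking the arrows backwards to the entities that have no upstream, which are the collected raw data. The `upstream` attribute name and the job identifiers are hypothetical; the text does not specify how edges are encoded in the stored data.

```python
from collections import defaultdict

# Hypothetical EAV rows as they might be read back from the asynchronous store.
EAV_ROWS = [
    ("job:collect-1", "type", "collection"),
    ("job:collect-2", "type", "collection"),
    ("job:clean-1", "upstream", "job:collect-1"),
    ("job:clean-1", "upstream", "job:collect-2"),
    ("job:clean-2", "upstream", "job:clean-1"),
    ("job:analysis-1", "upstream", "job:clean-2"),
]


def build_upstream_map(rows: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Turn 'upstream' EAV rows into DAG edges: entity -> set of upstream entities."""
    upstream: dict[str, set[str]] = defaultdict(set)
    for entity, attribute, value in rows:
        if attribute == "upstream":
            upstream[entity].add(value)
    return upstream


def raw_sources(entity: str, upstream: dict[str, set[str]]) -> set[str]:
    """Follow the arrows backwards until reaching entities with no upstream."""
    sources, stack, seen = set(), [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = upstream.get(node)
        if parents:
            stack.extend(parents)
        else:
            sources.add(node)
    return sources
```

The `seen` set ensures that a source shared by several branches is visited only once.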

Compared with the simple relationships in traditional industrial scenarios, directed acyclic graph format data displays the parameters and specific data of each link, for example, the relationship between per-second data, or data of operations expanded in parallel over time. In the embodiment of the present invention, the industrial system is a mature industrial scenario, and the use of directed acyclic graphs helps programmers play back data and trace the root cause of problems in a simple, transparent way. The collected data can also help users investigate the data processing modules and help the owners of the data processing modules better explain the data analysis functions.

Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method according to the embodiments of the present disclosure.

Exemplary embodiments of the present disclosure also provide a computer program product, including a computer program, wherein the computer program, when executed by a processor of a computer, is used to cause the computer to perform a method according to an embodiment of the present disclosure.

Referring to FIG. 8, a structural block diagram of an electronic device 800 will now be described. The electronic device 800 may serve as a server or client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. Electronic devices are intended to represent various forms of digital electronic computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 8, the electronic device 800 includes a computing module 801, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage module 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing module 801, the ROM 802 and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Multiple components in the electronic device 800 are connected to the I/O interface 805, including an input module 806, an output module 807, a storage module 808 and a communication module 809. The input module 806 may be any type of device capable of inputting information to the electronic device 800; it may receive numeric or character input and generate key signal inputs related to user settings and/or function control of the electronic device. The output module 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator and/or a printer. The storage module 808 may include, but is not limited to, magnetic disks and optical disks. The communication module 809 allows the electronic device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device and/or the like.

The computing module 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing module 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The computing module 801 performs the various methods and processes described above. For example, in some embodiments, methods S301 to S303 and S701 to S702 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage module 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication module 809. In some embodiments, the computing module 801 may be configured to perform methods S301 to S303 and S701 to S702 in any other suitable manner (e.g., by means of firmware).

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer or other programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus and/or device (e.g., magnetic disk, optical disk, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), a computing system that includes middleware components (e.g., an application server), a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN) and the Internet.

Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.

The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims

A data processing method for data tracking, characterized by:

Collect data, clean the collected data, and obtain the cleaned data;

Store the cleaned data in the format of entities, entity attributes and entity attribute values, and obtain data in the format of entities, entity attributes and entity attribute values;

Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
The data processing method according to claim 1, wherein the method further includes:

The obtained entity, entity attributes and entity attribute value format data are input into the data analysis module for data analysis.
The data processing method according to claim 1, wherein the method further includes:

The obtained entity, entity attributes and entity attribute value format data are input into the asynchronous system for storage.
The data processing method according to claim 1, wherein the collecting data, cleaning the collected data, and obtaining the cleaned data further includes:

Collect data, perform the first cleaning of the collected data, and obtain the data after the first cleaning;

Perform the Nth cleaning on the data obtained after the first cleaning to obtain the data after the Nth cleaning, where N is an integer greater than 1.
The data processing method according to claim 1, wherein before generating the directed acyclic graph from the obtained entities, entity attributes and entity attribute value format data, the method further includes:

The obtained entity, entity attributes and entity attribute value format data are stored in a database and a simulation diagram is generated.
The data processing method according to claim 1, wherein collecting data includes collecting data of each test point according to different time dimensions.
The data processing method according to any one of claims 1 to 6, wherein the collected data includes original data and additional data.
A data processing device for data tracking, including:

Collection module, used to collect data;

The cleaning module is used to clean the collected data and obtain the cleaned data;

An entity, entity attribute and entity attribute value generation module is used to store the cleaned data in the entity, entity attribute and entity attribute value format, and obtain the entity, entity attribute and entity attribute value format data;

A directed acyclic graph generation module is used to generate directed acyclic graph format data from the obtained entity, entity attributes and entity attribute value format data.
The data processing apparatus according to claim 8, wherein the apparatus further comprises:

A data analysis module, configured to analyze the obtained entity, entity attribute and entity attribute value format data that is input to it.
The data processing apparatus according to claim 8, wherein the apparatus further comprises:

An asynchronous system for storing the obtained entity, entity attributes and entity attribute value format data.
The data processing apparatus according to claim 8, wherein the cleaning module further comprises:

The first cleaning module is used to clean the collected data for the first time and obtain the data after the first cleaning;

The Nth cleaning module is used to perform the Nth cleaning on the data obtained after the first cleaning, to obtain the data after the Nth cleaning, where N is an integer greater than 1.
The data processing device according to claim 8, wherein the collection module is further configured to collect data of each test point according to different time dimensions.
The data processing device according to any one of claims 8 to 12, wherein the data includes original data and additional data.
A data processing system for data tracking, including:

A sensor, configured to collect data;

A data processing device, configured to: clean the collected data to obtain cleaned data; store the cleaned data in the format of entities, entity attributes and entity attribute values to obtain entity, entity attribute and entity attribute value format data; and generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
The data processing system of claim 14, wherein the system further includes:

A data analysis device, configured to analyze the obtained entity, entity attribute and entity attribute value format data that is input to it.
The data processing system of claim 14, wherein the system further includes:

An asynchronous system for storing the obtained entity, entity attributes and entity attribute value format data.
The data processing system according to claim 14, wherein the data processing device is further configured to: clean the collected data for the first time to obtain the data after the first cleaning; and clean the data after the first cleaning for the Nth time to obtain the data after the Nth cleaning, where N is an integer greater than 1.
The data processing system of claim 14, wherein the system further includes:

A first database, configured to store the obtained entity, entity attribute and entity attribute value format data and to generate a simulation diagram.
The data processing system of claim 14, wherein the sensor further collects data of each test point according to different time dimensions.
The data processing system according to any one of claims 14 to 19, wherein the collected data includes original data and additional data.
A data processing method for data tracking, characterized by:

Collect data, clean the collected data, and obtain the cleaned data;

Store the cleaned data in the format of entities, entity attributes and entity attribute values, and obtain data in the format of entities, entity attributes and entity attribute values;

Generate directed acyclic graph format data from the obtained entities, entity attributes and entity attribute value format data;

Enter an identifier to display the data to be played back;

Data playback is performed based on the directed acyclic graph format data to obtain the root cause.
An electronic device for data tracking, including a computing module and a storage module that stores a program, wherein the program includes instructions that, when executed by the computing module, cause the computing module to perform the method according to any one of claims 1 to 7.