WO2024000585A1 - Data processing method, apparatus, and system for data tracking and electronic device - Google Patents

Data processing method, apparatus, and system for data tracking and electronic device

Info

Publication number
WO2024000585A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
entity
entity attribute
attribute value
format
Prior art date
Application number
PCT/CN2022/103434
Other languages
French (fr)
Chinese (zh)
Inventor
李聪超
陈嘉雯
刘晓南
陈维御
周林飞
王刚华
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西门子股份公司, 西门子(中国)有限公司
Priority to PCT/CN2022/103434
Publication of WO2024000585A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Definitions

  • the invention relates to the field of information processing, in particular to a data processing method, device, system and electronic equipment for data tracking.
  • industrial systems are composed of many different and complex subsystems.
  • industrial systems include not only industrial protocol data sources, but also components of information technology systems.
  • results are produced through a series of processing steps. If you only look at the results themselves, a lot of information will be ignored. For example, the root cause of the error cannot be obtained.
  • Data processing frameworks mainly focus on data processing based on batch and stream processing.
  • Existing third-party open source software does not provide advanced tracking mechanisms for data fusion because it only focuses on functional features and has no maintainable processing step information.
  • The framework can be extended and can include a great deal of transformation logic in industrial settings; for example, when users want to analyze key performance indicators or artificial intelligence results, existing tools are unable to handle it.
  • Existing tools are outcome-oriented rather than process-oriented. While the process appears fine under normal circumstances, the investigation steps are not transparent if the process encounters a problem.
  • Existing tools aim to decouple data processing steps, which turns the final result into a black box that cannot be accurately tracked.
  • the present invention proposes a data processing method, device, system and electronic equipment for data tracking, which can play back entities, entity attributes and entity attribute value format data to achieve transparent tracking of data.
  • the obtained entity, entity attributes and entity attribute value format data are generated into directed acyclic graph format data.
  • Collection module used to collect data
  • the cleaning module is used to clean the collected data and obtain the cleaned data
  • the entity, entity attribute and entity attribute value generation module is used to store the cleaned data in the entity, entity attribute and entity attribute value format, and obtain the entity, entity attribute and entity attribute value format data;
  • a directed acyclic graph generation module is used to generate directed acyclic graph format data from the obtained entities, entity attributes and entity attribute value format data.
  • A data processing device is used to clean the collected data and obtain cleaned data; store the cleaned data in the format of entities, entity attributes and entity attribute values, and obtain data in the format of entities, entity attributes and entity attribute values; and generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
  • Data playback is performed based on the directed acyclic graph format data to obtain the root cause.
  • An electronic device for data tracking includes a computing module and a storage module that stores a program, wherein the program includes instructions for executing the method in the above embodiment by the computing module.
  • the data processing methods, devices, systems and electronic devices of the embodiments of the present invention store entities, entity attributes and entity attribute value format data, and generate directed acyclic graph format data from entities, entity attributes and entity attribute value format data.
  • The entity, entity attribute and entity attribute value format is a unified data storage format that can be used in standard databases, and it can contain icons, graphics, charts and other transparent content. Applying this format to the methods, devices and systems of the embodiments of the present invention for data tracking therefore enables transparent tracking of stored data.
  • the data in entity, entity attribute and entity attribute value format generates directed acyclic graph format data.
  • Because the directed acyclic graph format data is based on a more fine-grained process monitoring data format and process design, customers can better understand the working principle of the data processing process, and data can be called from any node based on the directed acyclic graph format data. This more conveniently supports the playback of the original data, allowing developers to find problems based on that playback and then adjust and optimize every aspect of data processing.
  • Figure 1 is a schematic scene diagram of one embodiment of traditional industrial system data tracking according to the present invention.
  • Figure 2 is a framework diagram of an embodiment of the industrial system of the present invention.
  • Figure 3 is a schematic flow chart of one embodiment of data processing for data tracking according to the present invention.
  • Figure 4 is a schematic data structure diagram of one embodiment of entity, entity attribute and entity attribute value format data of the present invention.
  • Figure 5 is a schematic diagram of the data structure of another embodiment of the present invention.
  • Figure 6 is a schematic diagram of a specific data structure according to another embodiment of the present invention.
  • Figure 7 is a schematic flow chart of an embodiment of the present invention for data playback
  • Figure 8 is a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present invention.
  • 401 Processing pipeline
  • 402 Description
  • 403 Storage format
  • 502 Data structure of tree relationship
  • 503 Data structure of graph relationship
  • S301 Collect data and clean the collected data to obtain the cleaned data.
  • S302 Store the cleaned data in entity, entity attribute and entity attribute value format, and obtain entity, entity attribute and entity attribute value format data.
  • S702 Perform data playback based on directed acyclic graph format data to obtain the root cause.
  • the term “include” and its variations are open-ended, ie, “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”.
  • Relevant definitions of other terms will be given in the description below. It should be noted that concepts such as “first” and “second” mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.
  • Figure 1 shows an application diagram of a traditional industrial system for data analysis and tracking.
  • The traditional industrial system 100 is used for data analysis and data tracking, and mainly includes the data process link 101 in the industrial process, where each link of the data process link 101 includes a test point.
  • the example in Figure 1 takes two test points as an example.
  • Each test point has a sensor for collecting data and monitoring the collected data.
  • the first test point has a first sensor 102 and the second test point has a second sensor 103.
  • The first sensor 102 and the second sensor 103 transmit the collected data to each specific device running in the data process link.
  • For example, the first sensor 102 transmits the collected data to the device 104, and the second sensor 103 transmits the collected data to the device 105.
  • the device 104 and the device 105 can be actual devices such as valves and compressors.
  • the actual devices are devices actually used in the data collection and analysis process.
  • A specific monitoring system can also be taken as an example, in which case the actual devices are the specific application equipment that monitors each link during the data collection and analysis process.
  • the device 104 and the device 105 transmit and store the collected data into the database 106, and generate a simulation diagram based on the stored data, where the simulation diagram is a two-dimensional simulation diagram generated by the industrial system during startup based on existing software.
  • the root cause refers to the initial cause or key cause of the causal chain that leads to a certain outcome or consequence.
  • FIG. 2 shows an application diagram of an industrial system for data analysis and tracking according to an exemplary embodiment of the present disclosure.
  • the industrial system 200 can be any system that includes multiple modules for data analysis and tracking.
  • The industrial system 200 can be applied to various practical industrial systems such as alarm systems, detection systems or tracking systems.
  • data is the most powerful asset for digital innovation. But in most traditional industries, its final stages can only be seen in data storage.
  • the embodiment of the present invention adds a data collection step in the data flow link 201 (details are explained below with examples).
  • Each link in the data process link 201 includes a test point.
  • the example in Figure 2 takes two test points as an example.
  • Each test point has a sensor for collecting data and monitoring the collected data.
  • the first test point has a first sensor 202
  • the second test point has a second sensor 203 .
  • The first sensor 202 and the second sensor 203 transmit the collected data to each specific device running in the data process link. For example, the first sensor 202 transmits the collected data to the device 204, and the second sensor 203 transmits the collected data to the device 205.
  • the device 204 and the device 205 transmit and store the collected data into the database 206.
  • the collected data can be used to organize the data flow and can be merged into the database 206 to further generate simulation diagrams, such as SIMATIC PCS.
  • the database 206 further stores the stored data in the system 207, where the system can be a second database that stores data in a directed acyclic graph (DAG) format.
  • the directed acyclic graph format data shows the real system data processing logic.
  • a directed acyclic graph is a finite directed graph without directed cycles. Specifically, a directed acyclic graph consists of a limited number of vertices and directed edges.
  • a directed acyclic graph is a graph, which is a data structure like arrays, queues, linked lists, etc.
  • a graph is a data structure composed of vertices and edges connecting the vertices.
  • graphs are one of the most flexible data structures.
  • Many problems can be modeled and solved using graph models. For example, social networks between people, analyzing the topology of computer networks to determine whether two computers can communicate, finding the shortest path between two locations in logistics systems, etc.
  • In the data process link 201, when data is collected, additional data carrying source or other supplementary information can be linked into the directed acyclic graph format data, so that all information can be migrated to the industrial system and the working principle of the data processing process can be clearly understood.
  • the data in the directed acyclic graph format in the system 207 can be regarded as a data flow skeleton, which is a high-standard picture generated by data processing.
  • The data processing system for data tracking shown in Figure 2 can be summarized as follows: a sensor for collecting data; and a data processing device for cleaning the collected data to obtain cleaned data, storing the cleaned data in the entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data, and generating directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
  • entities, entity attributes and entity attribute values are also called subjects, predicates and objects (Subject-Predicate-Object, SPO).
  • entities, entity attributes and entity attribute values are a term for reading the database, indicating the format of describing the term.
  • Entities, entity attributes and entity attribute values are used in the field of knowledge graph, which is a knowledge representation form.
  • Knowledge graphs focus on describing the relationships between entities and are composed of some interconnected entities and their attributes.
  • the knowledge graph consists of pieces of knowledge, and each piece of knowledge can be represented as a triplet of an entity, entity attributes, and entity attribute values. Taking a job of the embodiment of the present invention as an example to describe entities, entity attributes and entity attribute values:
  • log:pJob job:123, where pJob is the entity attribute and job:123 is the entity attribute value;
  • log:nextJob job:124, where nextJob is the entity attribute and job:124 is the entity attribute value;
  • log:input srv:Content, where input is the entity attribute and srv:Content is the entity attribute value;
  • log:processedBy srv:A, where processedBy is the entity attribute and srv:A is the entity attribute value;
  • log:processedAt "2022-02-10T10:22:23", where processedAt is the entity attribute and "2022-02-10T10:22:23" is the entity attribute value;
  • log:statusCode 200, where statusCode is the entity attribute and 200 is the entity attribute value.
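  • The job record above can be sketched in Python as a list of triples. This is only an illustration: the source lists attributes and values but does not name the entity (subject) they belong to, so the identifier `log:record-1` and the helper names `Triple`, `job_record` and `attribute_value` are assumptions.

```python
from typing import List, NamedTuple, Optional, Union


class Triple(NamedTuple):
    """One piece of knowledge: entity, entity attribute, entity attribute value."""
    entity: str
    attribute: str
    value: Union[str, int]


def job_record(entity: str) -> List[Triple]:
    """Build the example job record for a (hypothetical) entity identifier."""
    return [
        Triple(entity, "log:pJob", "job:123"),
        Triple(entity, "log:nextJob", "job:124"),
        Triple(entity, "log:input", "srv:Content"),
        Triple(entity, "log:processedBy", "srv:A"),
        Triple(entity, "log:processedAt", "2022-02-10T10:22:23"),
        Triple(entity, "log:statusCode", 200),
    ]


def attribute_value(triples: List[Triple], entity: str,
                    attribute: str) -> Optional[Union[str, int]]:
    """Return the value stored for (entity, attribute), or None if absent."""
    for triple in triples:
        if triple.entity == entity and triple.attribute == attribute:
            return triple.value
    return None


# "log:record-1" is an assumed identifier, not taken from the source.
record = job_record("log:record-1")
```

  • Keeping every fact as one flat triple means new attributes can be added without changing any schema, which matches the statement that the format can be expanded as needed.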
  • Data cleaning refers to the process of detecting and correcting (or deleting) damaged or inaccurate records from record sets, data tables, or databases.
  • Data cleaning can identify incomplete, incorrect, inaccurate or irrelevant parts of the data and then replace, modify or delete the dirty or rough data.
  • Data cleaning can be performed interactively with data processing tools or batch processing through scripts. After data cleaning, a data set should be consistent with other similar data sets in the system.
  • the embodiment of the present invention stores entities, entity attributes and entity attribute value format data, and generates directed acyclic graph format data from the entities, entity attributes and entity attribute value format data.
  • The entity, entity attribute and entity attribute value format is a unified data storage format that can be used in standard databases, and it can contain icons, graphics, charts and other transparent content. Applying this format to the methods, devices and systems of the examples of the present invention to track data therefore enables transparent tracking of stored data.
  • transparent tracking means that each step of the storage process in a complex black box data can be displayed transparently, and the original data can be accurately tracked.
  • the data in entity, entity attribute and entity attribute value format generates directed acyclic graph format data.
  • Because the directed acyclic graph format data is based on a more fine-grained process monitoring data format and process design, customers can better understand the working principle of the data processing process, and data can be called from any node based on the directed acyclic graph format data.
  • This more conveniently supports the playback of the original data, allowing developers to find problems based on that playback and then adjust and optimize every aspect of data processing.
  • The playback tracks groups of original data based on the identifiers in the stored entity, entity attribute and entity attribute value format data. Each time an identifier is entered as needed, the content of the specific node in the stored entity, entity attribute and entity attribute value format data can be displayed. In this way, the stored entity, entity attribute and entity attribute value format data can be displayed again from any node according to the playback, and the storage process of each piece of data along the way will be displayed, thereby accurately tracking the data.
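  • A minimal sketch of the identifier lookup described above, assuming the triples are held in memory as `(entity, attribute, value)` tuples. The function names `build_index` and `show_node`, and the sample identifiers `job:1` and `job:2`, are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Group triples by entity identifier so a node can be looked up directly."""
    index = defaultdict(dict)
    for entity, attribute, value in triples:
        index[entity][attribute] = value
    return index


def show_node(index, identifier):
    """Return the stored attributes of the node with the given identifier."""
    return dict(index.get(identifier, {}))


# Hypothetical sample data: two jobs, the second pointing back to the first.
triples = [
    ("job:1", "log:processedBy", "srv:collect"),
    ("job:1", "log:statusCode", 200),
    ("job:2", "log:processedBy", "srv:clean"),
    ("job:2", "log:pJob", "job:1"),
]
index = build_index(triples)
```

  • Entering `job:2` returns its attributes, including the `log:pJob` link that lets the playback continue toward the upstream node.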
  • Figure 3 is a schematic diagram of a method for data analysis and tracking in an exemplary industrial system of the present disclosure. The method specifically includes:
  • S301 Collect data, and clean the collected data to obtain cleaned data.
  • collecting data includes collecting original data and additional data.
  • the original data is the original data obtained when collecting data based on the sensors at each test point.
  • Each test point can collect data in different time dimensions, for example in units of one millisecond, or in jobs that are expanded in parallel in time, where parallel expansion jobs carry out multiple job operations at the same time point.
  • each test point collects many different elements as needed during collection, such as collecting the values of the first time point and the second time point, and recording and storing them in the specific time periods of the first time point and the second time point on the timeline.
  • the original data can be played back based on the data stored in the specific time period of the first time point and the second time point.
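  • The time-based recording and playback described above could look roughly like the following sketch. It assumes integer millisecond timestamps and an in-memory log; the class name `SensorLog` and its methods are illustrative and not part of the source.

```python
from bisect import bisect_left, bisect_right


class SensorLog:
    """Readings of one test point, kept sorted by timestamp (milliseconds)."""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, timestamp_ms, value):
        """Insert a reading while keeping the timeline sorted."""
        pos = bisect_right(self._times, timestamp_ms)
        self._times.insert(pos, timestamp_ms)
        self._values.insert(pos, value)

    def replay(self, start_ms, end_ms):
        """Return all (timestamp, value) pairs with start_ms <= timestamp <= end_ms."""
        lo = bisect_left(self._times, start_ms)
        hi = bisect_right(self._times, end_ms)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))
```

  • Calling `replay(first_time_point, second_time_point)` returns the original readings stored for that period, in time order.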
  • Additional data is to add additional parameters as needed for better data analysis and monitoring.
  • additional information can be supplementary data, and the supplementary data is the upstream and downstream parameters configured as needed.
  • data cleaning of the collected data can be performed N times according to the needs of different test points, where N is a positive integer.
  • Data cleaning is a data preprocessing process, that is, some redundant values are removed during the preprocessing process. In special cases, the redundant values can be empty. For example, in industrial systems, data needs to be cleaned to remove noise.
  • the cleaning process can include different steps according to the needs of different test points.
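  • As a rough sketch of running N cleaning steps in sequence, the following assumes that readings are floats with `None` marking a missing value. The two example steps (`drop_missing`, `drop_out_of_range`) are illustrative choices; the source does not specify which cleaning operations are used.

```python
from typing import Callable, List, Optional

Reading = Optional[float]
CleaningStep = Callable[[List[Reading]], List[Reading]]


def drop_missing(readings: List[Reading]) -> List[Reading]:
    """Remove empty (None) readings."""
    return [r for r in readings if r is not None]


def drop_out_of_range(low: float, high: float) -> CleaningStep:
    """Build a step that removes readings outside [low, high], e.g. noise spikes."""
    def step(readings: List[Reading]) -> List[Reading]:
        return [r for r in readings if r is not None and low <= r <= high]
    return step


def clean(readings: List[Reading], steps: List[CleaningStep]) -> List[Reading]:
    """Apply the N configured cleaning steps in order."""
    for step in steps:
        readings = step(readings)
    return list(readings)
```

  • Each test point can pass its own list of steps, so the number and kind of cleaning passes can differ per test point.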
  • S302 Store the cleaned data in entity, entity attribute and entity attribute value format, and obtain entity, entity attribute and entity attribute value format data.
  • the obtained entity, entity attributes and entity attribute value format data are stored in the asynchronous system.
  • This asynchronous system refers to a storage system that no longer needs to wait for storage confirmation or return.
  • the collected data is directly and synchronously stored in the asynchronous system. This way, there is no need to wait for storage confirmation or return, and storage is more convenient and faster.
  • the data collected in the asynchronous system is stored and can be called for convenient data playback when needed. Among them, all data to be played back will be transferred to the asynchronous system in the format of entities, entity attributes and entity attribute values, which will have minimal impact on the performance of the original storage system.
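  • One way to picture storage that does not wait for confirmation is a queue drained by a background thread, as sketched below. This is an assumption about how such an asynchronous system might be built, not the patent's implementation; `AsyncTripleStore` and its methods are hypothetical names, and a real system would write to a database rather than a list.

```python
import queue
import threading


class AsyncTripleStore:
    """Accepts triples without making the caller wait for storage confirmation."""

    def __init__(self):
        self._pending = queue.Queue()
        self.stored = []  # stands in for the asynchronous system's database
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, triple):
        """Enqueue the triple and return immediately."""
        self._pending.put(triple)

    def _drain(self):
        """Background loop that moves queued triples into storage."""
        while True:
            triple = self._pending.get()
            self.stored.append(triple)
            self._pending.task_done()

    def flush(self):
        """Block until everything submitted so far is stored (shutdown or tests)."""
        self._pending.join()
```

  • Because `submit` only enqueues, the original processing path is barely slowed down, which is the property the text attributes to the asynchronous system.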
  • FIG. 4 is a structural schematic diagram of an exemplary embodiment of the present disclosure using entities, entity attributes and entity attribute value data formats to describe the data processing process and data storage format.
  • The processing pipeline 401 mainly includes four functional modules, namely the collection module 4011, the first cleaning module 4012, the second cleaning module 4013 and the data analysis module 4014.
  • the description 402 is the specific function description of each module, the function description 4021 is the collection of data, and the execution flow description of the specific steps is detailed in the above step S301.
  • Function description 4022 is cleaning step 1
  • function description 4023 is cleaning step 2
  • the execution flow description of the specific steps can be found in the above-mentioned step S301.
  • Function description 4024 performs data analysis on the prepared data. Specifically, the above-mentioned cleaned data is input into the data analysis module 4014, which analyzes the input data. The data analysis module 4014 can analyze the data collected and cleaned by the sensor at a certain test point. Optionally, the data collected by sensors at multiple test points can be cleaned and added for analysis. That is, the data analysis module 4014 can analyze the data collected by some devices according to user scenarios and specific needs, or it can analyze all data collected by the devices.
  • the data analysis module 4014 analyzes the cleaned data, mainly for preset maintenance of equipment or systems, such as life prediction in some systems and scenarios. Among them, the analysis performed by the data analysis module 4014 is mainly closely related to the value of each test point. Compared with the existing technology, the data analysis module 4014 is added in this embodiment, and the maintenance of the equipment or system is no longer a black box. More transparent analysis results can be obtained, and better maintenance of equipment or systems can be planned or other needs can be met.
  • Storage format 403 uses the entity, entity attribute and entity attribute value format to link the data processed in each step. Description 4031 corresponds to the data collection step, description 4032 corresponds to cleaning, and description 4033 is the description of the continuing steps, where the relationship between the above steps is described as follows:
  • Job ID 1: JobInfo -> Cleaning Job ID 2: JobInfo
  • the entity, entity attributes and entity attribute value format data can be designed as follows. Table 1 is just an example, and the embodiments of the present invention can also be implemented with other technologies. Among them, the entity, entity attribute and entity attribute value format data can be expanded according to the system design requirements. Table 1 illustrates some factors that the system may include.
  • The example parameter descriptions in Table 1 are shown in Table 2:
  • S303 Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
  • a directed acyclic graph is a finite directed graph without directed cycles.
  • a directed acyclic graph consists of a limited number of vertices and directed edges. Each directed edge points from one vertex to another; starting from any vertex, you cannot return to the original vertex through these directed edges.
  • a directed acyclic graph is a graph, which is a data structure like arrays, queues, linked lists, etc. There are many nodes on the graph, also called vertices, and those connecting two nodes are called edges.
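  • Step S303 can be sketched as turning triples into adjacency lists, under the assumption that the `log:nextJob` attribute from the earlier job example is what defines a directed edge. The function name `build_dag` and the job identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def build_dag(triples, edge_attribute="log:nextJob"):
    """Return adjacency lists mapping each entity to its downstream entities."""
    graph = defaultdict(list)
    for entity, attribute, value in triples:
        if attribute == edge_attribute:
            graph[entity].append(value)
            graph.setdefault(value, [])  # make sure sink vertices appear too
    return dict(graph)
```

  • For example, triples saying that `job:1` is followed by `job:2` and `job:3`, which are both followed by `job:4`, produce a graph in which `job:4` has an empty list of downstream vertices. Triples with other attributes, such as `log:statusCode`, are ignored when building edges.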
  • Figure 5 is a schematic diagram of data structures of three progressive complexity levels of directed acyclic graphs in an exemplary embodiment of the present disclosure.
  • The three data structures include a data structure 501 of a linked list relationship, a data structure 502 of a tree relationship, and a data structure 503 of a graph relationship.
  • the data structure 501 of the linked list relationship is a directional line.
  • the data structure 502 of the tree relationship is bifurcated, but there is only one path between any two nodes to reach another point, that is, a closed graph cannot be formed.
  • the graph relationship data structure 503 can have a closed graph.
  • the tree relational data structure 502 is also a special case of the graph relational data structure 503.
  • the last letter G in a directed acyclic graph refers to the graph.
  • the corresponding word D is directed, which means there is a clear direction.
  • Node A holds a pointer to node B, but node B holds no pointer back to node A.
  • the pointer if drawn, is a one-way arrow from A to B.
  • the direction from one node to another node is unidirectional, which is the meaning of direction.
  • the word corresponding to A is acyclic, which means that the entire graph is not allowed to follow the arrow from one node and finally return to the starting point.
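  • The "acyclic" property can be checked mechanically. The sketch below uses Kahn's algorithm, a standard technique that the source does not mention, on adjacency lists of the form `{vertex: [downstream vertices]}`; the function name `topological_order` is an assumption.

```python
from collections import deque


def topological_order(graph):
    """Return vertices in topological order, or None if a directed cycle exists."""
    indegree = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Start from vertices that no arrow points to.
    ready = deque(v for v, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        vertex = ready.popleft()
        order.append(vertex)
        for target in graph.get(vertex, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # Vertices on a cycle never reach indegree 0, so they are missing from order.
    return order if len(order) == len(indegree) else None
```

  • If following the arrows can lead back to the starting vertex, the function returns `None`; otherwise it returns an order in which every vertex appears after all of its upstream vertices.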
  • Figure 6 is a schematic diagram of a specific directed acyclic graph data structure in an exemplary embodiment of the present disclosure. Based on the above entity, entity attribute and entity attribute value format and the parameters and definitions of Table 1 and Figure 2, it shows in detail the upstream and downstream data relationships between entities, entity attributes and entity attribute values in the directed acyclic graph.
  • the namespaces in Figure 6 are as follows:
  • The directed acyclic graph format data of the embodiment of the present invention provided in Figure 6 shows the parameters and specific data of each link, for example, the relationship between data per second, or the display of data for operations expanded in parallel.
  • the directed acyclic graph format data is used to display a high-standard transparent processing process.
  • each result can be played back, so the generation of the result is no longer a black box.
  • traditional industrial systems do not use special entities, entity attributes, and entity attribute value formats to store and analyze data, that is, traditional industrial systems do not have a unified standard.
  • the present invention uses data storage in the format of entities, entity attributes and entity attribute values.
  • directed acyclic graph format data is generated, thus introducing an asynchronous system for data storage in the format of entities, entity attributes and entity attribute values.
  • the asynchronous system is independent of the traditional industrial system and will not affect the traditional industrial system.
  • Entities, entity attributes and entity attribute value format data can be processed using some advanced analysis methods and tools, such as directed acyclic graph format data processing.
  • The processed data can be played back as needed to achieve the main goal of the embodiment of the present invention, which is to provide transparent tracking of data in data processing.
  • the processing steps can be decomposed into data collection, cleaning step 1, and cleaning step 2, and then the prepared data is fed to the data analysis module.
  • the data structure mentioned in Figure 2 only shows data flow information.
  • each batch or streaming job information needs to be tracked, and the output of each module is the upstream of another module.
  • When a directed acyclic graph becomes complex, it is impossible to accurately investigate the root cause because detailed transformation information is added at each step in the data transformation process.
  • detailed transformation information is another time dimension of the matrix in Figure 2. All data can be viewed as a multi-dimensional matrix, which allows for easier operations, such as data conversion in the time dimension.
  • Each data transformation can be an N-N mapping matrix because the processing nodes scale the job in parallel.
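  • One possible reading of the N-N mapping matrix is a 0/1 matrix in which `mapping[i][j]` is 1 when input partition i feeds output partition j. The sketch below follows that interpretation, which is an assumption; the helper names `contributing_inputs` and `compose` are likewise hypothetical.

```python
def contributing_inputs(mapping, output_index):
    """List the input partitions that feed the given output partition."""
    return [i for i, row in enumerate(mapping) if row[output_index]]


def compose(first, second):
    """Combine two consecutive mappings into one (boolean matrix product)."""
    inner = len(second)
    cols = len(second[0])
    return [
        [int(any(row[k] and second[k][j] for k in range(inner))) for j in range(cols)]
        for row in first
    ]
```

  • Composing the mappings of consecutive transformation steps gives, for any final output, the set of original inputs it depends on, which is what a playback needs in order to trace a result upstream.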
  • the embodiment of the present invention stores entities, entity attributes and entity attribute value format data.
  • Entities, entity attributes and entity attribute value formats are unified data storage formats that can be used in standard databases.
  • The data in the entity, entity attribute and entity attribute value format is used to generate data in directed acyclic graph format, enabling transparent tracking of traditional industrial systems.
  • customers can better understand the working principles of data processing modules such as black boxes, and developers can better adjust the data processing module to support the playback of original data.
  • An embodiment of the present invention provides a data processing method for data tracking, which includes S301 to S303 in the above-mentioned Figure 3.
  • the specific implementation method is consistent with the above-mentioned Figure 3 and the description of the corresponding embodiment, and will not be described again.
  • the data processing method further includes the following S701 to S702, as shown in Figure 7 .
  • Figure 7 is a schematic diagram of a data playback method according to an exemplary embodiment of the present disclosure. The method includes:
  • S701 Enter the identifier to display the data to be played back.
  • the data to be played back can be all the original data or the original data of a specific module or node. Playback is to display the data processing process again.
  • For the data processing process, such as the data collection and data cleaning disclosed in the embodiments of Figures 2, 3 and 4, each step applied to the data will be displayed again.
  • the job ID can be played back and the entire treemap can be loaded from collection to data analysis. Within this tree, playback can be started in any trunk to verify data analysis results.
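  • Starting playback from an arbitrary job can be sketched as a breadth-first walk over the downstream edges. The graph layout `{job: [downstream jobs]}` and the function name `replay_from` are assumptions made for illustration, as are the job names in the usage note.

```python
from collections import deque


def replay_from(graph, start):
    """Return the jobs reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    pending = deque([start])
    while pending:
        job = pending.popleft()
        order.append(job)
        for downstream in graph.get(job, []):
            if downstream not in seen:
                seen.add(downstream)
                pending.append(downstream)
    return order
```

  • With a chain such as collection, cleaning step 1, cleaning step 2 and data analysis, starting at the collection job loads the whole chain, while starting at cleaning step 2 replays only the part from that point onward.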
  • the embodiments disclosed in Figures 2, 3, and 4 above store entities, entity attributes, and entity attribute value formats based on collected data and additional information, and then rely on advanced transparency tools to describe virtual representations, such as using directed acyclic graph.
  • the upstream and downstream data relationships and connections of each data flow can be displayed in real time.
  • the line is the test point, which includes many elements or parameters, for example the values at the first time point and the second time point on the timeline.
  • the playback of the original data can be obtained by calling the data stored in the time period between the first time point and the second time point.
  • the directed acyclic graph format data is generated during the running process.
  • Two circles and one connection represent the upstream and downstream relationships.
  • the data in entity, entity attribute and entity attribute value format is stored in the asynchronous system database.
  • the original data can be automatically generated using the directed acyclic graph format data in the example in Figure 6.
  • S702 Perform data playback based on directed acyclic graph format data to obtain the root cause.
  • data in the directed acyclic graph format displays the parameters and specific data of each link, for example, the relationship between data per second, or the display of data that extends operations in parallel in time.
  • the industrial system is a mature industrial scenario.
  • the use of directed acyclic graphs can help programmers achieve data playback and trace the root cause of problems with simple transparency.
  • the data collected can also help users investigate the data processing module and help the data processing module owner better explain the data analysis functions.
  • Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is used to cause the computer to execute the method according to the embodiment of the present disclosure.
  • Exemplary embodiments of the present disclosure also provide a computer program product, including a computer program, wherein the computer program, when executed by a processor of a computer, is used to cause the computer to perform a method according to an embodiment of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital electronic computing equipment, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 800 includes a computing module 801 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 802 or loaded from a storage module 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800.
  • Computing module 801, ROM 802 and RAM 803 are connected to each other through bus 804.
  • An input/output (I/O) interface 805 is also connected to bus 804.
  • the input module 806 may be any type of device capable of inputting information to the electronic device 800.
  • the input module 806 may receive input numeric or character information and generate key signal inputs related to user settings and/or functional controls of the electronic device.
  • the output module 807 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminal, vibrator, and/or printer.
  • the storage module 808 may include, but is not limited to, magnetic disks and optical disks.
  • the communication module 809 allows the electronic device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
  • Computing module 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing module 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing modules that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing module 801 performs various methods and processes described above. For example, in some embodiments, methods S301-S303, S701 to S702 may be implemented as a computer software program, which is tangibly embodied in a machine-readable medium, such as the storage module 808.
  • part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication module 809.
  • the computing module 801 may be configured to perform methods S301-S303, S701 to S702 in any other suitable manner (eg, by means of firmware).
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus, and/or device (e.g., magnetic disk, optical disk, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • the term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.

Abstract

A data processing method for data tracking, the method comprising: collecting data, and cleaning the collected data to obtain cleaned data; storing the cleaned data as an entity, an entity attribute, and an entity attribute value format to obtain entity, entity attribute, and entity attribute value format data; and generating directed acyclic graph format data by using the obtained entity, entity attribute, and entity attribute value format data. In the method, the entity, the entity attribute, and the entity attribute value format are applied to tracking processing, so as to be able to implement transparent tracking on stored data. In addition, the entity, entity attribute, and entity attribute value format data generate the directed acyclic graph format data, which can enable a client to better understand a working principle in a data processing process, and the data can be called from any node on the basis of the directed acyclic graph format data, such that replay of original data can be supported more conveniently.

Description

Data processing method, apparatus, and system for data tracking and electronic device
Technical field
The invention relates to the field of information processing, and in particular to a data processing method, apparatus, system, and electronic device for data tracking.
Background
Modern industrial systems are composed of many different, complex subsystems. In particular, with the convergence of information technology and operational technology, industrial systems include not only industrial protocol data sources but also components of information technology systems. In the field of data analysis, some results are produced through a series of processing steps. If only the results themselves are examined, much information is lost; for example, the root cause of an error cannot be determined.
Data processing frameworks mainly focus on batch and stream processing. Existing third-party open-source software does not provide advanced tracking mechanisms for data fusion, because it focuses only on functional features and keeps no maintainable information about processing steps. Although the components involved become complex, such a framework can be extended to include a great deal of transformation logic, as many companies have done. In an industrial environment, for example when users want to analyze key performance indicators or artificial intelligence results, existing tools cannot cope. Existing tools are result-oriented rather than process-oriented. The process looks fine under normal circumstances, but if it runs into a problem, the investigation steps are not transparent. In stream processing in particular, no tool can track, module by module, the data that produced a result. Existing tools aim to decouple data processing steps, so the final result resembles a black box and cannot be traced precisely.
Summary of the invention
In view of this, the present invention proposes a data processing method, apparatus, system, and electronic device for data tracking, which can play back data in entity, entity attribute, and entity attribute value format and achieve transparent tracking of data.
A data processing method for data tracking provided according to an embodiment of the present invention includes:
collecting data, and cleaning the collected data to obtain cleaned data;
storing the cleaned data in entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
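The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the `Triple` class, the record fields (`id`, `module`, `value`, `upstream`), the attribute names (`produced_by`, `derived_from`), and the cleaning rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One entity / entity attribute / entity attribute value record."""
    entity: str
    attribute: str
    value: str


def clean(readings):
    """Drop readings with a missing value (a stand-in cleaning rule)."""
    return [r for r in readings if r.get("value") is not None]


def to_triples(readings):
    """Store each cleaned reading as entity / attribute / value triples."""
    triples = []
    for r in readings:
        triples.append(Triple(r["id"], "produced_by", r["module"]))
        triples.append(Triple(r["id"], "value", str(r["value"])))
        if r.get("upstream"):
            triples.append(Triple(r["id"], "derived_from", r["upstream"]))
    return triples


def to_dag(triples):
    """Build adjacency lists mapping each upstream record to its downstream records."""
    dag = {}
    for t in triples:
        if t.attribute == "derived_from":
            dag.setdefault(t.value, []).append(t.entity)
            dag.setdefault(t.entity, [])
    return dag


# Hypothetical collected data: r3 has no value and is removed by cleaning.
readings = [
    {"id": "r1", "module": "collect", "value": 20.5, "upstream": None},
    {"id": "r2", "module": "clean", "value": 20.5, "upstream": "r1"},
    {"id": "r3", "module": "clean", "value": None, "upstream": "r1"},
    {"id": "r4", "module": "analyze", "value": 1, "upstream": "r2"},
]
triples = to_triples(clean(readings))
dag = to_dag(triples)
print(dag)
```

Keeping the upstream link as an ordinary triple means the graph can be rebuilt at any time from the stored records alone, which is one plausible reading of why the format suits tracking.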
A data processing apparatus for data tracking provided according to an embodiment of the present invention includes:
a collection module, configured to collect data;
a cleaning module, configured to clean the collected data to obtain cleaned data;
an entity, entity attribute, and entity attribute value generation module, configured to store the cleaned data in entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
a directed acyclic graph generation module, configured to generate directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
A data processing system for data tracking provided according to an embodiment of the present invention includes:
a sensor, configured to collect data;
a data processing apparatus, configured to clean the collected data to obtain cleaned data; store the cleaned data in entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data; and generate directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
A data processing method for data tracking provided according to an embodiment of the present invention includes:
collecting data, and cleaning the collected data to obtain cleaned data;
storing the cleaned data in entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data;
inputting an identifier to display the data to be played back;
performing data playback based on the directed acyclic graph format data to obtain the root cause.
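The two playback steps can be illustrated with a small sketch. It assumes, hypothetically, that the identifier names a node of the directed acyclic graph and that each node's stored records carry a timestamp; `upstream_of`, `replay`, the node names, and the sample records are illustrative only and are not taken from the embodiments.

```python
def upstream_of(dag, node):
    """Return every node from which `node` can be reached, nearest first."""
    parents = {}
    for src, targets in dag.items():
        for dst in targets:
            parents.setdefault(dst, []).append(src)
    seen, order, frontier = set(), [], [node]
    while frontier:
        current = frontier.pop(0)
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                frontier.append(parent)
    return order


def replay(store, dag, node, t_start, t_end):
    """Yield stored records for `node` and its upstream nodes within a time window."""
    for name in [node] + upstream_of(dag, node):
        for timestamp, payload in store.get(name, []):
            if t_start <= timestamp <= t_end:
                yield name, timestamp, payload


# Hypothetical pipeline and stored records: (timestamp, payload) per node.
dag = {"collect": ["clean"], "clean": ["analyze"], "analyze": []}
store = {
    "collect": [(1, "raw=-999"), (5, "raw=21")],
    "clean": [(2, "kept=-999")],
    "analyze": [(3, "alarm=True")],
}
trace = list(replay(store, dag, "analyze", 0, 3))
for step in trace:
    print(step)
```

Reading the trace from the analysis result back to the collection step is what surfaces a candidate root cause here, namely the implausible raw value that cleaning failed to remove.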
An electronic device for data tracking provided according to an embodiment of the present invention includes a computing module and a storage module that stores a program, wherein the program includes instructions that, when executed by the computing module, cause the computing module to perform the method of the above embodiments.
The data processing method, apparatus, system, and electronic device of the embodiments of the present invention store data in entity, entity attribute, and entity attribute value format and generate directed acyclic graph format data from it. The entity, entity attribute, and entity attribute value format is a unified data storage format that can be used in standard databases, and it contains icons, graphics or charts, and other transparency-related content; applying this format to data tracking in the method, apparatus, and system of the embodiments therefore enables transparent tracking of the stored data. In addition, because the directed acyclic graph format data generated from it is based on a finer-grained process-monitoring data format and process design, customers can better understand how the data processing works. Data can also be retrieved from any node of the directed acyclic graph format data, which makes playback of the original data more convenient, so developers can find problems from the playback and then adjust and optimize every stage of data processing.
Description of drawings
The above characteristics, technical features, advantages, and implementations of the present invention are further explained below in a clear and easily understood manner through the description of preferred embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram of a scenario of data tracking in a traditional industrial system;
Figure 2 is a framework diagram of an embodiment of the industrial system of the present invention;
Figure 3 is a schematic flow chart of an embodiment of data processing for data tracking according to the present invention;
Figure 4 is a schematic data structure diagram of an embodiment of entity, entity attribute, and entity attribute value format data of the present invention;
Figure 5 is a schematic data structure diagram of another embodiment of the present invention;
Figure 6 is a schematic diagram of a specific data structure according to another embodiment of the present invention;
Figure 7 is a schematic flow chart of an embodiment of data playback according to the present invention;
Figure 8 is a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present invention.
The reference signs are as follows:
100: Traditional industrial system
101, 201: Data process link
102, 202: First sensor
103, 203: Second sensor
104, 105: Device
106, 206: Database
200: Industrial system
204, 205: Device
207: System
401: Processing pipeline
402: Description
403: Storage format
4011: Collection module
4012: First cleaning module
4013: Second cleaning module
4014: Data analysis module
4021-4024: Function description
4031, 4032, 4033: Description
501: Data structure of a linked-list relationship
502: Data structure of a tree relationship
503: Data structure of a graph relationship
S301: Collect data, and clean the collected data to obtain cleaned data
S302: Store the cleaned data in entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data
S303: Generate directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data
S701: Input an identifier to display the data to be played back
S702: Perform data playback based on the directed acyclic graph format data to obtain the root cause
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be executed in a different order and/or in parallel. Furthermore, method implementations may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "include" and its variations are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below. Note that concepts such as "first" and "second" mentioned in this disclosure are used only to distinguish different devices or modules, and are not used to limit the order of, or the interdependence between, the functions performed by these devices or modules.
Note that the modifiers "one" and "a plurality of" mentioned in this disclosure are illustrative rather than restrictive. Those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be read as "one or more". The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
Figure 1 is a schematic diagram of an application of a traditional industrial system for data analysis and tracking. Specifically, the traditional industrial system 100 is used for data analysis and data tracking and mainly comprises the data process links 101 of an industrial process, where each link includes a test point; Figure 1 uses two test points as an example. Each test point has a sensor that collects data and monitors the collected data: the first test point has a first sensor 102 and the second test point has a second sensor 103. The first sensor 102 and the second sensor 103 transmit the collected data to the specific devices running in the data process links; for example, the first sensor 102 transmits its data to device 104 and the second sensor 103 transmits its data to device 105. In an industrial system, devices 104 and 105 may be physical devices such as valves or compressors, that is, devices actually used during data collection and analysis; in a monitoring system, they are the specific equipment that monitors each link during data collection and analysis. Devices 104 and 105 transmit the collected data to the database 106, where it is stored, and a simulation diagram is generated from the stored data; this is a two-dimensional simulation diagram that the industrial system generates at startup using existing software. When the data collected at each test point is stream-processed in this traditional industrial system, no tool can track the results that each module generates, so the generation of results resembles a black box, and neither the original data nor the root cause of an error can be replayed precisely. Here, the root cause is the initial or key cause in the chain of causation that leads to a given outcome or consequence.
Figure 2 is a schematic diagram of an application of an industrial system for data analysis and tracking according to an exemplary embodiment of the present disclosure. The industrial system 200 can be any system that includes multiple modules for data analysis and tracking; specifically, it can be applied in practical industrial systems such as alarm systems, detection systems, or tracking systems. In industrial systems, data is the most powerful asset for digital innovation, but in most traditional industries only its final stage is visible, in data storage. To increase the transparency of the data, the embodiment of the present invention adds a data collection step to the data process link 201 (explained in detail with examples below). In practical industrial applications, the data process link 201 records not only which devices' data should be collected but also factors or parameters such as upstream and downstream information. Each link in the data process link 201 includes a test point; Figure 2 uses two test points as an example. Each test point has a sensor that collects data and monitors the collected data: the first test point has a first sensor 202 and the second test point has a second sensor 203. The first sensor 202 and the second sensor 203 transmit the collected data to the specific devices running in the data process links; for example, the first sensor 202 transmits its data to device 204 and the second sensor 203 transmits its data to device 205. Devices 204 and 205 transmit the collected data to the database 206, where it is stored; for example, the collected data can be used to organize the data flow and can be merged into the database 206 to further generate a simulation diagram, for example with SIMATIC PCS. The database 206 further stores the stored data in the system 207, which may be a second database that stores data in directed acyclic graph (DAG) format; the directed acyclic graph format data shows the real data processing logic of the system.
A directed acyclic graph is a finite directed graph with no directed cycles. Specifically, it consists of a finite number of vertices and directed edges, where each directed edge points from one vertex to another, and starting from any vertex it is impossible to return to that vertex by following the directed edges. A directed acyclic graph is a graph, and a graph, like an array, queue, or linked list, is a data structure. A graph is a data structure composed of vertices and the edges connecting them. In computer science, the graph is one of the most flexible data structures, and many problems can be modeled and solved with graph models, for example social networks between people, analyzing the topology of a computer network to determine whether two computers can communicate, or finding the shortest path between two locations in a logistics system. In the data process link 201, when data is collected, additional data carrying source or other additional information can be linked into directed acyclic graph format data, so that all information can be migrated into the industrial system and the working principle of the data processing flow can be understood clearly. The directed acyclic graph format data in the system 207 can be regarded as a data flow skeleton, a high-standard picture generated from the data processing.
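The defining property of a directed acyclic graph, that no vertex can be reached again by following directed edges, can be checked with Kahn's algorithm, a standard topological sorting technique that is not part of the patent text. The sketch below assumes an adjacency-list representation; the vertex names merely echo the reference signs of Figure 2, and `topological_order` is a hypothetical helper.

```python
from collections import deque


def topological_order(graph):
    """Return a topological order of `graph`, or None if it has a directed cycle."""
    indegree = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    # Start from vertices with no incoming edges.
    queue = deque(v for v, degree in indegree.items() if degree == 0)
    order = []
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for target in graph.get(vertex, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    # Vertices left unprocessed lie on, or downstream of, a directed cycle.
    return order if len(order) == len(indegree) else None


# Data flow loosely following Figure 2: sensors -> devices -> database.
pipeline = {
    "sensor_202": ["device_204"],
    "sensor_203": ["device_205"],
    "device_204": ["database_206"],
    "device_205": ["database_206"],
    "database_206": [],
}
cyclic = {"a": ["b"], "b": ["a"]}

print(topological_order(pipeline))
print(topological_order(cyclic))
```

A valid order lists every upstream vertex before the vertices that depend on it, which is the same ordering a playback from collection to analysis would follow.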
The data processing system for data tracking shown in Figure 2 can be summarized as comprising: a sensor for collecting data; and a data processing device for cleaning the collected data to obtain cleaned data, storing the cleaned data in entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data, and generating directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data. Entity, entity attribute and entity attribute value are also called subject, predicate and object (Subject-Predicate-Object, SPO). Entity, entity attribute and entity attribute value is a database term that indicates the format used for descriptions. Descriptions can be expressed in many ways; for example, the RDF (Resource Description Framework) standard format conforms to the definition of entity, entity attribute and entity attribute value. Entity, entity attribute and entity attribute value are used in the field of knowledge graphs, a knowledge graph being a form of knowledge representation. A knowledge graph focuses on describing the relationships between entities and is composed of interconnected entities and their attributes. It consists of individual pieces of knowledge, each of which can be represented as an (entity, entity attribute, entity attribute value) triple. Taking one job of an embodiment of the present invention as an example, entity, entity attribute and entity attribute value are described as follows:
job:123 link:next log:Transaction, where job:123 is the entity;
log:pJob job:123, where pJob is the entity attribute and job:123 is the entity attribute value;
log:nextJob job:124, where nextJob is the entity attribute and job:124 is the entity attribute value;
log:input srv:Content, where input is the entity attribute and srv:Content is the entity attribute value;
log:processedBy srv:A, where processedBy is the entity attribute and srv:A is the entity attribute value;
log:processedAt "2022-02-10T10:22:23", where processedAt is the entity attribute and "2022-02-10T10:22:23" is the entity attribute value;
log:statusCode 200, where statusCode is the entity attribute and 200 is the entity attribute value.
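The listing above can be held in memory as plain triples. The following is a minimal Python sketch, not the implementation of the embodiment: the `Triple` type and the `values` helper are hypothetical names, and the sketch assumes, for illustration only, that every attribute line is attached to job:123 as its entity, which the listing does not spell out.

```python
from typing import List, NamedTuple, Union


class Triple(NamedTuple):
    """One (entity, entity attribute, entity attribute value) statement."""
    subject: str
    predicate: str
    obj: Union[str, int]


# Assumption: all attribute lines of the example share job:123 as the entity.
JOB_123 = [
    Triple("job:123", "link:next", "log:Transaction"),
    Triple("job:123", "log:pJob", "job:123"),
    Triple("job:123", "log:nextJob", "job:124"),
    Triple("job:123", "log:input", "srv:Content"),
    Triple("job:123", "log:processedBy", "srv:A"),
    Triple("job:123", "log:processedAt", "2022-02-10T10:22:23"),
    Triple("job:123", "log:statusCode", 200),
]


def values(triples: List[Triple], subject: str, predicate: str) -> list:
    """Return every attribute value stored for the given entity and attribute."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]
```

With this layout, looking up the downstream job of job:123 is a single call such as `values(JOB_123, "job:123", "log:nextJob")`.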
Data cleaning refers to the process of detecting and correcting (or deleting) corrupted or inaccurate records in a record set, data table, or database. Data cleaning identifies incomplete, incorrect, inaccurate or irrelevant parts of the data and then replaces, modifies or deletes the dirty or coarse data. Data cleaning can be performed interactively with data processing tools or as batch processing through scripts. After cleaning, a data set should be consistent with other similar data sets in the system.
The embodiment of the present invention stores data in entity, entity attribute and entity attribute value format and generates directed acyclic graph format data from it. Because the entity, entity attribute and entity attribute value format is a unified data storage format that can be used in standard databases, and because it can contain icons, graphics or charts and other transparent content, applying this format in the method, apparatus and system of the embodiments of the present invention enables transparent tracking of the stored data. Transparent tracking means that every storage step inside what would otherwise be a complex black box of data can be displayed transparently, so that the original data can be tracked precisely. In addition, directed acyclic graph format data is generated from the entity, entity attribute and entity attribute value format data. Because the directed acyclic graph format data is based on a finer-grained process monitoring data format and process design, customers can better understand how the data processing works, and data can be retrieved starting from any node of the directed acyclic graph. This makes playback of the original data more convenient, so that developers can find problems from the playback and then adjust and optimize each stage of the data processing. Playback tracks groups of original data by the identifiers in the stored entity, entity attribute and entity attribute value format data: each time an identifier is entered as needed, the content of the corresponding node in the stored data is displayed. In this way, playback can display the stored data again starting from any node, including every intermediate storage step, so that the data can be tracked precisely.
Figure 3 is a schematic diagram of a method for data analysis and tracking in an exemplary industrial system of the present disclosure. The method specifically includes:
S301: Collect data, and clean the collected data to obtain cleaned data.
Collecting data includes collecting original data and additional data. The original data is the data acquired by the sensor at each test point. Each test point can acquire data along different time dimensions, for example acquiring data every millisecond, or acquiring data from jobs that scale out in parallel in time, where scaling out in parallel means performing multiple job operations at the same point in time. In addition, each test point acquires many different elements as needed, for example the values at a first time point and a second time point, and records the data stored for the time period between the first time point and the second time point on the time axis; during playback, the original data can be replayed from the data stored for that time period. Additional data consists of extra parameters added as needed for better data analysis and monitoring; for example, the additional information can be supplementary data, namely upstream and downstream parameters configured as needed.
Data cleaning of the collected data can be performed N times according to the needs of the different test points, where N is a positive integer. Data cleaning is a data preprocessing step in which redundant values are removed; in special cases the redundant values may be empty. For example, in industrial systems data needs to be cleaned to remove noise, and the cleaning process can include different steps depending on the needs of each test point.
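The idea of running N cleaning steps per test point can be sketched as a list of functions applied in order. This is only an illustration under stated assumptions: the step names `drop_missing` and `clip_outliers`, the treatment of empty values as `None` or NaN, and the valid range are hypothetical and do not come from the embodiment.

```python
import math
from typing import Callable, List, Optional

Reading = Optional[float]
CleaningStep = Callable[[List[Reading]], List[Reading]]


def drop_missing(readings: List[Reading]) -> List[Reading]:
    """Remove empty (None) and NaN readings."""
    return [r for r in readings if r is not None and not math.isnan(r)]


def clip_outliers(low: float, high: float) -> CleaningStep:
    """Build a step that discards readings outside [low, high] as noise."""
    def step(readings: List[Reading]) -> List[Reading]:
        return [r for r in readings if r is not None and low <= r <= high]
    return step


def clean(readings: List[Reading], steps: List[CleaningStep]) -> List[Reading]:
    """Apply the N configured cleaning steps in order."""
    data = list(readings)
    for step in steps:
        data = step(data)
    return data
```

A test point that needs two cleaning steps would pass `[drop_missing, clip_outliers(0.0, 100.0)]`; a test point with other needs can pass a different list without changing `clean`.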
S302: Store the cleaned data in entity, entity attribute and entity attribute value format to obtain entity, entity attribute and entity attribute value format data.
The obtained entity, entity attribute and entity attribute value format data is stored in an asynchronous system, that is, a storage system for which there is no need to wait for a storage acknowledgment or return. To avoid affecting the original storage system, the collected data is stored directly in the asynchronous system as it is collected; since there is no need to wait for an acknowledgment or return, storage is faster and more convenient. The data stored in the asynchronous system can be retrieved when needed, which makes playback convenient. All data to be played back is passed to the asynchronous system in entity, entity attribute and entity attribute value format, which minimizes the impact on the performance of the original storage system.
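The "no waiting for an acknowledgment" behavior described above can be sketched with a queue and a background writer thread. The class name `AsyncTripleStore` and its methods are hypothetical, and an in-memory list stands in for whatever database the asynchronous system actually uses.

```python
import queue
import threading


class AsyncTripleStore:
    """Fire-and-forget store: submit() returns before the triple is written."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._stored: list = []  # stand-in for the real database
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, triple: tuple) -> None:
        """Hand a triple over and return immediately, without an acknowledgment."""
        self._queue.put(triple)

    def _drain(self) -> None:
        # Background writer: persists triples in arrival order.
        while True:
            triple = self._queue.get()
            self._stored.append(triple)
            self._queue.task_done()

    def flush(self) -> None:
        """Block until all submitted triples are written (tests and shutdown only)."""
        self._queue.join()

    def snapshot(self) -> list:
        """Return a copy of everything written so far, for playback."""
        return list(self._stored)
```

The producing pipeline only ever calls `submit`, so its latency does not depend on how long the write takes; `flush` exists solely so that a test or an orderly shutdown can wait for the writer.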
In the embodiment of the present invention, storing data in entity, entity attribute and entity attribute value format enables three functions: data transparency, data analysis across multiple time dimensions, and stream data playback. Figure 4 is a schematic structural diagram of an exemplary embodiment of the present disclosure that uses the entity, entity attribute and entity attribute value data format to describe the data processing process and the data storage format. The processing pipeline 401 mainly includes four functional modules: a collection module 4011, a first cleaning module 4012, a second cleaning module 4013, and a data analysis module 4014. The description 402 gives the specific function of each module. Function description 4021 is data collection; for its execution flow, see step S301 above. Function description 4022 is cleaning step 1 and function description 4023 is cleaning step 2; for their execution flow, see step S301 above. Function description 4024 is data analysis of the prepared data, that is, the cleaned data is input into the data analysis module 4014 and analyzed there. The data analysis module 4014 can analyze the data collected by the sensor at a single test point after cleaning. Optionally, the data collected by the sensors at multiple test points can be cleaned and combined before analysis; that is, depending on the user scenario and specific requirements, the data analysis module 4014 can analyze the data collected by some of the devices or by all of them. The analysis of the cleaned data mainly serves preset maintenance of equipment or systems, for example lifetime prediction in some systems and scenarios. Which analyses the data analysis module 4014 performs depends closely on the values at each test point. Compared with the prior art, this embodiment adds the data analysis module 4014, so that maintenance of the equipment or system is no longer a black box: more transparent analysis results are obtained, and better maintenance can be planned or other requirements met. The storage format 403 uses the entity, entity attribute and entity attribute value format to link the data produced by each step. Description 4031 corresponds to the data collection step, description 4032 corresponds to the cleaning step, and description 4033 corresponds to the subsequent step. The relationship between these step descriptions is as follows:
Gathering Job ID 1: JobInfo -> Cleaning Job ID 2: JobInfo
Figure PCTCN2022103434-appb-000001
The specific data formats involved in the collection step and the cleaning steps are shown in Table 1 and Table 2 below.
The entity, entity attribute and entity attribute value format data can be designed as follows. Table 1 is only an example, and the embodiments of the present invention can also be implemented in other ways. The entity, entity attribute and entity attribute value format data can be extended according to system design requirements; Table 1 illustrates some of the factors the system may include.
Table 1 example:
Figure PCTCN2022103434-appb-000002
Figure PCTCN2022103434-appb-000003
The example parameters in Table 1 are described in Table 2:
Table 2
Figure PCTCN2022103434-appb-000004
S303: Generate directed acyclic graph format data from the obtained entity, entity attribute and entity attribute value format data.
The solution provided by the embodiments of the present invention is designed to track the processing steps as they are executed and to help engineers identify the root cause of problems. These steps can be converted into directed acyclic graph format data. A directed acyclic graph is a finite directed graph without directed cycles. Specifically, it consists of a finite number of vertices and directed edges, each directed edge points from one vertex to another, and starting from any vertex it is impossible to return to that vertex by following the directed edges. A directed acyclic graph is a kind of graph, and a graph, like an array, queue, or linked list, is a data structure. A graph has many nodes, also called vertices, and a connection between two nodes is called an edge. Figure 5 is a schematic diagram of three data structures of increasing complexity related to directed acyclic graphs in an exemplary embodiment of the present disclosure: a linked-list data structure 501, a tree data structure 502, and a graph data structure 503. The linked-list data structure 501 is a single directed line. The tree data structure 502 branches, but between any two nodes there is only one path, so no closed shape can be formed. The graph data structure 503, by contrast, can contain closed shapes when edge directions are ignored. The tree data structure 502 is a special case of the graph data structure 503. In the abbreviation DAG, the last letter G stands for graph. D stands for directed, meaning that edges have a definite direction: node A holds a pointer to node B, but node B holds no pointer back to node A, so the drawing shows a one-way arrow from A to B. In a directed acyclic graph the link from one node to another is one-way; that is the meaning of directed. A stands for acyclic, meaning that nowhere in the graph is it possible to start from a node, follow the arrows, and arrive back at the starting node.
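The acyclic property described above can be checked mechanically. The sketch below uses Kahn's algorithm, a standard topological sort that is swapped in here for illustration and is not named in the source; the function name `topological_order` and the step names in the usage note are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def topological_order(edges: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Return nodes in upstream-to-downstream order, or None if a directed cycle exists."""
    successors: Dict[str, List[str]] = {}
    indegree: Dict[str, int] = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start from nodes with no upstream edge and peel the graph layer by layer.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Nodes on a directed cycle never reach indegree 0, so they are missing from order.
    return order if len(order) == len(indegree) else None
```

A linked list such as gather -> clean1 -> clean2 -> analyse and a diamond in which gather feeds both clean1 and clean2, which both feed analyse, are each accepted. The diamond is closed only when directions are ignored, so it is still a directed acyclic graph. A pair of edges A -> B and B -> A is rejected.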
Figure 6 is a schematic diagram of a specific directed acyclic graph data structure in an exemplary embodiment of the present disclosure. Based on the entity, entity attribute and entity attribute value format described above and the parameters and definitions of Table 1 and Figure 2, it shows in detail the upstream and downstream relationships of the entity, entity attribute and entity attribute value format data in the directed acyclic graph. The namespaces in Figure 6 are as follows:
job: http://example.org/data/job/
link: http://example.org/data/relation/
srv: http://example.org/data/server/
log: http://example.org/ont/transaction-log/
xsd: http://www.w3.org/2001/XMLSchema#
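The namespace list maps each short prefix to a base address, so a prefixed name such as job:123 stands for a full identifier. A minimal Python sketch of that expansion follows; the `expand` helper is a hypothetical name, and the prefix table simply copies the five namespaces listed above.

```python
PREFIXES = {
    "job": "http://example.org/data/job/",
    "link": "http://example.org/data/relation/",
    "srv": "http://example.org/data/server/",
    "log": "http://example.org/ont/transaction-log/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'job:123' into its full identifier."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return PREFIXES[prefix] + local
```

Keeping the short form in storage and expanding only on output is one possible design choice; it keeps the stored triples compact while still allowing export to tools that expect full identifiers.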
Compared with the simple relationships of traditional industrial scenarios, the directed acyclic graph format data of the embodiment of the present invention shown in Figure 6 displays the parameters and specific data of every stage, for example the relationships between per-second data and the data of jobs that scale out in parallel.
The directed acyclic graph format data is used to present a high-standard, transparent processing flow. Moreover, based on the embodiments of the present invention, every result can be played back, so the generation of a result is no longer a black box. Records of different granularity exist upstream and downstream of every step. Based on these records, the embodiment of the present invention not only displays each step of data processing transparently, that is, presents the original data and the data processing process, but also provides a record of the various source data trees for each result. In addition, traditional industrial systems do not use a dedicated entity, entity attribute and entity attribute value format to store and analyze data; in other words, they have no unified standard. The present invention stores data in entity, entity attribute and entity attribute value format and generates directed acyclic graph format data during data processing. This introduces an asynchronous system for storing the entity, entity attribute and entity attribute value format data; the asynchronous system is independent of the traditional industrial system and therefore does not affect it. Entity, entity attribute and entity attribute value format data can be processed with advanced analysis methods and tools, for example as directed acyclic graph format data, and the processed data can be played back as needed. This achieves the main goal of the embodiments of the present invention, namely transparent data tracking during data processing.
According to the data processing principle of the industrial system of the embodiment of the present invention shown in Figures 2 and 4, the processing can be decomposed into data collection, cleaning step 1 and cleaning step 2, after which the prepared data is fed into the data analysis module. The data structure mentioned for Figure 2 shows only data flow information. In typical data analysis projects, the information of every batch or streaming job needs to be tracked, and the output of each module is the upstream of another module. When the directed acyclic graph becomes complex, the root cause cannot be investigated accurately from data flow information alone, so detailed transformation information is added at each step of the data transformation process. Optionally, the detailed transformation information is a further, time dimension of the matrix in Figure 2. All data can be regarded as a multi-dimensional matrix, which makes operations such as data transformation along the time dimension comparatively easy. Each data transformation can be an N-to-N mapping matrix, because the processing nodes can scale out jobs in parallel.
The embodiment of the present invention stores data in entity, entity attribute and entity attribute value format, a unified data storage format that can be used in standard databases, and generates directed acyclic graph format data from it, which enables transparent tracking of traditional industrial systems. In addition, the finer-grained process monitoring data format and process design let customers better understand how black-box data processing modules work, let developers tune those modules more effectively, and support playback of the original data.
An embodiment of the present invention provides a data processing method for data tracking that includes S301 to S303 of Figure 3 above; the implementation is as described for Figure 3 and the corresponding embodiments and is not repeated here. The data processing method further includes S701 and S702 below, as shown in Figure 7. Figure 7 is a schematic diagram of a data playback method according to an exemplary embodiment of the present disclosure. The method includes:
S701: Enter an identifier to display the data to be played back.
The data to be played back can be all of the original data, or the original data of a specific module or node. Playback displays the data processing process again: every intermediate stage of the data processing, such as the data collection and data cleaning disclosed in the embodiments of Figures 2, 3 and 4, is displayed again. Root cause analysis then proceeds as follows. When a problem is found in the data analysis step, the job identifier can be played back from that step, and the entire tree from collection to data analysis can be loaded. Within this tree, playback can start at any trunk in order to verify the data analysis results.
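Loading the tree behind a suspicious job identifier amounts to walking the stored triples upstream. The following Python sketch is one way this could look, assuming that the nextJob attribute links each job to its downstream job; the function name `upstream_lineage` and the job identifiers 121 and 122 used in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def upstream_lineage(triples: List[Tuple[str, str, object]],
                     job_id: str,
                     edge: str = "log:nextJob") -> List[str]:
    """Return every job that feeds job_id, directly or indirectly, nearest first."""
    # Invert the downstream links so each job maps to its direct upstream jobs.
    parents: Dict[str, List[str]] = {}
    for subject, predicate, obj in triples:
        if predicate == edge:
            parents.setdefault(str(obj), []).append(subject)

    lineage: List[str] = []
    seen = {job_id}
    frontier = [job_id]
    while frontier:  # breadth-first walk towards the collection step
        current = frontier.pop(0)
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                lineage.append(parent)
                frontier.append(parent)
    return lineage
```

If jobs 121 and 122 both feed job 123, which feeds job 124, then entering job:124 yields job:123 first and then the two collection-side jobs, which is the order in which an engineer would typically inspect them.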
Next, as disclosed in the embodiments of Figures 2, 3 and 4, the collected data and additional information are stored in entity, entity attribute and entity attribute value format, and advanced transparency tools, such as a directed acyclic graph, are then used to describe the virtual representation. During transparent tracking, the upstream and downstream data relationships and connections of each data flow can be displayed in real time. In the upstream and downstream relationships of each data flow, a line is a test point that includes many elements or parameters, for example the values at a first time point and a second time point and the data stored for the time period between them on the time axis. When playback is needed, retrieving the data stored for the time period between the first time point and the second time point yields a playback of the original data.
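Retrieving the data stored between a first and a second time point can be sketched as a range lookup over time-ordered records. The function name `replay_window` and the record layout of (timestamp, value) pairs are assumptions made for illustration; the embodiment does not prescribe them.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import List, Tuple

Record = Tuple[datetime, float]


def replay_window(records: List[Record],
                  first: datetime,
                  second: datetime) -> List[Record]:
    """Return the records with first <= timestamp <= second.

    Assumes records are already sorted by timestamp.
    """
    stamps = [stamp for stamp, _ in records]
    lo = bisect_left(stamps, first)    # first record at or after the first time point
    hi = bisect_right(stamps, second)  # one past the last record at or before the second
    return records[lo:hi]
```

Because the records are kept in time order, the lookup is a binary search rather than a full scan, which matters when a test point samples every millisecond.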
According to the example of Figure 6, the directed acyclic graph format data is generated at run time, and two circles joined by a line represent an upstream and downstream relationship. As described in the above embodiments, the entity, entity attribute and entity attribute value format data is stored in the database of the asynchronous system; when data needs to be retrieved, the original data can be regenerated automatically using the directed acyclic graph format data of the Figure 6 example.
S702: Perform data playback based on the directed acyclic graph format data to find the root cause.
Compared with the simple relationships of traditional industrial scenarios, the directed acyclic graph format data displays the parameters and specific data of every stage, for example the relationships between per-second data, or the data of jobs that scale out in parallel in time. In the embodiment of the present invention, the industrial system is a mature industrial scenario in which the directed acyclic graph provides simple transparency that helps programmers play back data and trace the root cause of problems. The collected data can also help users investigate the data processing module and help the owner of the data processing module better explain the data analysis functions.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method according to the embodiments of the present disclosure.
Referring to Figure 8, a structural block diagram of an electronic device 800 that can serve as a server or client of the present disclosure will now be described; it is an example of a hardware device to which aspects of the present disclosure can be applied. Electronic devices are intended to represent various forms of digital electronic computing equipment, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.
As shown in Figure 8, the electronic device 800 includes a computing module 801 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 802 or loaded from a storage module 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing module 801, the ROM 802 and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components in the electronic device 800 are connected to the I/O interface 805, including an input module 806, an output module 807, a storage module 808, and a communication module 809. The input module 806 may be any type of device capable of inputting information to the electronic device 800; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output module 807 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage module 808 may include, but is not limited to, magnetic disks and optical disks. The communication module 809 allows the electronic device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing module 801 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing module 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing modules that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and the like. The computing module 801 performs the methods and processing described above. For example, in some embodiments, methods S301 to S303 and S701 to S702 may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage module 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication module 809. In some embodiments, the computing module 801 may be configured to perform methods S301 to S303 and S701 to S702 in any other suitable manner (for example, by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, a magnetic disk, an optical disk, memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

  1. A data processing method for data tracking, characterized by:
    collecting data, and cleaning the collected data to obtain cleaned data;
    storing the cleaned data in an entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
    generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
  2. The data processing method according to claim 1, wherein the method further comprises:
    inputting the obtained entity, entity attribute, and entity attribute value format data into a data analysis module for data analysis.
  3. The data processing method according to claim 1, wherein the method further comprises:
    inputting the obtained entity, entity attribute, and entity attribute value format data into an asynchronous system for storage.
  4. The data processing method according to claim 1, wherein collecting data and cleaning the collected data to obtain cleaned data further comprises:
    collecting data, and performing a first cleaning on the collected data to obtain first-cleaned data;
    performing an N-th cleaning on the obtained first-cleaned data to obtain N-th-cleaned data, where N is an integer greater than 1.
  5. The data processing method according to claim 1, wherein, before generating the directed acyclic graph from the obtained entity, entity attribute, and entity attribute value format data, the method further comprises:
    storing the obtained entity, entity attribute, and entity attribute value format data in a database and generating a simulation diagram.
  6. The data processing method according to claim 1, wherein collecting data comprises collecting data of each test point according to different time dimensions.
  7. The data processing method according to any one of claims 1 to 6, wherein the collected data includes original data and additional data.
  8. A data processing apparatus for data tracking, comprising:
    a collection module for collecting data;
    a cleaning module for cleaning the collected data to obtain cleaned data;
    an entity, entity attribute, and entity attribute value generation module for storing the cleaned data in an entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
    a directed acyclic graph generation module for generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
  9. The data processing apparatus according to claim 8, wherein the apparatus further comprises:
    a data analysis module for analyzing the obtained entity, entity attribute, and entity attribute value format data that is input to it.
  10. The data processing apparatus according to claim 8, wherein the apparatus further comprises:
    an asynchronous system for storing the obtained entity, entity attribute, and entity attribute value format data.
  11. The data processing apparatus according to claim 8, wherein the cleaning module further comprises:
    a first cleaning module for performing a first cleaning on the collected data to obtain first-cleaned data;
    an N-th cleaning module for performing an N-th cleaning on the obtained first-cleaned data to obtain N-th-cleaned data, where N is an integer greater than 1.
  12. The data processing apparatus according to claim 8, wherein the collection module is further configured to collect data of each test point according to different time dimensions.
  13. The data processing apparatus according to any one of claims 8 to 12, wherein the data includes original data and additional data.
  14. A data processing system for data tracking, comprising:
    a sensor for collecting data;
    a data processing apparatus for cleaning the collected data to obtain cleaned data; storing the cleaned data in an entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data; and generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data.
  15. The data processing system according to claim 14, wherein the system further comprises:
    a data analysis apparatus for analyzing the obtained entity, entity attribute, and entity attribute value format data that is input to it.
  16. The data processing system according to claim 14, wherein the system further comprises:
    an asynchronous system for storing the obtained entity, entity attribute, and entity attribute value format data.
  17. The data processing system according to claim 14, wherein the data processing apparatus is further configured to perform a first cleaning on the collected data to obtain first-cleaned data, and to perform an N-th cleaning on the obtained first-cleaned data to obtain N-th-cleaned data, where N is an integer greater than 1.
  18. The data processing system according to claim 14, wherein the system further comprises:
    a first database for storing the obtained entity, entity attribute, and entity attribute value format data and generating a simulation diagram.
  19. The data processing system according to claim 14, wherein the sensor is further configured to collect data of each test point according to different time dimensions.
  20. The data processing system according to any one of claims 14 to 19, wherein the collected data includes original data and additional data.
  21. A data processing method for data tracking, characterized by:
    collecting data, and cleaning the collected data to obtain cleaned data;
    storing the cleaned data in an entity, entity attribute, and entity attribute value format to obtain entity, entity attribute, and entity attribute value format data;
    generating directed acyclic graph format data from the obtained entity, entity attribute, and entity attribute value format data;
    inputting an identifier to display the data to be played back;
    performing data playback based on the directed acyclic graph format data to obtain a root cause.
  22. An electronic device for data tracking, comprising a computing module and a storage module storing a program, wherein the program comprises instructions which, when executed by the computing module, cause the computing module to perform the method according to any one of claims 1 to 7.
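The steps recited in claims 1, 4, and 21 (multi-pass cleaning, storage as entity / entity attribute / entity attribute value records, generation of a directed acyclic graph, and playback from an identifier toward a root cause) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in the application: the names `Triple`, `clean`, `to_triples`, `build_dag`, and `trace_back`, the `upstream` attribute used to derive graph edges, and the sample records are all hypothetical. The claims do not specify how edges are derived or how playback is performed.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Triple(NamedTuple):
    """One entity / entity attribute / entity attribute value record."""
    entity: str
    attribute: str
    value: object


def clean(records, passes):
    """Apply each cleaning pass in order (first cleaning, ..., N-th cleaning)."""
    for cleaning_pass in passes:
        records = [r for r in map(cleaning_pass, records) if r is not None]
    return records


def to_triples(records):
    """Flatten cleaned records into entity-attribute-value triples."""
    return [Triple(rec["id"], key, val)
            for rec in records for key, val in rec.items() if key != "id"]


def build_dag(triples, link_attribute="upstream"):
    """Derive parent -> child edges from the (assumed) link attribute and
    verify acyclicity with Kahn's algorithm, returning a topological order."""
    edges, nodes = defaultdict(list), set()
    for t in triples:
        nodes.add(t.entity)
        if t.attribute == link_attribute:
            nodes.add(t.value)
            edges[t.value].append(t.entity)
    indegree = {n: 0 for n in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("link attributes form a cycle; not a DAG")
    return dict(edges), order


def trace_back(edges, start):
    """Replay upstream from `start`, breadth-first, toward the source entities."""
    parents = defaultdict(list)
    for parent, children in edges.items():
        for child in children:
            parents[child].append(parent)
    seen, queue, path = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        path.append(node)
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return path


# Hypothetical sample data: three linked entities and one unusable record.
raw = [
    {"id": "sensor-1", "temperature": " 81.5 "},
    {"id": "mixer-1", "upstream": "sensor-1", "status": "ok"},
    {"id": "packer-1", "upstream": "mixer-1", "status": "fault"},
    {"id": None, "status": "ok"},
]


def drop_missing_id(rec):
    """First cleaning pass: discard records without an entity identifier."""
    return rec if rec.get("id") else None


def strip_strings(rec):
    """Second cleaning pass: trim whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


cleaned = clean(raw, [drop_missing_id, strip_strings])
triples = to_triples(cleaned)
edges, order = build_dag(triples)
path = trace_back(edges, "packer-1")
print(path)  # ['packer-1', 'mixer-1', 'sensor-1']
```

In this sketch the identifier entered for playback is the entity name `packer-1`, and the last entity reached on the upstream walk is treated as the candidate root cause. A real system would presumably also replay the attribute values recorded at each node over time, which the sketch omits.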
PCT/CN2022/103434 2022-07-01 2022-07-01 Data processing method, apparatus, and system for data tracking and electronic device WO2024000585A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103434 WO2024000585A1 (en) 2022-07-01 2022-07-01 Data processing method, apparatus, and system for data tracking and electronic device

Publications (1)

Publication Number Publication Date
WO2024000585A1 true WO2024000585A1 (en) 2024-01-04

Family

ID=89383580

Country Status (1)

Country Link
WO (1) WO2024000585A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460166A (en) * 2020-03-03 2020-07-28 深圳壹账通智能科技有限公司 Entity relationship data processing method, device, terminal and storage medium
CN111797874A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Behavior prediction method, behavior prediction device, storage medium and electronic equipment
CN112926611A (en) * 2019-12-06 2021-06-08 京东数字科技控股有限公司 Feature extraction method, device and computer-readable storage medium
CN113220908A (en) * 2021-07-08 2021-08-06 杭州智会学科技有限公司 Knowledge graph matching method and device
CN113688191A (en) * 2021-08-27 2021-11-23 阿里巴巴(中国)有限公司 Feature data generation method, electronic device, storage medium, and program product
CN114385136A (en) * 2021-12-29 2022-04-22 武汉达梦数据库股份有限公司 Flow decomposition method and device for running ETL (extract transform load) by Flink framework

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22948684

Country of ref document: EP

Kind code of ref document: A1