CN114860851A - Data processing method, device, equipment and storage medium - Google Patents
- Publication number
- CN114860851A (application CN202210395777.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- job
- original
- layer
- operation data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06313—Resource planning in a project environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/08—Construction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- Quality & Reliability (AREA)
- General Business, Economics & Management (AREA)
- Software Systems (AREA)
- Biodiversity & Conservation Biology (AREA)
- Primary Health Care (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Computing Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of big data and provides a data processing method, apparatus, device, and storage medium. The method comprises: acquiring original job data corresponding to a job scenario; inputting the original job data into a data processing model to obtain target job data output by the data processing model, wherein the data processing model is trained on original job data samples and target job data samples; determining an association relationship between the original job data and the target job data based on the business attributes of the job scenario; and generating a data lineage graph based on the association relationship. The invention addresses the lack of effective management of job data in the prior art and achieves efficient management of job data.
Description
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
With the rapid development of engineering infrastructure, work machines generate large amounts of job data during operation. In the prior art, however, job data is scattered and diverse in its distribution, which leads to non-standardized, inconsistent, and redundant data that cannot be shared, and targeted management of job data by business category is lacking. Meanwhile, as big data technology develops, data is becoming increasingly valuable and serves as a cornerstone for important strategic decisions.
Therefore, how to improve the management of job data is an important problem to be solved in the industry.
Disclosure of Invention
The invention provides a data processing method, apparatus, device, and storage medium, which address the lack of effective management of job data in the prior art and achieve efficient management of job data.
In a first aspect, the present invention provides a data processing method, including:
acquiring original job data corresponding to a job scenario;
inputting the original job data into a data processing model to obtain target job data output by the data processing model, wherein the data processing model is trained on original job data samples and target job data samples;
determining an association relationship between the original job data and the target job data based on the business attributes of the job scenario;
and generating a data lineage graph based on the association relationship.
With reference to the first aspect, in one possible implementation manner, the data processing model includes: an operation data layer, a detail data layer, a summary data layer, and an application data layer;
the inputting the original job data into a data processing model to obtain target job data output by the data processing model includes:
inputting the original job data into the operation data layer to obtain job data to be processed output by the operation data layer;
inputting the job data to be processed into the detail data layer to obtain job data to be used output by the detail data layer;
inputting the job data to be used into the summary data layer to obtain job data to be applied output by the summary data layer;
and inputting the job data to be applied into the application data layer to obtain the target job data output by the application data layer.
With reference to the first aspect, in a possible implementation manner, the inputting the original job data into the data processing model to obtain target job data output by the data processing model includes:
inputting the original job data into the operation data layer, and performing heterogeneous data synchronization on the original job data through the operation data layer to obtain the job data to be processed;
inputting the job data to be processed into the detail data layer, and performing a data lineage extraction operation on the job data to be processed through the detail data layer to obtain the job data to be used;
inputting the job data to be used into the summary data layer, and summarizing the job data to be used according to business requirements through the summary data layer to obtain the job data to be applied;
and inputting the job data to be applied into the application data layer, and dividing the job data to be applied according to the job scenario through the application data layer to obtain the target job data.
With reference to the first aspect, in a possible implementation manner, the determining an association relationship between the original job data and the target job data based on the business attributes of the job scenario includes:
determining a first association relationship between the original job data and the job data to be processed based on the business attributes of the job scenario;
determining a second association relationship between the job data to be processed and the job data to be used;
determining a third association relationship between the job data to be used and the job data to be applied;
determining a fourth association relationship between the job data to be applied and the target job data;
and taking the first, second, third, and fourth association relationships together as the association relationship.
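The chain of four association relationships can be pictured as edges between consecutive processing stages. The following is a minimal sketch under that assumption; the `Association` class and `build_associations` helper are hypothetical names introduced only for illustration, and the source does not specify how an association relationship is represented.

```python
from dataclasses import dataclass

# Stage names follow the text; representing them as plain strings is an assumption.
STAGES = [
    "original job data",
    "job data to be processed",
    "job data to be used",
    "job data to be applied",
    "target job data",
]


@dataclass(frozen=True)
class Association:
    """One upstream -> downstream link between two stages (hypothetical type)."""
    upstream: str
    downstream: str


def build_associations(stages):
    """Pair each stage with the next one, giving the first to fourth associations."""
    return [Association(a, b) for a, b in zip(stages, stages[1:])]


associations = build_associations(STAGES)
for index, assoc in enumerate(associations, start=1):
    print(f"association {index}: {assoc.upstream} -> {assoc.downstream}")
```

Taken together, the four links connect the original job data to the target job data, which is the overall association relationship the text refers to.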
With reference to the first aspect, in a possible implementation manner, after the inputting the original job data into a data processing model to obtain target job data output by the data processing model, the method further includes:
storing the target job data in a data mart corresponding to the business scenario;
and acquiring job data corresponding to a business requirement from the data mart.
With reference to the first aspect, in a possible implementation manner, after generating the data lineage graph based on the association relationship, the method further includes:
and displaying the data lineage graph based on pre-configured business processing nodes.
With reference to the first aspect, in a possible implementation manner, after the acquiring the job data corresponding to the business requirement from the data mart, the method further includes:
receiving a viewing instruction if a problem exists in the job data acquired from the data mart;
displaying the data lineage graph in response to the viewing instruction;
determining the cause of the problem based on the data lineage graph;
and optimizing the data processing model based on the cause of the problem.
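Determining the cause of a problem from the lineage graph amounts to walking from the problematic data set back toward its sources. The sketch below illustrates this with a breadth-first traversal; the graph contents and the data set names (`mart.fault_report`, `ods.kafka_raw`, and so on) are invented for illustration, and the source does not prescribe a traversal algorithm.

```python
from collections import deque

# Hypothetical lineage graph: each data set maps to the data sets it was derived from.
UPSTREAM = {
    "mart.fault_report": ["ads.fault_by_scenario"],
    "ads.fault_by_scenario": ["dws.fault_daily"],
    "dws.fault_daily": ["dwd.fault_detail"],
    "dwd.fault_detail": ["ods.kafka_raw", "ods.mysql_devices"],
    "ods.kafka_raw": [],
    "ods.mysql_devices": [],
}


def trace_upstream(graph, start):
    """Breadth-first walk from a problematic data set back to its sources."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


candidates = trace_upstream(UPSTREAM, "mart.fault_report")
print(candidates)
```

Each entry in `candidates` is a place where the problem may have been introduced, listed from nearest to farthest upstream.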
In a second aspect, the present invention further provides a data processing apparatus, including:
the acquisition module is used for acquiring original job data corresponding to a job scenario;
the processing module is used for inputting the original job data into a data processing model to obtain target job data output by the data processing model, wherein the data processing model is trained on original job data samples and target job data samples;
the determining module is used for determining an association relationship between the original job data and the target job data based on the business attributes of the job scenario;
and the generation module is used for generating a data lineage graph based on the association relationship.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the data processing method according to any one of the above methods when executing the computer program.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described in any of the above.
According to the data processing method, apparatus, device, and storage medium provided by the invention, original job data corresponding to a job scenario is acquired, so that the original job data is effectively divided by job scenario. The original job data is then input into a data processing model to obtain target job data output by the data processing model, and the association relationship between the original job data and the target job data is determined based on the business attributes of the job scenario. In this way, the original job data of any job scenario can be processed centrally to obtain target job data that meets the specification, and the origin and destination of each piece of job data can be determined from the association relationship, which makes the job data traceable back to its source. Finally, a data lineage graph is generated based on the association relationship, which improves the effectiveness of job data management and the traceability of job data and provides an effective data basis for subsequent business management.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method according to another embodiment of the present invention;
FIG. 3 is a flow chart of a data processing method according to another embodiment of the present invention;
FIG. 4 is a flow chart illustrating a data processing method according to another embodiment of the present invention;
FIG. 5 is a flow chart illustrating a data processing method according to another embodiment of the present invention;
FIG. 6 is a flow chart illustrating a data processing method according to another embodiment of the present invention;
FIG. 7 is a flow chart illustrating a data processing method according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of a data processing apparatus according to the present invention;
FIG. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The data processing method of the present invention is described below with reference to fig. 1 to 7.
The embodiment of the invention provides a data processing method that can be applied to an intelligent terminal or to a server. The method is described below as applied to a server; this is only an example and does not limit the scope of protection. The other examples in the embodiments of the present invention likewise do not limit the scope of protection, and this will not be repeated for each of them.
Specifically, before explaining the data processing method, the following terms used in the embodiments of the present invention are introduced:
Data mart: also called a data market, a data mart serves the needs of a specific department or group of users and stores data in a multidimensional form, which includes defining the dimensions, the indicators to be calculated, the hierarchy of the dimensions, and so on, to produce a data cube oriented toward decision analysis. In terms of scope, its data is extracted from enterprise-wide databases, data warehouses, or more specialized data warehouses.
Data lineage (data blood relationship): the provenance of data, which mainly covers the source of the data, how the data is processed, the mapping relationships, and the destination of the data. Data lineage is part of the metadata, and clear data lineage is the basis for keeping a data platform stable, because it facilitates impact analysis of data changes and troubleshooting of data problems. From a data perspective, the dimensions of data lineage include databases, tables, fields, systems, and applications, that is, which tables of which databases the data is stored in, what the corresponding fields are, and what attributes those fields have. From a business perspective, the dimensions of data lineage mainly include the business lines to which the data belongs, the association relationships among the business lines, and the upstream and downstream logical relationships among the business data of those business lines.
Metadata: data that describes data, mainly the attributes of the data.
Big data platform: a set of infrastructure mainly used for scenarios such as mass data storage, computation, and real-time computation on streaming data; it comprises a unified data acquisition center, a data computation and storage center, a data management center, an operation and maintenance management and control center, an open sharing center, an application center, and the like.
Data middle platform: a platform with capabilities such as data aggregation and integration, data purification and processing, data service visualization, and data value realization.
Specifically, the purpose of building a data mart is to extract the data in the original data silos (data that differs in data center, platform, data source, storage mode, type, language, and so on) into a temporary intermediate layer through Extraction-Transformation-Loading (ETL), then clean, convert, and integrate the data from the distributed, heterogeneous data sources (such as relational data and flat data files), and finally load the data into a data warehouse or a data mart.
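The ETL flow described above can be sketched in three small functions. This is only an illustration under stated assumptions: the source names (`mysql`, `flat_file`), the record fields (`serial`, `day`, `hours`, `minutes`), and the cleaning rules are all hypothetical, and in-memory lists stand in for the temporary intermediate layer and the data mart.

```python
def extract(sources):
    """Pull rows from every source into a temporary staging list."""
    staging = []
    for name, rows in sources.items():
        for row in rows:
            staging.append({**row, "source": name})
    return staging


def transform(staging):
    """Clean (drop rows without a serial number), convert minutes to hours,
    and integrate by keeping one row per (serial, day)."""
    seen = set()
    cleaned = []
    for row in staging:
        serial = row.get("serial")
        if not serial:
            continue  # cleaning: unusable record
        hours = row["hours"] if "hours" in row else row["minutes"] / 60
        key = (serial, row["day"])
        if key in seen:
            continue  # integration: duplicate across sources
        seen.add(key)
        cleaned.append({"serial": serial, "day": row["day"],
                        "hours": hours, "source": row["source"]})
    return cleaned


def load(rows, mart):
    """Append the transformed rows to the target store."""
    mart.extend(rows)
    return mart


# Hypothetical heterogeneous sources: a relational table and a flat file.
SOURCES = {
    "mysql": [
        {"serial": "SN001", "day": "2022-04-01", "hours": 6.5},
        {"serial": None, "day": "2022-04-01", "hours": 1.0},
    ],
    "flat_file": [
        {"serial": "SN002", "day": "2022-04-01", "minutes": 90},
        {"serial": "SN001", "day": "2022-04-01", "minutes": 390},
    ],
}

data_mart = load(transform(extract(SOURCES)), [])
print(data_mart)
```

Keeping the `source` field on every row is a deliberate choice here: it preserves where each record came from, which is the information later needed for lineage.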
The invention is used for business management based on job data in the field of work machines in the industrial Internet of Things. Job data is strongly specialized, correlated, process-oriented, time-sequenced, and analyzable, so business management in this context needs to build a data mart around the Internet of Things business scenarios, ensure that the required business scenario data can be extracted from the data middle platform at any time, and analyze and model that data to provide the data services required by the business middle platform. The data middle platform and the business middle platform together form the big data platform.
The invention aims to establish a business processing method based on industrial Internet of Things big data technology. Through metadata management, business association, and data lineage analysis, problems in the business such as multi-source heterogeneous data, missing data, duplicate data, inconsistent data, and frequent metadata changes can be located easily, data and business can be traced, and the application value of data services, business analysis, and data mining is improved. The specific implementation of the method is shown in FIG. 1:
In this embodiment, original job data can be acquired from multiple data sources corresponding to the job scenario, so that original job data from multiple sources is processed centrally and later problem tracing is more accurate.
The original job data includes technical metadata and business metadata. The technical metadata includes the field identifier, field description, field type, field length, and the like; the business metadata includes the job time, working conditions, equipment model, equipment location information, equipment dealer, equipment agent, equipment faults, fault causes, solutions, and the like.
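One way to picture the two kinds of metadata is as a pair of record types. The sketch below is an assumption made for illustration: the class names, attribute names, and sample values are hypothetical, and the source lists the metadata items without defining a schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TechnicalMetadata:
    """Describes one field of the stored data (hypothetical layout)."""
    field_id: str
    description: str
    field_type: str
    length: int


@dataclass
class BusinessMetadata:
    """Describes the job itself; optional items default to None."""
    job_time: str
    working_condition: str
    equipment_model: str
    equipment_location: str
    dealer: Optional[str] = None
    agent: Optional[str] = None
    fault: Optional[str] = None
    fault_cause: Optional[str] = None
    solution: Optional[str] = None


@dataclass
class OriginalJobRecord:
    """Original job data carrying both kinds of metadata."""
    technical: List[TechnicalMetadata]
    business: BusinessMetadata


record = OriginalJobRecord(
    technical=[TechnicalMetadata("serial_no", "device serial number", "VARCHAR", 32)],
    business=BusinessMetadata(
        job_time="2022-04-01T08:00:00",
        working_condition="excavating",
        equipment_model="model-X",  # invented sample value
        equipment_location="site-A",  # invented sample value
    ),
)
print(record.business.equipment_model, record.technical[0].field_type)
```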
The data sources include data repositories and work machines. The data repositories include distributed databases (MPPDB), relational databases (RDBMS), non-relational databases (NoSQL), object storage files (S3-COS), and the like; the data sources on a work machine include the work machine's memory, the vehicle control unit, and other components that can store the work machine's job data.
The job scenarios include a geographic analysis scenario, a fault diagnosis scenario, a marketing scenario, and the like. In particular, the original job data stored on work machines is synchronized in real time through a data engine service, and/or the data repositories are synchronized on a schedule through an ETL tool.
Specifically, a data synchronization instruction is generated; original job data is acquired from the data repository corresponding to the job scenario through the data synchronization instruction, and original job data is acquired from the work machines corresponding to the job scenario based on a preset communication duration.
Specifically, acquiring the original job data from the data repository corresponding to the job scenario through the data synchronization instruction is implemented as follows,
taking a distributed database as the data repository as an example:
a data synchronization instruction is sent to the control node of the distributed database cluster, and the original job data stored in the distributed database is acquired; an external table is created, and the original job data is stored in the external table; a hash operation is performed on the unique identifier of the work machine in the original job data, so that data is distributed evenly across the data nodes. The unique identifier includes the device serial number.
A data node here is a storage node of a data server.
Specifically, heterogeneous data sources are configured and their data synchronized through the ETL tool. The heterogeneous data sources include Greenplum, MySQL, MongoDB, Elasticsearch, Redis, HBase, SAP, MQTT, JMS, RabbitMQ, RocketMQ, Kafka, and others.
An external table is a table for which the database holds only the table definition and no data; the data itself is stored outside the database.
The Greenplum database is taken as an example below:
Greenplum can perform normal insert, delete, update, and query operations (DML) on an external table, and, by configuring the built-in connectors of PXF, it can access and store external data held in other heterogeneous data sources.
Specifically, acquiring the original job data from the work machines corresponding to the job scenario based on the preset communication duration is implemented as follows,
explained with the example in FIG. 2: taking Kafka as the data source, the data source corresponds to N groups of work machines, where N is an integer greater than or equal to 1. The groups are denoted Partition1, Partition2, Partition3, ..., Partition N, and each group corresponds to one collector, denoted Reader1, Reader2, Reader3, ..., Reader N. Each collector collects the original job data of its group of work machines, and a processor stores the original job data in M data nodes, where M is an integer greater than or equal to 1 and the data nodes are denoted Segment1, Segment2, Segment3, ..., Segment M.
Each group of work machines includes at least one work machine.
When the original job data of each group of work machines is stored in the M data nodes, a hash operation is performed on the unique identifier of the work machine in the original job data, so that data is distributed evenly across the data nodes.
The hash operation includes calculating the hash value corresponding to the unique identifier.
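The hash-based distribution can be sketched as follows. The source does not name a hash function or a mapping rule, so SHA-256 and the modulo step are assumptions made for illustration, as are the `node_for` helper and the generated serial numbers.

```python
import hashlib
from collections import Counter


def node_for(serial_number: str, node_count: int) -> int:
    """Map a device serial number to one of node_count data nodes.

    SHA-256 is an assumption; any stable, well-mixed hash would serve.
    """
    digest = hashlib.sha256(serial_number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Hypothetical fleet of 10,000 work machines spread over M = 4 data nodes.
M = 4
serials = [f"SN{i:05d}" for i in range(10_000)]
node_load = Counter(node_for(serial, M) for serial in serials)

for node in sorted(node_load):
    print(f"Segment{node + 1}: {node_load[node]} machines")
```

Because the mapping depends only on the serial number, all records of one work machine land on the same data node, while different machines spread roughly evenly across the nodes.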
Step 102, inputting the original job data into the data processing model to obtain target job data output by the data processing model.
The data processing model is trained on original job data samples and target job data samples.
In one embodiment, as shown in FIG. 3, the data processing model includes four processing layers: an operation data layer 301, a detail data layer 302, a summary data layer 303, and an application data layer 304.
The original job data is input into the operation data layer 301 to obtain the job data to be processed output by the operation data layer 301;
the job data to be processed is input into the detail data layer 302 to obtain the job data to be used output by the detail data layer 302;
the job data to be used is input into the summary data layer 303 to obtain the job data to be applied output by the summary data layer 303;
the job data to be applied is input into the application data layer 304 to obtain the target job data output by the application data layer 304.
Specifically, after the data processing model outputs the target job data, the job data to be processed, the job data to be used, the job data to be applied, and the target job data are recorded.
By recording the job data output by each processing layer, the invention provides an effective data basis for the subsequent analysis of the association relationship between the job data output by the processing layers.
In a specific embodiment, the specific implementation manner of each processing layer of the data processing model is as follows:
inputting original job data into an operation data layer 301, storing the original job data through the operation data layer 301, and performing heterogeneous data synchronous processing on the original job data to obtain job data to be processed;
inputting the operation data to be processed into the detail data layer 302, and performing blood relationship extraction operation on the operation data to be processed through the detail data layer 302 to obtain operation data to be used;
inputting the job data to be used into a summarized data layer 303, and summarizing the job data to be used through the summarized data layer 303 according to service requirements to obtain the job data to be applied;
the data of the job to be applied is input into the application data layer 304, and the data of the job to be applied is divided according to the job scene through the application data layer 304, so as to obtain the target job data.
Specifically, by storing the original job data, the operation data layer 301 records the data source, which facilitates data tracing; the detail data layer 302 retains the granularity of the original job data and provides the detail data; the summarized data layer 303 summarizes data according to business requirements, where the business requirements include business data (e.g., wide tables), business summaries (e.g., daily, monthly, and annual reports), public index data, and the like; the application data layer 304 divides the job data to be applied based on the job scene.
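The flow through the four processing layers can be sketched as follows. This is only an illustration of the data flow: the description states that the data processing model is obtained by training, whereas the functions below are fixed rules, and every field name (`machine_id`, `scene`, `day`, `hours`) and the table name `ods_jobs` are assumptions.

```python
from collections import defaultdict


def operation_data_layer(raw_records):
    """Keep the raw records and map heterogeneous field names to one schema."""
    stored = list(raw_records)  # raw copy kept so the data source can be traced
    pending = [
        {
            "machine_id": r.get("machine_id") or r.get("id"),
            "scene": r["scene"],
            "day": r["day"],
            "hours": float(r["hours"]),
        }
        for r in stored
    ]
    return stored, pending


def detail_data_layer(pending):
    """Keep row-level granularity and tag each row with its source table."""
    return [dict(row, source_table="ods_jobs") for row in pending]


def summary_data_layer(usable):
    """Summarize by business need; here, total hours per machine per day."""
    totals = defaultdict(float)
    for row in usable:
        totals[(row["scene"], row["machine_id"], row["day"])] += row["hours"]
    return [
        {"scene": s, "machine_id": m, "day": d, "total_hours": h}
        for (s, m, d), h in sorted(totals.items())
    ]


def application_data_layer(to_apply):
    """Divide the summarized rows by job scene."""
    by_scene = defaultdict(list)
    for row in to_apply:
        by_scene[row["scene"]].append(row)
    return dict(by_scene)


def run_pipeline(raw_records):
    stored, pending = operation_data_layer(raw_records)
    usable = detail_data_layer(pending)
    to_apply = summary_data_layer(usable)
    target = application_data_layer(to_apply)
    # Every intermediate output is recorded for later association analysis.
    return {"stored": stored, "pending": pending, "usable": usable,
            "to_apply": to_apply, "target": target}


raw = [
    {"machine_id": "m1", "scene": "excavation", "day": "2022-04-14", "hours": "3.5"},
    {"id": "m1", "scene": "excavation", "day": "2022-04-14", "hours": 2},
    {"machine_id": "m2", "scene": "hoisting", "day": "2022-04-14", "hours": 4},
]
outputs = run_pipeline(raw)
```

Returning all intermediate outputs, rather than only the target job data, mirrors the recording of the job data output by each processing layer.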
Specifically, the detail data layer 302 performs the blood relationship extraction operation on the job data to be processed as follows:
Blood relationship (that is, data lineage) extraction is an important means of creating the blood relationship map, and provides an important data guarantee for later problem tracing.
To obtain the final blood relationship map, the following pairs of tables are associated: a table in the job data to be processed with a table in the job data to be used, a table in the job data to be used with a table in the job data to be applied, and a table in the job data to be applied with a table in the target job data. In each pair, the table of the earlier layer is defined as the associated table, and the table of the later layer is defined as the association table. For example, when a table in the job data to be processed and a table in the job data to be used are associated, the table in the job data to be processed is the associated table, and the table in the job data to be used is the association table.
Likewise, between the operation data layer 301 and the detail data layer 302, the table corresponding to the operation data layer 301 is the preceding table, and the table corresponding to the detail data layer 302 is the following table.
After the associated table and the association table are matched, the job data between the associated table and the association table is associated, where the job data includes technical metadata and business metadata. Throughout the blood relationship extraction process, all the associated tables on which an association table depends, as well as the generation path of the association table, need to be recorded, which facilitates later problem tracing. At this point, the blood relationship between the job data to be processed and the job data to be applied is determined.
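The recording of dependencies and generation paths can be sketched as follows. This is a hedged illustration: the class `LineageRegistry`, its methods, and the table names are hypothetical, and the sketch assumes that the association table is the downstream table that depends on one or more upstream associated tables.

```python
from collections import defaultdict


class LineageRegistry:
    """Records, for each association table, what it depends on and how it was built."""

    def __init__(self):
        self.upstream = defaultdict(set)  # association table -> associated tables
        self.generation = {}              # association table -> generation path

    def associate(self, associated_table, association_table, generation_path):
        self.upstream[association_table].add(associated_table)
        self.generation[association_table] = generation_path

    def all_dependencies(self, table):
        """Return every table that `table` depends on, directly or indirectly."""
        seen, stack = set(), [table]
        while stack:
            for parent in self.upstream.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


registry = LineageRegistry()
# Hypothetical tables, one per processing layer.
registry.associate("ods_jobs", "dwd_jobs", "heterogeneous data synchronization")
registry.associate("dwd_jobs", "dws_jobs_daily", "daily summary by machine")
registry.associate("dws_jobs_daily", "ads_jobs_by_scene", "division by job scene")
deps = registry.all_dependencies("ads_jobs_by_scene")
```

Storing the generation path next to the dependency set means that a later trace can report not only which tables fed a result but also which processing step produced it.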
Specifically, through the data processing model, the invention decomposes a complex task into a plurality of processing layers, each of which handles only a simple task, which makes problems easier to locate. Moreover, because the data is processed by a plurality of processing layers, repeated data calculation is reduced, the processing results of each processing layer are more reusable, and the data does not need to be acquired again for different service scenes.
Step 103: determine the association relationship between the original job data and the target job data based on the service attribute of the job scene.
Wherein the target job data includes technical metadata and business metadata. The technical metadata includes: field identification, field description, field type, field length, and the like; the business metadata includes: job time, job conditions, equipment model, equipment location information, equipment dealer, equipment agent, equipment fault, fault cause, solution, and the like.
It can be seen that the original job data and the target job data are both metadata. The metadata of the target job data is used as seed data to record the data source, data type, field description information, and conversion process of the technical metadata in a table, as well as the matching relationships and business links among the business metadata, as shown in Fig. 4.
In a specific embodiment, a first association relationship between the original job data and the job data to be processed is determined based on the service attribute; a second association relationship between the job data to be processed and the job data to be used is determined; a third association relationship between the job data to be used and the job data to be applied is determined; a fourth association relationship between the job data to be applied and the target job data is determined; and the first, second, third, and fourth association relationships are together taken as the association relationship.
The service attribute includes: the technical correlation between technical metadata and the business correlation between business metadata.
The technical correlation includes: table-to-table dependencies, field-to-field dependencies, and the like; the business correlation includes: the correlation between the persons responsible for the tables, the correlation between the business scenarios indicated by the tables, and the like.
The association relationship is the business logic relationship between the original table and the target table. The original table has the same meaning as the preceding table, and the target table has the same meaning as the following table.
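How the four stage-level relationships combine into one end-to-end relationship can be sketched at the field level as follows. This is an illustration under assumptions: each relationship is modelled as a plain mapping from a downstream field to the upstream field it was derived from, and the function `trace_field` and all field names are hypothetical.

```python
def trace_field(target_field, relations):
    """Walk from a target field back to the original field.

    `relations` is ordered from the first relationship to the fourth; each one
    maps a downstream field name to the upstream field it was derived from.
    """
    path = [target_field]
    for relation in reversed(relations):
        path.append(relation[path[-1]])
    return path


# Hypothetical field-to-field dependencies, one mapping per pair of layers.
first = {"pending.hours": "raw.hours"}
second = {"usable.hours": "pending.hours"}
third = {"to_apply.total_hours": "usable.hours"}
fourth = {"target.total_hours": "to_apply.total_hours"}

association = [first, second, third, fourth]
path = trace_field("target.total_hours", association)
```

The resulting path lists the field at every layer, from the target job data back to the original job data, which is the kind of field-to-field dependency the technical correlation covers.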
The invention configures the business logic relationship between the original table and the target table through the visual page display, thereby realizing the effective management of business data and the rapid tracing of problem data.
Specifically, the business logic relationship between the original table and the target table is obtained based on the technical correlation between the technical metadata and the business correlation between the business metadata. By determining the association relationship among the job data output by each processing layer, the invention records the processing cycle of the data from input to output, so that the job data is traceable, the job data is managed effectively, and an effective data basis is provided for problem tracing.
In one embodiment, after the target job data is obtained, the target job data is stored in a data mart corresponding to the business scenario; and acquiring the job data corresponding to the business requirement from the data mart.
Therefore, the invention realizes the layered flow and summarization of data through the data processing model, and establishes data marts according to the business scenario, which ensures that data assets are effectively integrated and that the job data is supplied to the business middle platform for use.
Specifically, users can acquire job data from the data mart through the business middle platform based on their own business requirements.
Step 104: generate a blood relationship map based on the association relationship.
Specifically, the metadata, database, tables, processing layers, and business logic relationships related to the seed data are determined, and the blood relationship map is generated. An example of a blood relationship map is shown in Fig. 5.
Wherein ods_dim_history_temperature, ods_ms_evi_all_machine_df, dwd_evi_all_machine_di, ads_evi_all_machine_gis_df, ads_evi_gis_group, ads_evi_gis_nonce, and ads_evi_gis_info are table names.
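Generating a map from association relationships can be sketched as follows. The table names are taken from the text above, but the edges between them and the logic labels are assumed purely for illustration, since Fig. 5 is not reproduced here. The helper names `build_graph` and `to_dot` are hypothetical; the output is plain Graphviz DOT text, chosen only as a convenient way to render a directed graph.

```python
def build_graph(associations):
    """Turn (original table, target table, business logic) triples into an adjacency map."""
    graph = {}
    for source, target, logic in associations:
        graph.setdefault(source, []).append((target, logic))
        graph.setdefault(target, [])
    return graph


def to_dot(graph):
    """Render the adjacency map as Graphviz DOT text."""
    lines = ["digraph lineage {"]
    for source in sorted(graph):
        for target, logic in graph[source]:
            lines.append(f'  "{source}" -> "{target}" [label="{logic}"];')
    lines.append("}")
    return "\n".join(lines)


# Assumed edges: the real relationships are those shown in Fig. 5.
associations = [
    ("ods_ms_evi_all_machine_df", "dwd_evi_all_machine_di", "clean and deduplicate"),
    ("dwd_evi_all_machine_di", "ads_evi_all_machine_gis_df", "attach location"),
    ("ads_evi_all_machine_gis_df", "ads_evi_gis_group", "group by region"),
]
graph = build_graph(associations)
dot = to_dot(graph)
```

Each edge carries the business logic relationship as its label, so the rendered map shows both which table feeds which and by what processing logic.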
Specifically, a service processing flow is preset. The service processing flow includes a plurality of service processing nodes; each service processing node processes its input data according to preset service processing logic and, after obtaining output data, sends the output data to the next service processing node. Throughout the service processing flow, the metadata and the business logic relationships are associated automatically, and the blood relationship map is generated. Taking geographical resolution as an example business scenario, a service processing flow is shown in Fig. 6.
In one embodiment, the blood relationship map is displayed based on preconfigured service processing nodes.
Specifically, when the user clicks a service processing node, the original table and the target table are loaded automatically and the business logic relationship between them is obtained; the blood relationship map is then displayed based on the original table, the target table, and the business logic relationship.
In a specific embodiment, after job data corresponding to a business requirement is acquired from the data mart, it is determined whether the acquired job data has a problem. If it does, a viewing instruction is received, where the viewing instruction indicates that the blood relationship map corresponding to a service processing node is to be displayed; the blood relationship map is displayed in response to the viewing instruction; the cause of the problem is determined based on the blood relationship map; and the data processing model is optimized based on the cause of the problem. If it does not, the next operation, which users set according to their own requirements, is performed on the acquired job data.
Specifically, each job data corresponds to a respective data evaluation index, and whether the acquired job data has a problem is determined based on the data evaluation index. The data evaluation index is a preset rule for determining whether the operation data has problems.
For example, if the job data includes the rotation speed of the work machine, the data evaluation index corresponding to the rotation speed is that the rotation speed must be between a first preset value and a second preset value; if the job data includes the power of the work machine, the data evaluation index corresponding to the power is that the power must be between a third preset value and a fourth preset value; if the job data includes a job progress, the data evaluation index corresponding to the job progress is that the progress must not be less than a preset progress; and so on. Of course, these data evaluation indexes are only the simplest examples; in practice they are preset according to the flow of each item of job data, the correspondences between items, and the actual job conditions.
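The data evaluation indexes in this example can be sketched as simple range rules. The field names and every numeric threshold below are assumptions standing in for the first to fourth preset values and the preset progress, which the description leaves unspecified; `RangeRule` and `find_problems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeRule:
    """A data evaluation index: the field must lie within [minimum, maximum]."""

    field: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def violated_by(self, record: dict) -> bool:
        value = record.get(self.field)
        if value is None:
            return False  # the field is absent, so this rule does not apply
        if self.minimum is not None and value < self.minimum:
            return True
        if self.maximum is not None and value > self.maximum:
            return True
        return False


# Assumed preset values, for illustration only.
RULES = [
    RangeRule("rotation_speed", minimum=600.0, maximum=2400.0),
    RangeRule("power", minimum=0.0, maximum=300.0),
    RangeRule("progress", minimum=0.2),
]


def find_problems(record: dict) -> list:
    """Return the names of the fields whose evaluation index is violated."""
    return [rule.field for rule in RULES if rule.violated_by(record)]
```

For instance, `find_problems({"rotation_speed": 3000, "power": 120, "progress": 0.5})` reports only the rotation speed, because it exceeds the assumed upper preset value.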
Specifically, when it is determined that the acquired job data has a problem, the problem service processing node is determined; the blood relationship map corresponding to the problem service processing node is then viewed through the viewing instruction; the tables in the blood relationship map are screened to determine the table with the problem, the original table corresponding to it, and the business logic relationship between the two, and the cause of the problem is determined based on that business logic relationship; and the data processing model is optimized based on the cause of the problem. The problem service processing node is found by verifying the table and the service processing logic corresponding to each service processing node.
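The screening step can be sketched as an upstream walk over the map. This is one possible reading of the procedure rather than the procedure itself: the sketch assumes that a table is the root of a problem when it shows the problem but none of its original tables do, and the function `find_root_cause`, the table names, and the `has_problem` check are hypothetical.

```python
def find_root_cause(problem_table, upstream, has_problem):
    """Follow the map upstream and return the earliest tables that show the problem."""
    roots, seen, stack = [], set(), [problem_table]
    while stack:
        table = stack.pop()
        if table in seen:
            continue
        seen.add(table)
        bad_parents = [p for p in upstream.get(table, []) if has_problem(p)]
        if bad_parents:
            stack.extend(bad_parents)
        else:
            roots.append(table)  # no original table is faulty: the fault starts here
    return sorted(roots)


# Hypothetical map: each table lists the original tables it was built from.
upstream = {
    "ads_jobs_by_scene": ["dws_jobs_daily"],
    "dws_jobs_daily": ["dwd_jobs"],
    "dwd_jobs": ["ods_jobs"],
}
faulty = {"ads_jobs_by_scene", "dws_jobs_daily"}
roots = find_root_cause("ads_jobs_by_scene", upstream, faulty.__contains__)
```

Here the walk stops at `dws_jobs_daily`, since its original table `dwd_jobs` is sound; the business logic relationship between those two tables is then the place to look for the cause.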
In the following, the big data platform is introduced with reference to Fig. 7:
The big data platform comprises a data middle platform and a business middle platform; the data middle platform comprises a data source, the data processing model, and a data mart; the business middle platform comprises a visual interface.
Specifically, the user obtains job data corresponding to the business requirements from the data mart through the visual interface.
In the following, the data processing method is described by taking geographical resolution as an example business scenario: a data mart for geographical resolution is constructed through the data platform, and a service interface is provided for calling the interface modules corresponding to the business middle platform, where the interface modules include: a real-time computing interface, a fence management interface, and a grouping management interface.
The position information of the work machines is obtained and loaded into the data mart by data synchronization; the whole process includes data acquisition, data cleaning, data conversion, data integration, and the like. A visual service processing flow is configured, the association relationship between the original table and the target table is obtained by analysis through the service processing flow and metadata management, and the business logic relationship between the original table and the target table is obtained on the basis of the association relationship.
By clicking a service processing node, the blood relationship map can be displayed, so that abnormal data can be traced to its root.
According to the data processing method, the original job data corresponding to the job scene is obtained, so that the original job data is effectively divided based on the job scene. The original job data is then input into the data processing model to obtain the target job data output by the data processing model. The association relationship between the original job data and the target job data is determined based on the service attribute of the job scene. In this way, the original job data in any job scene can be processed centrally to obtain target job data that meets the specification, and the origin and destination of each item of job data can be determined from the association relationship, so that the job data is traceable. Finally, the blood relationship map is generated based on the association relationship, which improves the effectiveness of job data management and the traceability of the job data, and provides an effective data basis for subsequent service management.
The following describes the data processing apparatus provided by the present invention, where the data processing apparatus described below and the data processing method described above may be referred to correspondingly, and repeated parts are not described again, as shown in fig. 8, the apparatus includes:
an obtaining module 801, configured to obtain original job data corresponding to a job scene;
the processing module 802 is configured to input original job data into a data processing model to obtain target job data output by the data processing model, where the data processing model is obtained by training an original job data sample and a target job data sample;
a determining module 803, configured to determine, based on the service attribute of the job scenario, an association relationship between the original job data and the target job data;
a generating module 804, configured to generate a blood relation map based on the association relationship.
In one embodiment, the data processing model comprises: an operation data layer, a detail data layer, a summary data layer and an application data layer; the processing module 802 is specifically configured to input original job data into an operation data layer, so as to obtain to-be-processed job data output by the operation data layer; inputting the job data to be processed into the detail data layer to obtain the job data to be used output by the detail data layer; inputting the operation data to be used into a summarized data layer to obtain the operation data to be applied output by the summarized data layer; and inputting the data of the operation to be applied into the application data layer to obtain the target operation data output by the application data layer.
In a specific embodiment, the processing module 802 is specifically configured to input original job data into an operation data layer, and perform heterogeneous data synchronization processing on the original job data through the operation data layer to obtain job data to be processed; inputting the operation data to be processed into a detail data layer, and performing blood relationship extraction operation on the operation data to be processed through the detail data layer to obtain operation data to be used; inputting the operation data to be used into a summary data layer, and summarizing the operation data to be used through the summary data layer according to the service requirements to obtain the operation data to be applied; and inputting the data of the operation to be applied into an application data layer, and dividing the data of the operation to be applied according to the operation scene through the application data layer to obtain target operation data.
In a specific embodiment, the determining module 803 is specifically configured to determine, based on the service attribute, a first association relationship between the original job data and the job data to be processed; determining a second incidence relation between the job data to be processed and the job data to be used; determining a third association relationship between the job data to be used and the job data to be applied; and determining a fourth incidence relation between the job data to be applied and the target job data, and taking the first incidence relation, the second incidence relation, the third incidence relation and the fourth incidence relation as incidence relations.
In one embodiment, the processing module 802 is further configured to store the target job data in a data mart corresponding to the business scenario; and acquiring the job data corresponding to the business requirement from the data mart.
In an embodiment, the generating module 804 is further configured to display the blood relationship map based on preconfigured service processing nodes.
In an embodiment, the processing module 802 is further configured to: receive a viewing instruction if the job data acquired from the data mart has a problem; display the blood relationship map in response to the viewing instruction; determine the cause of the problem based on the blood relationship map; and optimize the data processing model based on the cause of the problem.
Fig. 9 illustrates the physical structure of an electronic device. As shown in Fig. 9, the electronic device may include: a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with each other via the communication bus 904. The processor 901 may call logic instructions in the memory 903 to perform a data processing method comprising: acquiring original job data corresponding to a job scene; inputting the original job data into a data processing model to obtain target job data output by the data processing model, where the data processing model is obtained by training with an original job data sample and a target job data sample; determining an association relationship between the original job data and the target job data based on the service attribute of the job scene; and generating a blood relationship map based on the association relationship.
In addition, the logic instructions in the memory 903 may be implemented in a software functional unit and stored in a computer readable storage medium when the logic instructions are sold or used as a separate product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the data processing method provided by the above embodiments, the method comprising: acquiring original operation data corresponding to an operation scene; inputting original operation data into a data processing model to obtain target operation data output by the data processing model, wherein the data processing model is obtained by training an original operation data sample and a target operation data sample; determining an incidence relation between original job data and target job data based on the service attribute of the job scene; and generating a blood relation map based on the association relationship.
The present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the data processing method provided by the above embodiments, the method including: acquiring original operation data corresponding to an operation scene; inputting original operation data into a data processing model to obtain target operation data output by the data processing model, wherein the data processing model is obtained by training an original operation data sample and a target operation data sample; determining an incidence relation between original job data and target job data based on the service attribute of the job scene; and generating a blood relation map based on the association relationship.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A data processing method, comprising:
acquiring original operation data corresponding to an operation scene;
inputting the original operation data into a data processing model to obtain target operation data output by the data processing model, wherein the data processing model is obtained by training an original operation data sample and a target operation data sample;
determining the incidence relation between the original job data and the target job data based on the service attribute of the job scene;
and generating a blood relation map based on the incidence relation.
2. The data processing method of claim 1, wherein the data processing model comprises: an operation data layer, a detail data layer, a summary data layer and an application data layer;
the inputting the original job data into a data processing model to obtain target job data output by the data processing model includes:
inputting the original operation data into the operation data layer to obtain the operation data to be processed output by the operation data layer;
inputting the job data to be processed into the detail data layer to obtain job data to be used output by the detail data layer;
inputting the to-be-used operation data into the summarized data layer to obtain to-be-applied operation data output by the summarized data layer;
and inputting the to-be-applied operation data into the application data layer to obtain the target operation data output by the application data layer.
3. The data processing method according to claim 2, wherein the inputting the raw job data into the data processing model to obtain the target job data output by the data processing model comprises:
inputting the original operation data into the operation data layer, and performing heterogeneous data synchronous processing on the original operation data through the operation data layer to obtain the operation data to be processed;
inputting the operation data to be processed into the detail data layer, and performing blood relationship extraction operation on the operation data to be processed through the detail data layer to obtain the operation data to be used;
inputting the to-be-used operation data into the summarized data layer, and summarizing the to-be-used operation data through the summarized data layer according to service requirements to obtain the to-be-applied operation data;
and inputting the job data to be applied into the application data layer, and dividing the job data to be applied according to the job scene through the application data layer to obtain the target job data.
4. The data processing method according to claim 2, wherein the determining the association relationship between the original job data and the target job data based on the business attribute of the job scenario comprises:
determining a first incidence relation between the original job data and the job data to be processed based on the service attribute of the job scene;
determining a second incidence relation between the job data to be processed and the job data to be used;
determining a third association relationship between the job data to be used and the job data to be applied;
determining a fourth incidence relation between the to-be-applied job data and the target job data;
and taking the first incidence relation, the second incidence relation, the third incidence relation and the fourth incidence relation as the incidence relation.
5. The data processing method according to any one of claims 1 to 4, wherein after inputting the raw job data into a data processing model to obtain target job data output by the data processing model, the method further comprises:
storing the target operation data in a data mart corresponding to the service scene;
and acquiring the operation data corresponding to the service requirement from the data mart.
6. The data processing method of claim 5, wherein after generating the blood relation map based on the incidence relation, the method further comprises:
and displaying the blood relationship map based on a pre-configured service processing node.
7. The data processing method according to claim 6, further comprising, after acquiring job data corresponding to a business demand from the data mart:
receiving a viewing instruction if the job data acquired from the data mart has a problem;
displaying the blood relation map in response to the viewing instruction;
determining the cause of the problem based on the blood relationship map;
optimizing the data processing model based on the cause of the problem occurrence.
8. A data processing apparatus, comprising:
the acquisition module is used for acquiring original job data corresponding to a job scene;
the processing module is used for inputting the original operation data into a data processing model to obtain target operation data output by the data processing model, and the data processing model is obtained by training an original operation data sample and a target operation data sample;
the determining module is used for determining the incidence relation between the original job data and the target job data based on the service attribute of the job scene;
and the generation module is used for generating a blood relation map based on the incidence relation.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the data processing method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the data processing method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210395777.7A CN114860851A (en) | 2022-04-14 | 2022-04-14 | Data processing method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210395777.7A CN114860851A (en) | 2022-04-14 | 2022-04-14 | Data processing method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114860851A (en) | 2022-08-05 |
Family
ID=82631640
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210395777.7A Withdrawn CN114860851A (en) | 2022-04-14 | 2022-04-14 | Data processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114860851A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117725156A (en) * | 2023-11-02 | 2024-03-19 | 广东广信通信服务有限公司 | Method, system, device and medium for processing association of business data and financial data |
CN117725156B (en) * | 2023-11-02 | 2024-05-24 | 广东广信通信服务有限公司 | Method, system, device and medium for processing association of business data and financial data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11645471B1 (en) | Determining a relationship recommendation for a natural language request | |
US10761687B2 (en) | User interface that facilitates node pinning for monitoring and analysis of performance in a computing environment | |
US10592525B1 (en) | Conversion of cloud computing platform data for ingestion by data intake and query system | |
US10997192B2 (en) | Data source correlation user interface | |
CN111459766A (en) | Calling chain tracking and analyzing method for micro-service system | |
US20110191361A1 (en) | System and method for building a cloud aware massive data analytics solution background | |
US11494395B2 (en) | Creating dashboards for viewing data in a data storage system based on natural language requests | |
US9201700B2 (en) | Provisioning computer resources on a network | |
US11676345B1 (en) | Automated adaptive workflows in an extended reality environment | |
CN113468159A (en) | Data application full-link management and control method and system | |
US11824729B2 (en) | Generating a three-dimensional cityscape including a cluster of nodes | |
CN114428822B (en) | Data processing method and device, electronic equipment and storage medium | |
CN114218218A (en) | Data processing method, device and equipment based on data warehouse and storage medium | |
CN111414410A (en) | Data processing method, device, equipment and storage medium | |
CN113553341A (en) | Multidimensional data analysis method, multidimensional data analysis device, multidimensional data analysis equipment and computer readable storage medium | |
CN107153702A (en) | A kind of data processing method and device | |
CN111639016A (en) | Big data log analysis method and device and computer storage medium | |
CN114860851A (en) | Data processing method, device, equipment and storage medium | |
CN111125045B (en) | Lightweight ETL processing platform | |
CN107679097A (en) | A kind of distributed data processing method, system and storage medium | |
CN110389944B (en) | Metadata management system and method based on model | |
US11023485B2 (en) | Cube construction for an OLAP system | |
CN117076579A (en) | Method, device, equipment and storage medium for displaying data blood relationship | |
CN116910032A (en) | Method, device, equipment and storage medium for migrating data marts | |
CN107357919A (en) | User behaviors log inquiry system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220805 |