CN116150455A - Heterogeneous data analysis method - Google Patents

Info

Publication number
CN116150455A
Authority
CN
China
Prior art keywords
data
dynamic
slope
feature
parallelism
Prior art date
Legal status
Granted
Application number
CN202310402595.2A
Other languages
Chinese (zh)
Other versions
CN116150455B (en)
Inventor
戚红建
王宇飞
韩硕
宋成风
张强
秦绪帅
李伟
刘誉杰
李磊
徐衍斐
Current Assignee
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Original Assignee
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Application filed by Beijing Bidding Branch Of China Huaneng Group Co ltd, Huaneng Information Technology Co Ltd
Priority to CN202310402595.2A
Publication of CN116150455A
Application granted
Publication of CN116150455B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a heterogeneous data analysis method comprising the following steps: screening heterogeneous data according to the process architecture to obtain a first data set; extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data; based on the parallelism requirement type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence; and, based on the instruction sequence, matching corresponding working units to execute the corresponding work. Each independent data item in the heterogeneous data is thereby analyzed more comprehensively, which makes it easier to identify the important data in the heterogeneous data and further improves the capability of heterogeneous data analysis.

Description

Heterogeneous data analysis method
Technical Field
The invention relates to the technical field of data analysis, in particular to a heterogeneous data analysis method.
Background
At present, with the development of science and technology, the Internet has become increasingly intertwined with people's daily lives, and the development of productivity and production technology also depends on it. The evolution of the Internet has led to explosive growth in data volume, so computers must process vast amounts of heterogeneous data. Whether for data mining or artificial intelligence modeling, the first step is to access the data, and the most important part of accessing data is processing heterogeneous data. At present, however, heterogeneous data is generally analyzed item by item, in sequence, according to a fixed flow. Because this analysis mode is uniform and the individual analysis passes may be too frequent, the heterogeneous data cannot be analyzed accurately.
Therefore, the invention provides a heterogeneous data analysis method.
Disclosure of Invention
The invention provides a heterogeneous data analysis method that screens heterogeneous data according to a process architecture to obtain a first data set of independent data, extracts a parallelism characteristic interval of each first data to obtain the parallelism requirement type of each first data, matches a corresponding instruction set, and constructs working units with the corresponding instruction sequence to execute the corresponding work. Each independent data item in the heterogeneous data is thereby analyzed more comprehensively, which makes it easier to identify the important data in the heterogeneous data and further improves the capability of heterogeneous data analysis.
The invention provides a heterogeneous data analysis method, which comprises the following steps:
step 1: screening the heterogeneous data according to the process architecture to obtain a first data set;
step 2: extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data;
step 3: based on the parallelism demand type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence;
step 4: based on the instruction sequence, the corresponding working units are matched to execute corresponding work.
Preferably, screening the heterogeneous data according to the process architecture to obtain a first data set includes:
heterogeneous data are acquired, and segmentation is carried out according to the average length of the data, so that a first overlapped data set in each average length of the data is obtained;
removing overlapping data with overlapping length smaller than a preset length based on the first overlapping data set to obtain a second overlapping data set;
obtaining source data corresponding to each piece of residual overlapping data based on the second overlapping data set;
obtaining process architecture information corresponding to the residual overlapping data based on the source data corresponding to each residual overlapping data;
obtaining corresponding process architecture types based on the process architecture information;
and separating all the remaining overlapping data in the second overlapping data set based on the type of the process architecture to obtain a plurality of first data, and constructing to obtain a first data set.
Preferably, extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data includes:
carrying out parallel analysis on each first data in the first data set to obtain a corresponding parallel characteristic data set and carrying out reconstruction to obtain a corresponding reconstruction data set, wherein the reconstruction data set comprises all first parallel data involved at the same time;
acquiring actual behavior vectors corresponding to first data at different moments in the same reconstruction data set, constructing a parallel line graph corresponding to the first data based on the moduli of the actual behavior vectors, and analyzing the slope;
constructing a corresponding static feature set and a dynamic feature set according to the slope analysis result;
based on the parallel feature association table, obtaining possible associated features corresponding to each dynamic feature data in the dynamic feature set;
based on the possible associated feature corresponding to each dynamic feature data and the dynamic association of the same dynamic feature data based on the rest dynamic feature data in the corresponding first data, obtaining the first feature of the corresponding dynamic feature data;
based on the slope average value corresponding to each dynamic characteristic data, obtaining a dynamic slope change trend graph according to a time sequence;
slope analysis is carried out on each first feature to obtain a first slope change trend graph;
and obtaining the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic slope change trend graph and the first slope change trend graph.
Preferably, calculating a maximum value of the parallelism characteristic index of each first data in the first data set includes:
[Formula image SMS_1: expression for the maximum value; not recoverable from the text]
wherein:
[Figure SMS_3] represents the number of dynamic feature data in the dynamic feature set corresponding to the same first data;
[Figure SMS_9] represents the number of static feature data in the static feature set corresponding to the same first data;
[Figure SMS_12] represents the modulus of the vector corresponding to the [Figure SMS_5]-th dynamic feature data;
[Figure SMS_8] represents the modulus of the vector corresponding to the [Figure SMS_10]-th static feature data;
[Figure SMS_13] represents the modulus of the vector of the first feature associated with the [Figure SMS_2]-th dynamic feature data in the corresponding dynamic feature set;
[Figure SMS_7] represents the maximum weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set;
[Figure SMS_11] represents the maximum weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure SMS_14] represents the association coefficient between the [Figure SMS_4]-th dynamic feature and the corresponding first feature;
[Figure SMS_6] represents the corresponding maximum value.
Preferably, calculating the minimum value of the parallelism characteristic index of each first data in the first data set includes:
[Formula image SMS_15: expression for the minimum value; not recoverable from the text]
wherein:
[Figure SMS_16] represents the corresponding minimum value;
[Figure SMS_17] represents the minimum weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure SMS_18] represents the minimum weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set.
Preferably, extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data, and further including:
obtaining a parallelism characteristic interval of each first data in the first data set based on the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set;
optimizing the parallelism characteristic interval based on a preset error value;
and obtaining the parallelism demand type corresponding to the optimized parallelism characteristic interval based on the parallelism characteristic interval-demand type table.
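The three steps above can be sketched as follows. The source does not say how the interval is "optimized" or how the interval-demand type table is laid out, so this sketch assumes that optimizing means widening the interval by the preset error value on both sides, and that the table is a list of (low, high, type) rows matched by largest overlap. The function names, the table contents and the type names are all hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]

# Hypothetical interval-demand type table: (low, high, demand type).
DEMAND_TABLE: List[Tuple[float, float, str]] = [
    (0.0, 0.5, "concurrent"),
    (0.5, 1.0, "simultaneous"),
]


def optimize_interval(interval: Interval, preset_error: float) -> Interval:
    """ASSUMPTION: 'optimizing' widens the interval by the preset error on each side."""
    low, high = interval
    return (low - preset_error, high + preset_error)


def lookup_demand_type(interval: Interval,
                       table: List[Tuple[float, float, str]]) -> Optional[str]:
    """Return the demand type of the table row that overlaps the interval the most."""
    best_type, best_overlap = None, 0.0
    for row_low, row_high, demand_type in table:
        overlap = min(interval[1], row_high) - max(interval[0], row_low)
        if overlap > best_overlap:
            best_type, best_overlap = demand_type, overlap
    return best_type  # None when no row overlaps the interval


def demand_type_for(min_index: float, max_index: float,
                    preset_error: float) -> Optional[str]:
    """Build the interval from the index minimum and maximum, optimize it, look it up."""
    interval = optimize_interval((min_index, max_index), preset_error)
    return lookup_demand_type(interval, DEMAND_TABLE)
```

Matching by largest overlap is only one possible rule; a table keyed on the interval midpoint would work equally well under these assumptions.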
Preferably, based on the parallelism requirement type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence, including:
based on a type-instruction database, matching instruction sets corresponding to the parallelism demand type, acquiring hierarchical labels from a time period-label mapping table according to the data acquisition time period of the first data set, and sequencing all first data to obtain a first sequence;
and matching the first sequence with a corresponding instruction set to obtain an instruction sequence.
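A minimal sketch of this matching step, assuming the type-instruction database and the time period-label mapping table are plain dictionaries and that the hierarchical label is an integer rank used as the sort key. The `FirstData` fields, the instruction names and the table contents are hypothetical; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FirstData:
    name: str
    demand_type: str          # parallelism demand type obtained earlier
    acquisition_period: str   # key into the time period-label mapping table


# Hypothetical type-instruction database.
TYPE_INSTRUCTION_DB: Dict[str, List[str]] = {
    "simultaneous": ["LOAD_BATCH", "EXEC_PARALLEL"],
    "concurrent": ["LOAD_STREAM", "EXEC_INTERLEAVED"],
}

# Hypothetical time period-label mapping table (label = integer rank).
PERIOD_LABEL_TABLE: Dict[str, int] = {"morning": 1, "afternoon": 2, "night": 3}


def build_instruction_sequence(
    first_data_set: List[FirstData],
) -> List[Tuple[str, List[str]]]:
    """Sort the first data by hierarchical label, then attach each instruction set."""
    # First sequence: first data ordered by the label of its acquisition period.
    # sorted() is stable, so data with equal labels keep their input order.
    first_sequence = sorted(
        first_data_set, key=lambda d: PERIOD_LABEL_TABLE[d.acquisition_period]
    )
    # Instruction sequence: each first data paired with its matched instruction set.
    return [(d.name, TYPE_INSTRUCTION_DB[d.demand_type]) for d in first_sequence]
```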
Preferably, determining the final label corresponding to the parallel data according to the first slope difference of the left connecting segment and the second slope difference of the right connecting segment where the same parallel data is located includes:
acquiring a first current slope of the left connecting segment, and calculating a first slope difference between the first current slope and a preset average slope;
acquiring a second current slope of the right connecting segment, and calculating a second slope difference between the second current slope and the preset average slope;
when the absolute value of the first slope difference is larger than the absolute value of the second slope difference, determining that the final label corresponding to the same parallel data is consistent with the label set by the slope judgment of the left connecting segment;
otherwise, determining that the final label corresponding to the same parallel data is consistent with the label set by the slope judgment of the right connecting segment.
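The rule above can be written as a short function. This is a sketch only: the function and parameter names are hypothetical, and labels are represented as plain strings.

```python
def final_label(left_slope: float, right_slope: float, preset_avg_slope: float,
                left_label: str, right_label: str) -> str:
    """Resolve a static/dynamic conflict for one parallel data point.

    left_label / right_label are the labels that the left and right
    connecting segments assigned to the point by their own slope judgment.
    """
    first_slope_diff = left_slope - preset_avg_slope
    second_slope_diff = right_slope - preset_avg_slope
    # The segment whose slope deviates more from the preset average wins;
    # on a tie the right segment's label is used ("otherwise" branch).
    if abs(first_slope_diff) > abs(second_slope_diff):
        return left_label
    return right_label
```

For example, with a preset average slope of 0.5, a left slope of 0.9 and a right slope of 0.4, the left segment deviates more, so the point takes the left segment's label.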
Preferably, constructing a corresponding static feature set and dynamic feature set according to the slope analysis result includes:
if analysis shows that the average slope of the modulus line segment between every two moments is smaller than the preset average slope, setting a first static label for the left parallel data and a second static label for the right parallel data related to the corresponding modulus line segment;
if the average slope is larger than or equal to the preset average slope, setting a first dynamic label for the left parallel data and a second dynamic label for the right parallel data related to the corresponding modulus line segment;
when the labels set for the same parallel data are all static labels, the corresponding parallel data are regarded as static characteristic data;
when the labels set for the same parallel data are all dynamic labels, the corresponding parallel data are regarded as dynamic characteristic data;
when a label set for the same parallel data comprises a static label and a dynamic label, determining a final label corresponding to the parallel data according to a first slope difference of a left connecting section and a second slope difference of a right connecting section corresponding to the same parallel data, wherein the final label is a dynamic label or a static label;
based on all static feature data and all dynamic feature data, corresponding static feature sets and dynamic feature sets are constructed.
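A minimal sketch of this labeling, assuming the parallel data are points ordered in time and that segment k joins point k (its left parallel data) to point k+1 (its right parallel data). The function name and the string labels are hypothetical, and the first/second label distinction is collapsed into a single "static" or "dynamic" value. Points that receive conflicting labels are only flagged as "mixed" here; the text resolves them with the separate slope-difference rule.

```python
from typing import List


def classify_points(segment_slopes: List[float],
                    preset_avg_slope: float) -> List[str]:
    """Label each point 'static', 'dynamic', or 'mixed' from its adjacent segments."""
    if not segment_slopes:
        return []
    n_points = len(segment_slopes) + 1
    labels: List[List[str]] = [[] for _ in range(n_points)]
    for k, slope in enumerate(segment_slopes):
        # Below the preset average slope: static; otherwise: dynamic.
        label = "static" if slope < preset_avg_slope else "dynamic"
        labels[k].append(label)       # left parallel data of segment k
        labels[k + 1].append(label)   # right parallel data of segment k
    result = []
    for point_labels in labels:
        kinds = set(point_labels)
        # All labels agree: keep that label. Otherwise flag for tie-breaking.
        result.append(kinds.pop() if len(kinds) == 1 else "mixed")
    return result
```

With slopes `[0.1, 0.2, 0.9, 0.8]` and a preset average slope of 0.5, the third point sits between a static segment and a dynamic segment, so it is the only one flagged as "mixed".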
Preferably, the obtaining a maximum value and a minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic slope change trend graph and the first slope change trend graph includes:
inputting the dynamic slope change trend graph and the first slope change trend graph into a trend-association analysis model to obtain association coefficients of the same dynamic characteristic data and corresponding first characteristics;
and calculating the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic data and the corresponding first characteristic.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a flowchart of a heterogeneous data parsing method according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
An embodiment of the present invention provides a heterogeneous data parsing method, as shown in fig. 1, including:
step 1: screening the heterogeneous data according to the process architecture to obtain a first data set;
step 2: extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data;
step 3: based on the parallelism demand type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence;
step 4: based on the instruction sequence, the corresponding working units are matched to execute corresponding work.
In this embodiment, the process architecture refers to the framework operating speed required by, and the architecture combination information recorded in, the source data information of each data item in the heterogeneous data; it allows independent data to be extracted from the heterogeneous data. Heterogeneous data refers to overlapping data of different structures, and independent data refers to data that contains complete data components.
In this embodiment, the first data set refers to a set of independent data separated from overlapping data in heterogeneous data obtained by filtering heterogeneous data according to a process architecture to remove useless data.
In this embodiment, the parallelism characteristic interval refers to the interval bounded by the minimum value and the maximum value of the parallelism characteristic index, which are calculated by analyzing the features of each first data in the first data set. The interval is looked up in the parallelism characteristic interval-demand type table to obtain the parallelism demand type of each first data, which is then used to match an instruction set for data analysis.
In this embodiment, the parallelism requirement type refers to the way in which data in the heterogeneous data are operated on or run together, and includes a simultaneity requirement and a concurrency requirement, where simultaneity means that two or more data can be computed at the same instant, and concurrency means that two or more data can be computed within the same time interval; distinguishing the two serves to improve the analysis rate of the heterogeneous data.
In this embodiment, a type-instruction database refers to a database containing parallelism demand types and corresponding instruction sets.
In this embodiment, the instruction sequences refer to instruction sequences obtained by matching the parallelism requirement types with corresponding instruction sets, and ordering the instruction sets according to the results of hierarchical tag ordering, where each sequence in the instruction sequences has its own representative symbol, so as to facilitate instruction execution.
The working principle and the beneficial effects of the technical scheme are as follows: heterogeneous data is screened according to a process architecture to obtain a first data set of independent data; a parallelism characteristic interval of each first data is extracted to obtain the parallelism requirement type of each first data; a corresponding instruction set is matched; and working units with the corresponding instruction sequence are constructed to execute the corresponding work. Each independent data item in the heterogeneous data is thereby analyzed more comprehensively, which makes it easier to identify the important data in the heterogeneous data and further improves the capability of heterogeneous data analysis.
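The four steps can be sketched as a pipeline in which each stage is passed in as a function. This shows only the data flow between the steps; the function name, the parameter names and the stage signatures are hypothetical, and the source does not prescribe this decomposition.

```python
from typing import Any, Callable, Iterable, List, Tuple


def parse_heterogeneous_data(
    heterogeneous_data: Iterable[Any],
    screen: Callable[[Iterable[Any]], List[Any]],
    requirement_type_of: Callable[[Any], str],
    match_instructions: Callable[[List[Tuple[Any, str]]], List[Any]],
    dispatch: Callable[[Any], Any],
) -> List[Any]:
    """Run steps 1 to 4 in order and return the results of the working units."""
    # Step 1: screen the heterogeneous data into the first data set.
    first_data_set = screen(heterogeneous_data)
    # Step 2: derive a parallelism requirement type for each first data.
    typed = [(d, requirement_type_of(d)) for d in first_data_set]
    # Step 3: match instruction sets and order them into an instruction sequence.
    instruction_sequence = match_instructions(typed)
    # Step 4: hand each entry of the sequence to its working unit.
    return [dispatch(instruction) for instruction in instruction_sequence]
```

Called with toy stand-ins such as `screen=lambda data: [d for d in data if d]` and `dispatch=str.upper`, the function simply threads the data through the four stages in order.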
The embodiment of the invention provides a heterogeneous data analysis method, which screens heterogeneous data according to a process architecture to obtain a first data set, and comprises the following steps:
heterogeneous data are acquired, and segmentation is carried out according to the average length of the data, so that a first overlapped data set in each average length of the data is obtained;
removing overlapping data with overlapping length smaller than a preset length based on the first overlapping data set to obtain a second overlapping data set;
obtaining source data corresponding to each piece of residual overlapping data based on the second overlapping data set;
obtaining process architecture information corresponding to the residual overlapping data based on the source data corresponding to each residual overlapping data;
obtaining corresponding process architecture types based on the process architecture information;
and separating all the remaining overlapping data in the second overlapping data set based on the type of the process architecture to obtain a plurality of first data, and constructing to obtain a first data set.
In this embodiment, the average data length refers to the average of the segment lengths that different systems use when splitting the heterogeneous data during their respective analysis processes; it is used to analyze the heterogeneous data segment by segment.
In this embodiment, the first overlapping data set refers to all overlapping data of the heterogeneous data within an average length of the data, that is, after the heterogeneous data is segmented, a plurality of segmented data are obtained, and overlapping data may exist in different segmented data, where all overlapping data form the first overlapping data set.
In this embodiment, the preset length refers to the preset shortest data length, and if the data length is smaller than the preset data length, the data is incomplete data, so as to achieve the purpose of screening complete data.
In this embodiment, the second overlapping data set refers to a complete set of resolvable remaining data in the heterogeneous data obtained by reserving complete data in the first overlapping data set.
In this embodiment, the process architecture information refers to the framework operating speed required by, and the architecture combination information recorded in, the source data information of each data item in the heterogeneous data, so that independent data in the heterogeneous data can be extracted.
In this embodiment, the process architecture type refers to the type of each combination of the required framework operating speed and the architecture combination information in the source data information of each data item in the heterogeneous data, so that independent data in the heterogeneous data can be distinguished.
In this embodiment, the first data refers to data obtained by separating data in the second overlapping data set according to the type of the process architecture.
The working principle and the beneficial effects of the technical scheme are as follows: the heterogeneous data is segmented according to the average data length to obtain a first overlapping data set within each average data length; the first overlapping data set is screened for completeness to obtain a second overlapping data set; and the second overlapping data set is analyzed and its data are separated according to the process architecture type to obtain mutually independent first data, which improves the accuracy and capability of heterogeneous data analysis.
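A minimal sketch of the screening flow. The source does not describe how overlapping data are detected or how source data are stored, so this sketch assumes the first overlapping data set is already available as a list of records, each carrying its content and a `source` entry with an `architecture_type` field. All names and the record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def segment_by_average_length(data: str, average_length: int) -> List[str]:
    """Split the raw data into consecutive pieces of the average data length."""
    if average_length <= 0:
        raise ValueError("average_length must be positive")
    return [data[i:i + average_length] for i in range(0, len(data), average_length)]


def screen_overlaps(first_overlap_set: List[dict],
                    preset_length: int) -> Dict[str, List[str]]:
    """Drop short overlaps, then separate the rest by process architecture type."""
    # Second overlapping data set: overlaps shorter than the preset length are removed.
    second_overlap_set = [
        o for o in first_overlap_set if len(o["content"]) >= preset_length
    ]
    # Separate the remaining overlaps by the architecture type of their source data.
    first_data_by_type: Dict[str, List[str]] = defaultdict(list)
    for overlap in second_overlap_set:
        architecture_type = overlap["source"]["architecture_type"]
        first_data_by_type[architecture_type].append(overlap["content"])
    return dict(first_data_by_type)
```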
The embodiment of the invention provides a heterogeneous data analysis method, which is used for extracting a parallelism characteristic interval of each first data in a first data set to obtain the parallelism requirement type of each first data, and comprises the following steps:
carrying out parallel analysis on each first data in the first data set to obtain a corresponding parallel characteristic data set;
reconstructing the parallel characteristic data set to obtain a corresponding reconstructed data set, wherein the reconstructed data set comprises all first parallel data related at the same time;
acquiring actual behavior vectors corresponding to first data at different moments in the same reconstruction data set, and constructing a parallel line graph corresponding to the first data based on the moduli of the actual behavior vectors;
based on the parallel line graph, obtaining the average slope of the modulus line segment between every two moments;
if the average slope is smaller than the preset average slope, setting a first static label for the left parallel data and a second static label for the right parallel data related to the corresponding modulus line segment;
if the average slope is larger than or equal to the preset average slope, setting a first dynamic label for the left parallel data and a second dynamic label for the right parallel data related to the corresponding modulus line segment;
when the labels set for the same parallel data are all static labels, the corresponding parallel data are regarded as static characteristic data;
when the labels set for the same parallel data are all dynamic labels, the corresponding parallel data are regarded as dynamic characteristic data;
when a label set for the same parallel data comprises a static label and a dynamic label, determining a final label corresponding to the parallel data according to a first slope difference of a left connecting section and a second slope difference of a right connecting section corresponding to the same parallel data, wherein the final label is a dynamic label or a static label;
based on all static feature data and all dynamic feature data, constructing corresponding static feature sets and dynamic feature sets;
based on the parallel feature association table, obtaining possible associated features corresponding to each dynamic feature data in the dynamic feature set;
based on the possible associated feature corresponding to each dynamic feature data and the dynamic association of the same dynamic feature data based on the rest dynamic feature data in the corresponding first data, obtaining the first feature of the corresponding dynamic feature data;
based on the slope average value corresponding to each dynamic characteristic data, obtaining a dynamic slope change trend graph according to a time sequence;
slope analysis is carried out on each first feature to obtain a first slope change trend graph;
inputting the dynamic slope change trend graph and the first slope change trend graph into a trend-association analysis model to obtain association coefficients of the same dynamic characteristic data and corresponding first characteristics;
and calculating the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic data and the corresponding first characteristic.
In this embodiment, parallel analysis refers to identifying the data in the heterogeneous data on which parallel operations exist; the parallel characteristic data set refers to the data in the heterogeneous data that are operated on or run simultaneously, or whose transmission rate, running rate and operation rate are consistent.
In this embodiment, the reconstruction data set refers to the data obtained by splicing and clipping the data in the parallel characteristic data set corresponding to the same first data, for the purpose of in-depth analysis of the parallel characteristic data.
In this embodiment, the preset average slope refers to a preset average slope of a modulus line segment of the actual behavior vectors that allows for reasonable external influences; when the average slope is smaller than the preset average slope, the modulus is regarded as unchanged within a reasonable tolerance. The preset average slope serves to classify the features as static or dynamic.
In this embodiment, static feature data refers to parallel feature data for which the average slope is smaller than the preset average slope, that is, data regarded as unchanged within a reasonable tolerance.
In this embodiment, dynamic feature data refers to parallel feature data for which the average slope is greater than or equal to the preset average slope, that is, data regarded as changing beyond a reasonable tolerance.
In this embodiment, a static feature set refers to a set of all static feature data.
In this embodiment, the dynamic feature set refers to a set of all dynamic feature data.
In this embodiment, the parallel feature association table refers to a look-up table containing dynamic feature data and corresponding possible associated features.
In this embodiment, the possible associated feature refers to feature data that may have an associated relationship with the dynamic feature data being searched for.
In this embodiment, the first feature of a dynamic feature data refers to the feature data that appears both among the possible associated features of that dynamic feature data and among the remaining dynamic feature data of the corresponding first data, excluding that dynamic feature data itself.
In this embodiment, the dynamic slope change trend graph refers to a slope change trend graph in a parallel line graph corresponding to dynamic feature data, so as to achieve the purpose of displaying dynamic feature data changes.
In this embodiment, the first slope trend graph refers to a slope trend graph in a parallel line graph corresponding to the first feature, so as to achieve the purpose of displaying the change of the first feature.
In this embodiment, the trend-association analysis model is a model trained on dynamic slope change trend graphs, the corresponding first slope change trend graphs, and the association coefficients between the dynamic feature data and the corresponding first features; it analyzes a dynamic slope change trend graph together with the corresponding first slope change trend graph and outputs the corresponding association coefficient.
In this embodiment, the association coefficient refers to a coefficient capable of reflecting the association relationship between the dynamic feature data and the corresponding first feature, and the value is 0 to 1.
In this embodiment, the parallelism characteristic index refers to an index representing the degree of simultaneous operation or manipulation of data obtained by calculating a static characteristic set, a dynamic characteristic set, and a correlation coefficient of each dynamic characteristic and a corresponding first characteristic.
The working principle and beneficial effects of the above technical scheme are as follows: the parallel feature data corresponding to the first data is reconstructed, and the vectors of the reconstructed data are analyzed to obtain the average slope of the modulus connecting line segment between every two corresponding moments. Comparing these average slopes yields the static feature data and the dynamic feature data, and feature association analysis on the dynamic feature data yields the corresponding first features and association coefficients. The maximum and minimum values of the parallelism characteristic index of each first data in the first data set are then calculated. This refines the heterogeneous data analysis flow, analyzes the dynamic feature data and the associated features in depth, and improves the accuracy and analysis capability of heterogeneous data analysis.
The embodiment of the invention provides a heterogeneous data analysis method, which calculates the maximum value of the parallelism characteristic index of each first data in a first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, and comprises the following steps:
[Formula image: Figure SMS_19]
where:
[Figure SMS_20] represents the number of dynamic feature data in the dynamic feature set corresponding to the same first data;
[Figure SMS_25] represents the number of static feature data in the static feature set corresponding to the same first data;
[Figure SMS_28] represents the modulus of the vector corresponding to the [Figure SMS_22]-th dynamic feature data;
[Figure SMS_27] represents the modulus of the vector corresponding to the [Figure SMS_30]-th static feature data;
[Figure SMS_32] represents the modulus of the vector of the first feature associated with the [Figure SMS_23]-th dynamic feature data in the corresponding dynamic feature set;
[Figure SMS_26] represents the largest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set;
[Figure SMS_29] represents the largest weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure SMS_31] represents the association coefficient between the [Figure SMS_21]-th dynamic feature and the corresponding first feature;
[Figure SMS_24] represents the corresponding maximum value.
In this embodiment, the value of [Figure SMS_33] is less than 1, and the value of [Figure SMS_34] is less than 1.
The working principle and the beneficial effects of the technical scheme are as follows: the maximum value of the parallelism characteristic index of each first data is obtained by calculating the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, so that the parallelism characteristic is accurately analyzed, and the accuracy degree of heterogeneous data analysis is improved.
The embodiment of the invention provides a heterogeneous data analysis method, which is used for calculating the minimum value of the parallelism characteristic index of each first data in a first data set based on a static characteristic set, a dynamic characteristic set and the association coefficient of each dynamic characteristic and a corresponding first characteristic, and comprises the following steps:
[Formula image: Figure SMS_35]
where:
[Figure SMS_36] represents the corresponding minimum value;
[Figure SMS_37] represents the smallest weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure SMS_38] represents the smallest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set.
In this embodiment, the value of [Figure SMS_39] is less than 1, and the value of [Figure SMS_40] is less than 1.
The working principle and the beneficial effects of the technical scheme are as follows: the minimum value of the parallelism characteristic index of each first data is obtained by calculating the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, so that the parallelism characteristic is accurately analyzed, and the accuracy degree of heterogeneous data analysis is improved.
The embodiment of the invention provides a heterogeneous data analysis method, which extracts a parallelism characteristic interval of each first data in a first data set to obtain a parallelism requirement type of each first data, and further comprises the following steps:
obtaining a parallelism characteristic interval of each first data in the first data set based on the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set;
optimizing the parallelism characteristic interval based on a preset error value;
and obtaining the parallelism demand type corresponding to the optimized parallelism characteristic interval based on the parallelism characteristic interval-demand type table.
In this embodiment, the preset error value refers to a preset reasonable error value, so as to achieve the purpose of accurately matching the parallelism requirement type.
In this embodiment, the parallelism characteristic interval-requirement type table refers to a comparison table containing parallelism characteristic intervals and corresponding parallelism requirement types.
In this embodiment, the parallelism characteristic interval is determined by the maximum value and the minimum value, and it is optimized by adjusting those two bounds. For example, if the parallelism characteristic interval is (a1, a2), the optimized interval is (a3, a4), where a3 = a1 - preset error value / 2 and a4 = a2 + preset error value / 3.
The working principle and beneficial effects of the above technical scheme are as follows: the parallelism characteristic interval is optimized using the preset error value, and the optimized interval is looked up in the parallelism characteristic interval-requirement type table to obtain the corresponding parallelism requirement type, which improves the analysis capability of heterogeneous data analysis.
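The interval adjustment and table lookup described above can be sketched as follows. The widening rule uses the divisors 2 and 3 given in the text. Everything else is an assumption made for illustration: the function names, the type names `type_A` and `type_B`, the list-of-rows table layout, and in particular the matching rule (the optimized interval must lie inside a table interval), which the text does not specify.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def optimize_interval(interval: Interval, preset_error: float) -> Interval:
    """Widen (a1, a2) to (a3, a4) with a3 = a1 - e/2 and a4 = a2 + e/3."""
    a1, a2 = interval
    return (a1 - preset_error / 2, a2 + preset_error / 3)


def lookup_demand_type(
    interval: Interval, table: List[Tuple[Interval, str]]
) -> Optional[str]:
    """Return the demand type of the first table row containing the interval.

    Containment is an assumed matching rule; the source only says the
    type is obtained "based on" the interval-requirement type table.
    """
    low, high = interval
    for (row_low, row_high), demand_type in table:
        if row_low <= low and high <= row_high:
            return demand_type
    return None


if __name__ == "__main__":
    # Hypothetical table rows: (interval, parallelism demand type).
    table = [((0.0, 1.5), "type_A"), ((0.5, 3.0), "type_B")]
    optimized = optimize_interval((1.0, 2.0), preset_error=0.75)
    print(optimized, lookup_demand_type(optimized, table))
```

Because the lower bound moves by half the error and the upper bound by a third of it, the adjustment is asymmetric: the interval always grows, but more on the lower side.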
The embodiment of the invention provides a heterogeneous data analysis method, which is based on the parallelism demand type, matches a corresponding instruction set from a type-instruction database to obtain an instruction sequence, and comprises the following steps:
based on a type-instruction database, matching instruction sets corresponding to the parallelism demand type, acquiring hierarchical labels from a time period-label mapping table according to the data acquisition time period of the first data set, and sequencing all first data to obtain a first sequence;
and matching the first sequence with a corresponding instruction set to obtain an instruction sequence.
In this embodiment, a type-instruction database refers to a database containing parallelism demand types and corresponding instruction sets.
In this embodiment, the time period-label mapping table refers to a lookup table that maps data acquisition time periods to the corresponding hierarchical labels.
In this embodiment, the hierarchical label refers to the label of the level at which each first data in the first data set was located before separation. It represents the data level, that is, the level at which the data sits within the overlap, and the number of levels is determined by the first data in the first data set.
In this embodiment, the first sequence refers to the data sequence obtained by sorting all the first data in the first data set according to the hierarchical labels.
The working principle and beneficial effects of the above technical scheme are as follows: the hierarchical labels of the first data are extracted and used to sort the first data in the first data set into a first sequence, the corresponding instruction sets are obtained, and the instruction sequence is derived from them, which improves the analysis capability of heterogeneous data analysis.
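One possible reading of the two steps above is sketched below. It rests on several assumptions that the text does not confirm: each first data carries its own acquisition time period, hierarchical labels are integers that sort in ascending order, and "matching the first sequence with the instruction set" means pairing each sorted item with the instruction set of its demand type. The record fields, table contents, and instruction names are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_instruction_sequence(
    records: List[Dict[str, str]],
    type_instruction_db: Dict[str, List[str]],
    period_label_table: Dict[str, int],
) -> List[Tuple[str, List[str]]]:
    """Sort first data by hierarchical label, then attach instruction sets.

    records: each item has hypothetical keys "id", "period", "demand_type".
    period_label_table: acquisition period -> hierarchical label.
    type_instruction_db: parallelism demand type -> instruction set.
    """
    # First sequence: first data ordered by their hierarchical label.
    first_sequence = sorted(records, key=lambda r: period_label_table[r["period"]])
    # Instruction sequence: each item paired with its matched instruction set.
    return [(r["id"], type_instruction_db[r["demand_type"]]) for r in first_sequence]


if __name__ == "__main__":
    records = [
        {"id": "d1", "period": "T2", "demand_type": "type_B"},
        {"id": "d2", "period": "T1", "demand_type": "type_A"},
    ]
    db = {"type_A": ["load", "map"], "type_B": ["load", "reduce"]}
    labels = {"T1": 1, "T2": 2}
    for item in build_instruction_sequence(records, db, labels):
        print(item)
```

Python's `sorted` is stable, so first data that share a hierarchical label keep their original relative order.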
The embodiment of the invention provides a heterogeneous data analysis method, which determines a final label corresponding to parallel data according to a first slope difference of a left connecting section and a second slope difference of a right connecting section where the same parallel data are positioned, and comprises the following steps:
acquiring a first current slope of the left connecting section, and calculating a first slope difference between the first current slope and a preset average slope;
acquiring a second current slope of the right connecting section, and calculating a second slope difference between the second current slope and a preset average slope;
when the absolute value of the first slope difference is larger than that of the second slope difference, judging that the final label corresponding to the same parallel data is consistent with a label result set by slope judgment of the left connecting section;
otherwise, judging that the final label corresponding to the same parallel data is consistent with the label result set by the slope judgment of the right connecting section.
In this embodiment, the first current slope refers to the slope of the left connecting segment of the parallel line graph of the modulus of the actual behavior vector corresponding to the first parallel data at the current time.
In this embodiment, the second current slope refers to the slope of the right connecting segment of the parallel line graph of the modulus of the actual behavior vector corresponding to the first parallel data at the current time.
The working principle and beneficial effects of the above technical scheme are as follows: the first current slope of the left connecting segment and the second current slope of the right connecting segment are obtained, and the difference between each of them and the preset average slope is calculated to give the first slope difference and the second slope difference. The absolute values of the two slope differences are compared to decide whether the final label of the same parallel data follows the label set by the slope judgment of the left connecting segment or that of the right connecting segment. The feature data is thereby classified as dynamic or static, which facilitates in-depth analysis of heterogeneous data and improves the analysis capability for heterogeneous data.
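The tie-breaking rule above can be sketched as follows. The comparison of absolute slope differences follows the text; the function names and the string labels "static" and "dynamic" are assumptions introduced here, and the per-segment labelling helper reuses the threshold rule stated elsewhere in this document (slope below the preset average slope means static).

```python
def segment_label(slope: float, preset_avg_slope: float) -> str:
    """Label a connecting segment: static below the preset slope, else dynamic."""
    return "static" if slope < preset_avg_slope else "dynamic"


def final_label(left_slope: float, right_slope: float, preset_avg_slope: float) -> str:
    """Resolve the label of parallel data shared by a left and a right segment."""
    left = segment_label(left_slope, preset_avg_slope)
    right = segment_label(right_slope, preset_avg_slope)
    if left == right:
        # Both segments agree, so no tie-break is needed.
        return left
    first_diff = left_slope - preset_avg_slope
    second_diff = right_slope - preset_avg_slope
    # The segment whose slope deviates more from the preset value decides;
    # on equal deviation the "otherwise" branch selects the right segment.
    if abs(first_diff) > abs(second_diff):
        return left
    return right


if __name__ == "__main__":
    print(final_label(0.125, 0.625, preset_avg_slope=0.5))
    print(final_label(0.375, 1.0, preset_avg_slope=0.5))
```

In the first call the left segment deviates more from the preset slope (0.375 against 0.125), so its static label is kept; in the second call the right segment deviates more, so the data is labelled dynamic.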
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A heterogeneous data parsing method, comprising:
step 1: screening the heterogeneous data according to the process architecture to obtain a first data set;
step 2: extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data;
step 3: based on the parallelism demand type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence;
step 4: based on the instruction sequence, the corresponding working units are matched to execute corresponding work.
2. The method of claim 1, wherein the screening of the heterogeneous data according to the process architecture to obtain the first data set comprises:
heterogeneous data are acquired, and segmentation is carried out according to the average length of the data, so that a first overlapped data set in each average length of the data is obtained;
removing overlapping data with overlapping length smaller than a preset length based on the first overlapping data set to obtain a second overlapping data set;
obtaining source data corresponding to each piece of residual overlapping data based on the second overlapping data set;
obtaining process architecture information corresponding to the residual overlapping data based on the source data corresponding to each residual overlapping data;
obtaining corresponding process architecture types based on the process architecture information;
and separating all the remaining overlapping data in the second overlapping data set based on the type of the process architecture to obtain a plurality of first data, and constructing to obtain a first data set.
3. The method of claim 1, wherein extracting the parallelism characteristic interval of each first data in the first data set to obtain the parallelism requirement type of each first data comprises:
carrying out parallel analysis on each first data in the first data set to obtain a corresponding parallel characteristic data set and carrying out reconstruction to obtain a corresponding reconstruction data set, wherein the reconstruction data set comprises all first parallel data involved at the same time;
acquiring actual behavior vectors corresponding to the first data at different moments in the same reconstruction data set, constructing a parallel line graph corresponding to the first data based on the moduli of the actual behavior vectors, and performing slope analysis;
constructing a corresponding static feature set and a dynamic feature set according to the slope analysis result;
based on the parallel feature association table, obtaining possible associated features corresponding to each dynamic feature data in the dynamic feature set;
obtaining the first feature of each dynamic feature data based on the possible associated features corresponding to that dynamic feature data and on the dynamic association between that dynamic feature data and the remaining dynamic feature data in the corresponding first data;
based on the slope average value corresponding to each dynamic characteristic data, obtaining a dynamic slope change trend graph according to a time sequence;
slope analysis is carried out on each first feature to obtain a first slope change trend graph;
and obtaining the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic slope change trend graph and the first slope change trend graph.
4. A method according to claim 3, wherein calculating a maximum value of the parallelism characteristic index of each first data in the first data set comprises:
[Formula image: Figure QLYQS_2]
where:
[Figure QLYQS_8] represents the number of dynamic feature data in the dynamic feature set corresponding to the same first data;
[Figure QLYQS_12] represents the number of static feature data in the static feature set corresponding to the same first data;
[Figure QLYQS_3] represents the modulus of the vector corresponding to the [Figure QLYQS_6]-th dynamic feature data;
[Figure QLYQS_10] represents the modulus of the vector corresponding to the [Figure QLYQS_13]-th static feature data;
[Figure QLYQS_1] represents the modulus of the vector of the first feature associated with the [Figure QLYQS_7]-th dynamic feature data in the corresponding dynamic feature set;
[Figure QLYQS_11] represents the largest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set;
[Figure QLYQS_14] represents the largest weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure QLYQS_4] represents the association coefficient between the [Figure QLYQS_5]-th dynamic feature and the corresponding first feature;
[Figure QLYQS_9] represents the corresponding maximum value.
5. The method of claim 4, wherein calculating a minimum value of the parallelism characteristic index for each of the first data in the first data set comprises:
[Formula image: Figure QLYQS_15]
where:
[Figure QLYQS_16] represents the corresponding minimum value;
[Figure QLYQS_17] represents the smallest weight among the feature weights of all static feature data in the corresponding static feature set;
[Figure QLYQS_18] represents the smallest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set.
6. The method of claim 1, wherein extracting the parallelism characteristic interval for each first data in the first dataset to obtain the parallelism requirement type for each first data, further comprises:
obtaining a parallelism characteristic interval of each first data in the first data set based on the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set;
optimizing the parallelism characteristic interval based on a preset error value;
and obtaining the parallelism demand type corresponding to the optimized parallelism characteristic interval based on the parallelism characteristic interval-demand type table.
7. The method of claim 2, wherein matching corresponding instruction sets from a type-instruction database based on the parallelism demand type, results in an instruction sequence, comprising:
based on a type-instruction database, matching instruction sets corresponding to the parallelism demand type, acquiring hierarchical labels from a time period-label mapping table according to the data acquisition time period of the first data set, and sequencing all first data to obtain a first sequence;
and matching the first sequence with a corresponding instruction set to obtain an instruction sequence.
8. The method of claim 3, wherein determining the final tag for the parallel data based on the first slope difference for the left link segment and the second slope difference for the right link segment where the same parallel data is located comprises:
acquiring a first current slope of the left connecting section, and calculating a first slope difference between the first current slope and a preset average slope;
acquiring a second current slope of the right connecting section, and calculating a second slope difference between the second current slope and a preset average slope;
when the absolute value of the first slope difference is larger than that of the second slope difference, judging that the final label corresponding to the same parallel data is consistent with a label result set by slope judgment of the left connecting section;
otherwise, judging that the final label corresponding to the same parallel data is consistent with the label result set by the slope judgment of the right connecting section.
9. The method of claim 3, wherein constructing corresponding static feature sets and dynamic feature sets based on slope analysis results comprises:
if the analysis shows that the average slope of the modulus connecting line segment between every two moments is smaller than the preset average slope, setting a first static label for the left parallel data and a second static label for the right parallel data related to the corresponding modulus connecting line segment;
if the average slope is greater than or equal to the preset average slope, setting a first dynamic label for the left parallel data and a second dynamic label for the right parallel data related to the corresponding modulus connecting line segment;
when the labels set for the same parallel data are all static labels, the corresponding parallel data are regarded as static characteristic data;
when the labels set for the same parallel data are all dynamic labels, the corresponding parallel data are regarded as dynamic characteristic data;
when a label set for the same parallel data comprises a static label and a dynamic label, determining a final label corresponding to the parallel data according to a first slope difference of a left connecting section and a second slope difference of a right connecting section corresponding to the same parallel data, wherein the final label is a dynamic label or a static label;
based on all static feature data and all dynamic feature data, corresponding static feature sets and dynamic feature sets are constructed.
10. The method of claim 3, wherein obtaining a maximum value and a minimum value of the parallelism characteristic index for each first data in the first data set based on the static characteristic set, the dynamic slope trend graph, and the first slope trend graph comprises:
inputting the dynamic slope change trend graph and the first slope change trend graph into a trend-association analysis model to obtain association coefficients of the same dynamic characteristic data and corresponding first characteristics;
and calculating the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic data and the corresponding first characteristic.
CN202310402595.2A 2023-04-17 2023-04-17 Heterogeneous data analysis method Active CN116150455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310402595.2A CN116150455B (en) 2023-04-17 2023-04-17 Heterogeneous data analysis method

Publications (2)

Publication Number Publication Date
CN116150455A true CN116150455A (en) 2023-05-23
CN116150455B CN116150455B (en) 2023-07-18

Family

ID=86350887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310402595.2A Active CN116150455B (en) 2023-04-17 2023-04-17 Heterogeneous data analysis method

Country Status (1)

Country Link
CN (1) CN116150455B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077598A (en) * 2023-10-13 2023-11-17 青岛展诚科技有限公司 3D parasitic parameter optimization method based on Mini-batch gradient descent method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080077930A1 (en) * 2006-09-26 2008-03-27 Eichenberger Alexandre E Workload Partitioning in a Parallel System with Hetergeneous Alignment Constraints
CN101441569A (en) * 2008-11-24 2009-05-27 中国人民解放军信息工程大学 Novel service flow-oriented compiling method based on heterogeneous reconfigurable architecture
CN102707952A (en) * 2012-05-16 2012-10-03 上海大学 User description based programming design method on embedded heterogeneous multi-core processor
CN104615488A (en) * 2015-01-16 2015-05-13 华为技术有限公司 Task scheduling method and device on heterogeneous multi-core reconfigurable computing platform
WO2022166466A1 (en) * 2021-02-08 2022-08-11 中国核电工程有限公司 Sensor screening method and apparatus and sensor data reconstruction method and system
CN115309402A (en) * 2022-07-13 2022-11-08 国网江苏省电力有限公司信息通信分公司 Method and device for forming heterogeneous execution sequence set capable of quantifying differences

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAO LIN: "Research on Heterogeneous Data Transfer Method for Power Energy Management", 《2021 5TH INTERNATIONAL CONFERENCE ON POWER AND ENERGY ENGINEERING (ICPEE)》, pages 139 - 144 *
吴树森;董小社;王宇菲;王龙翔;朱正东;: "UPPA:面向异构众核系统的统一并行编程架构", 计算机学报, no. 06, pages 20 - 39 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077598A (en) * 2023-10-13 2023-11-17 青岛展诚科技有限公司 3D parasitic parameter optimization method based on Mini-batch gradient descent method
CN117077598B (en) * 2023-10-13 2024-01-26 青岛展诚科技有限公司 3D parasitic parameter optimization method based on Mini-batch gradient descent method

Also Published As

Publication number Publication date
CN116150455B (en) 2023-07-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant