CN116150455A

CN116150455A - Heterogeneous data analysis method

Info

Publication number: CN116150455A
Application number: CN202310402595.2A
Authority: CN
Inventors: 戚红建; 王宇飞; 韩硕; 宋成风; 张强; 秦绪帅; 李伟; 刘誉杰; 李磊; 徐衍斐
Original assignee: Beijing Bidding Branch Of China Huaneng Group Co ltd; Huaneng Information Technology Co Ltd
Current assignee: Beijing Bidding Branch Of China Huaneng Group Co ltd; Huaneng Information Technology Co Ltd
Priority date: 2023-04-17
Filing date: 2023-04-17
Publication date: 2023-05-23
Anticipated expiration: 2043-04-17
Also published as: CN116150455B

Abstract

The invention provides a heterogeneous data analysis method, which comprises the following steps: screening the heterogeneous data according to the process architecture to obtain a first data set; extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data; based on the parallelism requirement type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence; and, based on the instruction sequence, matching the corresponding working units to execute the corresponding work. Each independent data in the heterogeneous data is thereby analyzed more comprehensively, which helps in grasping the important data in the heterogeneous data and further improves the analysis capability of heterogeneous data analysis.

Description

Heterogeneous data analysis method

Technical Field

The invention relates to the technical field of data analysis, in particular to a heterogeneous data analysis method.

Background

At present, with the development of science and technology, the Internet is increasingly intertwined with people's daily life, and the development of productivity and production technology also depends on the Internet. The evolution of the Internet has led to explosive growth in the volume of data, so computers are required to process vast amounts of heterogeneous data. Whether for data mining or artificial intelligence modeling, the first step is to access data, and the most important part of accessing data is processing heterogeneous data. At present, however, each heterogeneous data is generally analyzed individually and in sequence according to a set flow. Because the analysis mode is single and the individual analysis process may be too frequent, the heterogeneous data cannot be analyzed accurately.

Therefore, the invention provides a heterogeneous data analysis method.

Disclosure of Invention

The invention provides a heterogeneous data analysis method that screens heterogeneous data according to a process architecture to obtain a first data set of independent data, extracts the parallelism characteristic interval of each first data to obtain its parallelism requirement type, matches the corresponding instruction set, and constructs working units with the corresponding instruction sequence to execute the corresponding work. Each independent data in the heterogeneous data is thereby analyzed more comprehensively, which helps in grasping the important data in the heterogeneous data and further improves the analysis capability of heterogeneous data analysis.

The invention provides a heterogeneous data analysis method, which comprises the following steps:

step 1: screening the heterogeneous data according to the process architecture to obtain a first data set;

step 2: extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data;

step 3: based on the parallelism demand type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence;

step 4: based on the instruction sequence, the corresponding working units are matched to execute corresponding work.

Preferably, screening the heterogeneous data according to the process architecture to obtain a first data set includes:

heterogeneous data are acquired, and segmentation is carried out according to the average length of the data, so that a first overlapped data set in each average length of the data is obtained;

removing overlapping data with overlapping length smaller than a preset length based on the first overlapping data set to obtain a second overlapping data set;

obtaining source data corresponding to each piece of residual overlapping data based on the second overlapping data set;

obtaining process architecture information corresponding to the residual overlapping data based on the source data corresponding to each residual overlapping data;

obtaining corresponding process architecture types based on the process architecture information;

and separating all the remaining overlapping data in the second overlapping data set based on the type of the process architecture to obtain a plurality of first data, and constructing to obtain a first data set.
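The screening steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Overlap` record, the `segment` and `screen_overlaps` helpers, and the `architecture_type_of` lookup are all hypothetical names, data are modeled as plain strings, and how overlaps are detected between segments is not specified in the text, so the first overlapping data set is taken as an input.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    payload: str   # the overlapping data itself (modeled as a string here)
    source: str    # hypothetical key identifying the source data


def segment(data: str, average_length: int) -> list[str]:
    """Split raw data into consecutive pieces of the average data length."""
    return [data[i:i + average_length] for i in range(0, len(data), average_length)]


def screen_overlaps(first_overlap_set, preset_length, architecture_type_of):
    """Drop short overlaps, then separate the rest by process architecture type."""
    # Second overlapping data set: remove overlaps shorter than the preset length.
    second_overlap_set = [o for o in first_overlap_set if len(o.payload) >= preset_length]
    # Separate the remaining overlaps by the architecture type of their source data.
    first_data_set = defaultdict(list)
    for overlap in second_overlap_set:
        first_data_set[architecture_type_of[overlap.source]].append(overlap)
    return dict(first_data_set)
```

Each key of the returned dictionary stands for one process architecture type, and its list plays the role of one first data in the first data set.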

Preferably, extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data includes:

carrying out parallel analysis on each first data in the first data set to obtain a corresponding parallel characteristic data set and carrying out reconstruction to obtain a corresponding reconstruction data set, wherein the reconstruction data set comprises all first parallel data involved at the same time;

acquiring actual behavior vectors corresponding to first data at different moments in the same reconstruction data set, constructing a parallel line graph corresponding to the first data based on a model of the actual behavior vectors, and analyzing the slope;

constructing a corresponding static feature set and a dynamic feature set according to the slope analysis result;

based on the parallel feature association table, obtaining possible associated features corresponding to each dynamic feature data in the dynamic feature set;

obtaining the first feature of each dynamic feature data based on its possible associated features and on its dynamic association with the remaining dynamic feature data in the corresponding first data;

based on the slope average value corresponding to each dynamic characteristic data, obtaining a dynamic slope change trend graph according to a time sequence;

slope analysis is carried out on each first feature to obtain a first slope change trend graph;

and obtaining the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic slope change trend graph and the first slope change trend graph.
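As a rough illustration of the slope analysis in the steps above, the sketch below computes the modulus of each actual behavior vector and the slope of the segment connecting the moduli at consecutive moments. The function names and the choice of the Euclidean norm as the modulus are assumptions made for illustration; the text does not fix either.

```python
import math


def modulus(vector):
    """Euclidean norm of a behavior vector (assumed meaning of 'modulus')."""
    return math.sqrt(sum(x * x for x in vector))


def segment_slopes(times, vectors):
    """Slope of the line connecting the moduli at each pair of consecutive moments."""
    moduli = [modulus(v) for v in vectors]
    return [
        (moduli[i + 1] - moduli[i]) / (times[i + 1] - times[i])
        for i in range(len(moduli) - 1)
    ]


# Three moments; the modulus grows from 5 to 10 and then stays constant.
slopes = segment_slopes([0, 1, 2], [(3, 4), (6, 8), (6, 8)])
print(slopes)
```

A sequence of such slopes ordered by time is one plausible reading of the slope change trend graphs mentioned above.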

Preferably, calculating a maximum value of the parallelism characteristic index of each first data in the first data set includes:

[Formula for the maximum value of the parallelism characteristic index; the equation image and its symbols are not reproduced in the source text.]

wherein the terms of the formula denote, in the order listed in the source:

the number of dynamic feature data in the dynamic feature set corresponding to the same first data;

the number of static feature data in the static feature set corresponding to the same first data;

the modulus of the vector corresponding to the indexed dynamic feature data in the dynamic feature set;

the modulus of the vector corresponding to the indexed static feature data in the static feature set;

the modulus of the vector of the first feature associated with the indexed dynamic feature data in the corresponding dynamic feature set;

the largest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set;

the largest weight among the feature weights of all static feature data in the corresponding static feature set;

the association coefficient between the indexed dynamic feature and the corresponding first feature;

the corresponding maximum value.

Preferably, calculating the minimum value of the parallelism characteristic index of each first data in the first data set includes:

[Formula for the minimum value of the parallelism characteristic index; the equation image and its symbols are not reproduced in the source text.]

wherein the terms defined here denote:

the corresponding minimum value;

the smallest weight among the feature weights of all static feature data in the corresponding static feature set;

the smallest weight among the feature weights of all dynamic feature data in the corresponding dynamic feature set.

Preferably, extracting a parallelism characteristic interval of each first data in the first data set to obtain a parallelism requirement type of each first data, and further including:

obtaining a parallelism characteristic interval of each first data in the first data set based on the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set;

optimizing the parallelism characteristic interval based on a preset error value;

and obtaining the parallelism demand type corresponding to the optimized parallelism characteristic interval based on the parallelism characteristic interval-demand type table.

Preferably, based on the parallelism requirement type, matching a corresponding instruction set from a type-instruction database to obtain an instruction sequence, including:

based on a type-instruction database, matching instruction sets corresponding to the parallelism demand type, acquiring hierarchical labels from a time period-label mapping table according to the data acquisition time period of the first data set, and sequencing all first data to obtain a first sequence;

and matching the first sequence with a corresponding instruction set to obtain an instruction sequence.
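A minimal sketch of the matching described above, assuming both tables are plain dictionaries. The record layout (`id`, `type`), the table contents and the instruction names are hypothetical and chosen only for illustration; the text does not specify how either table is stored.

```python
def build_instruction_sequence(first_data_set, period, period_label_table, type_instruction_db):
    """Order the first data by hierarchical label, then attach each one's instruction set."""
    # Hierarchical labels for this data acquisition time period (hypothetical layout).
    labels = period_label_table[period]
    # First sequence: the first data sorted by their hierarchical labels.
    first_sequence = sorted(first_data_set, key=lambda d: labels[d["id"]])
    # Instruction sequence: each first data paired with the instruction set
    # matched to its parallelism requirement type.
    return [(d["id"], type_instruction_db[d["type"]]) for d in first_sequence]


first_data_set = [
    {"id": "a", "type": "parallel"},
    {"id": "b", "type": "concurrent"},
]
period_label_table = {"2023-04": {"a": 2, "b": 1}}
type_instruction_db = {
    "parallel": ["SPLIT", "RUN_ALL"],
    "concurrent": ["QUEUE", "RUN_SLICE"],
}
sequence = build_instruction_sequence(
    first_data_set, "2023-04", period_label_table, type_instruction_db
)
```

Here `b` carries the lower hierarchical label, so it comes first in the resulting sequence.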

Preferably, determining the final label corresponding to the parallel data according to the first slope difference of the left connecting segment and the second slope difference of the right connecting segment of the same parallel data includes:

acquiring a first current slope of the left connecting section, and calculating a first slope difference between the first current slope and a preset average slope;

acquiring a second current slope of the right connecting section, and calculating a second slope difference between the second current slope and a preset average slope;

when the absolute value of the first slope difference is larger than that of the second slope difference, judging that the final label corresponding to the same parallel data is consistent with a label result set by slope judgment of the left connecting section;

otherwise, judging that the final label corresponding to the same parallel data is consistent with the label result set by the slope judgment of the right connecting section.
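The comparison rule above can be written compactly as follows. The function and parameter names are illustrative assumptions; only the rule itself (compare the absolute slope differences, take the left label if the left difference is strictly larger, otherwise the right label) comes from the text.

```python
def final_label(left_slope, right_slope, preset_average_slope, left_label, right_label):
    """Pick the label of whichever connecting segment deviates more from the preset slope."""
    first_slope_difference = left_slope - preset_average_slope
    second_slope_difference = right_slope - preset_average_slope
    if abs(first_slope_difference) > abs(second_slope_difference):
        return left_label
    # Ties and smaller left differences both fall through to the right label.
    return right_label
```

For instance, with a preset average slope of 1.0, a left slope of 0.2 and a right slope of 1.5, the left difference (0.8 in absolute value) exceeds the right one (0.5), so the left segment's label is kept.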

Preferably, constructing a corresponding static feature set and dynamic feature set according to the slope analysis result includes:

if analysis shows that the average slope of the modulus connecting segment at every two moments is smaller than the preset average slope, setting a first static label for the left parallel data and a second static label for the right parallel data of the corresponding modulus connecting segment;

if the average slope is greater than or equal to the preset average slope, setting a first dynamic label for the left parallel data and a second dynamic label for the right parallel data of the corresponding modulus connecting segment;

when the labels set for the same parallel data are all static labels, the corresponding parallel data are regarded as static characteristic data;

when the labels set for the same parallel data are all dynamic labels, the corresponding parallel data are regarded as dynamic characteristic data;

when a label set for the same parallel data comprises a static label and a dynamic label, determining a final label corresponding to the parallel data according to a first slope difference of a left connecting section and a second slope difference of a right connecting section corresponding to the same parallel data, wherein the final label is a dynamic label or a static label;

based on all static feature data and all dynamic feature data, corresponding static feature sets and dynamic feature sets are constructed.
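The labeling procedure above can be sketched as below, assuming the slopes of consecutive modulus connecting segments are already available as a list. The function name, the string labels and the convention that segment `i` connects data point `i` (left) to data point `i + 1` (right) are assumptions for illustration.

```python
def classify_parallel_data(slopes, preset_average_slope):
    """Label each data point 'static' or 'dynamic' from the slopes of its adjacent segments."""
    labels = [[] for _ in range(len(slopes) + 1)]  # (label, slope) pairs per data point
    for i, slope in enumerate(slopes):
        tag = "static" if slope < preset_average_slope else "dynamic"
        labels[i].append((tag, slope))      # left parallel data of segment i
        labels[i + 1].append((tag, slope))  # right parallel data of segment i
    result = []
    for entries in labels:
        if len({tag for tag, _ in entries}) == 1:
            # All labels agree (or only one segment touches this point).
            result.append(entries[0][0])
        else:
            # Mixed labels: entries[0] comes from the left connecting segment,
            # entries[1] from the right one; keep the label of the segment whose
            # slope differs more from the preset average slope.
            (left_tag, left_slope), (right_tag, right_slope) = entries
            left_diff = abs(left_slope - preset_average_slope)
            right_diff = abs(right_slope - preset_average_slope)
            result.append(left_tag if left_diff > right_diff else right_tag)
    return result


final = classify_parallel_data([0.2, 1.5, 1.1], 1.0)
static_set = [i for i, tag in enumerate(final) if tag == "static"]
dynamic_set = [i for i, tag in enumerate(final) if tag == "dynamic"]
```

In this example the second data point receives one static and one dynamic label; the left segment deviates more from the preset slope (0.8 against 0.5), so the point ends up static.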

Preferably, the obtaining a maximum value and a minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic slope change trend graph and the first slope change trend graph includes:

inputting the dynamic slope change trend graph and the first slope change trend graph into a trend-association analysis model to obtain association coefficients of the same dynamic characteristic data and corresponding first characteristics;

and calculating the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic data and the corresponding first characteristic.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

fig. 1 is a flowchart of a heterogeneous data parsing method according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

An embodiment of the present invention provides a heterogeneous data parsing method, as shown in fig. 1, including:

In this embodiment, the process architecture refers to the frame operation speed required in the source data information of each data in the heterogeneous data, together with the combination information of the architecture, from which independent data in the heterogeneous data can be extracted. Here, heterogeneous data refers to overlapping data of different structures, and independent data refers to data containing complete data components.

In this embodiment, the first data set refers to a set of independent data separated from overlapping data in heterogeneous data obtained by filtering heterogeneous data according to a process architecture to remove useless data.

In this embodiment, the parallelism characteristic interval refers to the interval obtained by analyzing the features of each first data in the first data set and calculating the maximum value and the minimum value of the corresponding parallelism characteristic index. The interval is looked up in the parallelism characteristic interval-requirement type table to obtain the corresponding parallelism requirement type, which is then used to match an instruction set for data analysis.

In this embodiment, the parallelism requirement type refers to the way in which data in the heterogeneous data are run or operated on simultaneously, and includes a requirement for parallelism and a requirement for concurrency. Parallelism means that two or more data can be computed at the same instant, and concurrency means that two or more data can be computed within the same time interval. Distinguishing the two serves to improve the analysis rate of the heterogeneous data.

In this embodiment, a type-instruction database refers to a database containing parallelism demand types and corresponding instruction sets.

In this embodiment, the instruction sequences refer to instruction sequences obtained by matching the parallelism requirement types with corresponding instruction sets, and ordering the instruction sets according to the results of hierarchical tag ordering, where each sequence in the instruction sequences has its own representative symbol, so as to facilitate instruction execution.

The working principle and beneficial effects of the above technical scheme are as follows: heterogeneous data are screened according to the process architecture to obtain a first data set of independent data; the parallelism characteristic interval of each first data is extracted to obtain its parallelism requirement type; the corresponding instruction set is matched; and working units with the corresponding instruction sequence are constructed to execute the corresponding work. Each independent data in the heterogeneous data is thereby analyzed more comprehensively, which helps in grasping the important data in the heterogeneous data and further improves the analysis capability of heterogeneous data analysis.

The embodiment of the invention provides a heterogeneous data analysis method, which screens heterogeneous data according to a process architecture to obtain a first data set, and comprises the following steps:

In this embodiment, the average data length refers to the average of the lengths into which different systems split the data in the heterogeneous data during their respective analysis processes; it is used to analyze the heterogeneous data segment by segment.

In this embodiment, the first overlapping data set refers to all overlapping data of the heterogeneous data within an average length of the data, that is, after the heterogeneous data is segmented, a plurality of segmented data are obtained, and overlapping data may exist in different segmented data, where all overlapping data form the first overlapping data set.

In this embodiment, the preset length refers to the preset shortest data length, and if the data length is smaller than the preset data length, the data is incomplete data, so as to achieve the purpose of screening complete data.

In this embodiment, the second overlapping data set refers to a complete set of resolvable remaining data in the heterogeneous data obtained by reserving complete data in the first overlapping data set.

In this embodiment, the process architecture information refers to the frame operation speed required in the source data information of each data in the heterogeneous data, together with the combination information of the architecture, from which independent data in the heterogeneous data can be extracted.

In this embodiment, the process architecture type refers to the type of each combination of the required frame operation speed and the architecture combination information in the source data information of each data in the heterogeneous data, by which the independent data in the heterogeneous data can be distinguished.

In this embodiment, the first data refers to data obtained by separating data in the second overlapping data set according to the type of the process architecture.

The working principle and beneficial effects of the above technical scheme are as follows: the heterogeneous data are segmented according to the average data length to obtain the first overlapping data set within each average data length; the first overlapping data set is screened for completeness to obtain the second overlapping data set; and the data in the second overlapping data set are separated according to the process architecture type to obtain mutually independent first data. This improves the accuracy and analysis capability of heterogeneous data analysis.

The embodiment of the invention provides a heterogeneous data analysis method, which is used for extracting a parallelism characteristic interval of each first data in a first data set to obtain the parallelism requirement type of each first data, and comprises the following steps:

carrying out parallel analysis on each first data in the first data set to obtain a corresponding parallel characteristic data set;

reconstructing the parallel characteristic data set to obtain a corresponding reconstructed data set, wherein the reconstructed data set comprises all first parallel data related at the same time;

acquiring actual behavior vectors corresponding to first data at different moments in the same reconstruction data set, and constructing a parallel line graph corresponding to the first data based on a model of the actual behavior vectors;

based on the parallel line graph, obtaining the average slope of the modulus connecting segment at every two moments;

if the average slope is smaller than the preset average slope, setting a first static label for the left parallel data and a second static label for the right parallel data of the corresponding modulus connecting segment;

based on all static feature data and all dynamic feature data, constructing the corresponding static feature set and dynamic feature set.

In this embodiment, parallel analysis refers to identifying the data in the heterogeneous data on which parallel operations exist. The parallel characteristic data set refers to data in the heterogeneous data that are run or operated on simultaneously, or data whose transmission rate, running rate and operation rate are consistent.

In this embodiment, the reconstructed data set refers to the data obtained by splicing and clipping the data in the parallel characteristic data set corresponding to the same first data, so that the parallel characteristic data can be analyzed in depth.

In this embodiment, the preset average slope refers to the average slope of a connecting segment of the modulus of the actual behavior vector, preset to allow for reasonable external influences. When the average slope is smaller than the preset average slope, the average slope is treated as unchanged within a reasonable difference, which allows features to be classified as static or dynamic.

In this embodiment, static feature data refers to parallel feature data whose average slope is smaller than the preset average slope; the average slope is treated as unchanged within a reasonable difference, so the corresponding parallel feature data are regarded as unchanging.

In this embodiment, dynamic feature data refers to parallel feature data whose average slope is greater than or equal to the preset average slope; the average slope changes beyond a reasonable difference, so the corresponding parallel feature data are regarded as changing.

In this embodiment, a static feature set refers to a set of all static feature data.

In this embodiment, the dynamic feature set refers to a set of all dynamic feature data.

In this embodiment, the parallel feature association table refers to a look-up table containing dynamic feature data and corresponding possible associated features.

In this embodiment, the possible associated feature refers to feature data that may have an associated relationship with the dynamic feature data being searched for.

In this embodiment, the first feature refers to the feature data shared between the possible associated features of a dynamic feature data and the remaining dynamic feature data (excluding that dynamic feature data itself) in the corresponding first data.

In this embodiment, the dynamic slope change trend graph refers to a slope change trend graph in a parallel line graph corresponding to dynamic feature data, so as to achieve the purpose of displaying dynamic feature data changes.

In this embodiment, the first slope trend graph refers to a slope trend graph in a parallel line graph corresponding to the first feature, so as to achieve the purpose of displaying the change of the first feature.

In this embodiment, the trend-association analysis model is a model trained on dynamic slope change trend graphs, the corresponding first slope change trend graphs, and the association coefficients between the dynamic feature data and the corresponding first features. Given a dynamic slope change trend graph and the corresponding first slope change trend graph, it outputs the corresponding association coefficient.

In this embodiment, the association coefficient refers to a coefficient that reflects the association between the dynamic feature data and the corresponding first feature, and takes a value between 0 and 1.

In this embodiment, the parallelism characteristic index refers to an index representing the degree of simultaneous operation or manipulation of data obtained by calculating a static characteristic set, a dynamic characteristic set, and a correlation coefficient of each dynamic characteristic and a corresponding first characteristic.

The working principle and beneficial effects of the above technical scheme are as follows: the parallel characteristic data corresponding to the first data are reconstructed, and the vectors of the reconstructed data are analyzed to obtain the average slope of the modulus connecting segment at every two moments. Comparing these slopes with the preset average slope yields the static feature data and the dynamic feature data. Feature association analysis is then performed on the dynamic feature data to obtain the corresponding first features and association coefficients, from which the maximum value and the minimum value of the parallelism characteristic index of each first data in the first data set are calculated. This refines the flow of heterogeneous data analysis, analyzes the dynamic feature data and associated features in depth, and improves the accuracy and analysis capability of heterogeneous data analysis.

The embodiment of the invention provides a heterogeneous data analysis method, which calculates the maximum value of the parallelism characteristic index of each first data in a first data set based on the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, and comprises the following steps:

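[Formula for the maximum value of the parallelism characteristic index; the equation image and its symbols are not reproduced in the source text.]

wherein the terms of the formula denote, as far as the source text preserves them:

the number of dynamic feature data in the dynamic feature set corresponding to the same first data;

the modulus of the vector corresponding to the indexed dynamic feature data in the dynamic feature set;

the modulus of the vector corresponding to the indexed static feature data in the static feature set;

the modulus of the vector of the first feature associated with the indexed dynamic feature data in the corresponding dynamic feature set;

the largest weight among the feature weights of all static feature data in the corresponding static feature set;

the association coefficient between the indexed dynamic feature and the corresponding first feature;

the corresponding maximum value.

In this embodiment, two of the quantities in the formula (their symbols are not preserved in the source text) each take a value smaller than 1.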

The working principle and the beneficial effects of the technical scheme are as follows: the maximum value of the parallelism characteristic index of each first data is obtained by calculating the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, so that the parallelism characteristic is accurately analyzed, and the accuracy degree of heterogeneous data analysis is improved.

The embodiment of the invention provides a heterogeneous data analysis method, which is used for calculating the minimum value of the parallelism characteristic index of each first data in a first data set based on a static characteristic set, a dynamic characteristic set and the association coefficient of each dynamic characteristic and a corresponding first characteristic, and comprises the following steps:

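[Formula for the minimum value of the parallelism characteristic index; the equation image and its symbols are not reproduced in the source text.]

wherein the terms defined here denote:

the corresponding minimum value;

the smallest weight among the feature weights of all static feature data in the corresponding static feature set.

In this embodiment, two of the quantities in the formula (their symbols are not preserved in the source text) each take a value smaller than 1.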

The working principle and the beneficial effects of the technical scheme are as follows: the minimum value of the parallelism characteristic index of each first data is obtained by calculating the static characteristic set, the dynamic characteristic set and the association coefficient of each dynamic characteristic and the corresponding first characteristic, so that the parallelism characteristic is accurately analyzed, and the accuracy degree of heterogeneous data analysis is improved.

The embodiment of the invention provides a heterogeneous data analysis method, which extracts a parallelism characteristic interval of each first data in a first data set to obtain a parallelism requirement type of each first data, and further comprises the following steps:

In this embodiment, the preset error value refers to a preset reasonable error value, so as to achieve the purpose of accurately matching the parallelism requirement type.

In this embodiment, the parallelism characteristic interval-requirement type table refers to a comparison table containing parallelism characteristic intervals and corresponding parallelism requirement types.

In this embodiment, the parallelism characteristic interval is determined by the maximum value and the minimum value, and is optimized by adjusting these two endpoints. For example, if the parallelism characteristic interval is (a1, a2), the optimized interval is (a3, a4),

where a3 = a1 - (preset error value)/2 and a4 = a2 + (preset error value)/3.
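A small worked sketch of this optimization and the subsequent table lookup follows. The endpoint adjustments are taken from the text; the table contents, the type names and the containment-based matching rule in `lookup_requirement_type` are assumptions, since the text does not say how an optimized interval is matched against the table.

```python
def optimize_interval(a1, a2, preset_error):
    """Widen (a1, a2) to (a3, a4) using the preset error value, as stated in the text."""
    a3 = a1 - preset_error / 2
    a4 = a2 + preset_error / 3
    return a3, a4


def lookup_requirement_type(interval, interval_type_table):
    """Return the type whose table interval contains the optimized one (assumed rule)."""
    low, high = interval
    for (table_low, table_high), requirement_type in interval_type_table:
        if table_low <= low and high <= table_high:
            return requirement_type
    return None  # no table entry covers the interval


# Hypothetical table: two intervals and their parallelism requirement types.
interval_type_table = [
    ((0.0, 1.0), "concurrent"),
    ((0.5, 3.0), "parallel"),
]
optimized = optimize_interval(1.0, 2.0, 0.75)
requirement_type = lookup_requirement_type(optimized, interval_type_table)
```

With a preset error value of 0.75, the interval (1.0, 2.0) becomes (0.625, 2.25), which in this made-up table falls under the second entry.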

The working principle and beneficial effects of the above technical scheme are as follows: the parallelism characteristic interval is optimized with the preset error value, and the optimized interval is looked up in the parallelism characteristic interval-requirement type table to obtain the corresponding parallelism requirement type, which improves the analysis capability of heterogeneous data analysis.

The embodiment of the invention provides a heterogeneous data analysis method, which is based on the parallelism demand type, matches a corresponding instruction set from a type-instruction database to obtain an instruction sequence, and comprises the following steps:

In this embodiment, the time period-label mapping table refers to a lookup table containing data acquisition time periods and the hierarchical labels mapped to them.

In this embodiment, the hierarchical label refers to the label of the level at which each first data in the first data set was located before separation. It represents the data level, that is, the level at which the data sat within the overlap, and the number of levels is determined by the first data in the first data set.

In this embodiment, the first sequence refers to the data sequence obtained by sorting all the first data in the first data set according to their hierarchical labels.

The working principle and the beneficial effects of the technical scheme are as follows: the first data in the first data set is sequenced by extracting the hierarchical label of the first data, so that a first sequence is obtained, a corresponding instruction set is obtained, an instruction sequence is obtained, and the resolving capability of heterogeneous data resolving is improved.

The embodiment of the invention provides a heterogeneous data analysis method, in which determining the final label corresponding to the parallel data according to the first slope difference of the left connecting segment and the second slope difference of the right connecting segment where the same parallel data is located comprises the following steps:

In this embodiment, the first current slope refers to the slope of the left connecting segment of the parallel line graph of the modulus of the actual behavior vector corresponding to the first parallel data at the current time.

In this embodiment, the second current slope refers to the slope of the right connecting segment of the parallel line graph of the modulus of the actual behavior vector corresponding to the first parallel data at the current time.

The working principle and beneficial effects of this technical scheme are as follows: the first current slope of the left connecting segment and the second current slope of the right connecting segment are obtained, and the difference between each of them and the preset average slope is calculated to give the first slope difference and the second slope difference. The absolute values of the two slope differences are compared to decide whether the final label of the same parallel data follows the label result set by the slope judgment of the left connecting segment or that of the right connecting segment. The final label thus obtained classifies the feature data as dynamic or static, which facilitates in-depth analysis of the heterogeneous data and improves the analysis capability for heterogeneous data.
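The comparison can be sketched as follows. The passage above does not state which side prevails, so this sketch assumes the segment with the larger absolute slope difference decides the final label and that the left segment wins ties. The function `final_label` and the label strings are hypothetical.

```python
def final_label(first_current_slope, second_current_slope,
                preset_average_slope, left_label, right_label):
    """Pick the final label for one parallel data item from its two segments.

    Assumption: the segment whose slope deviates more (in absolute value)
    from the preset average slope wins; the left segment wins ties.
    """
    first_slope_diff = first_current_slope - preset_average_slope
    second_slope_diff = second_current_slope - preset_average_slope
    if abs(first_slope_diff) >= abs(second_slope_diff):
        return left_label
    return right_label


# Illustrative values only; they do not come from the source.
label = final_label(0.9, 0.2, 0.5, "dynamic", "static")
```

If the opposite rule applies (the smaller deviation decides), only the comparison operator needs to change; the computation of the two slope differences stays the same.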

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A heterogeneous data parsing method, comprising:

2. The method of claim 1, wherein the screening of the heterogeneous data according to the process architecture to obtain the first data set comprises:

3. The method of claim 1, wherein extracting the parallelism characteristic interval of each first data in the first data set to obtain the parallelism requirement type of each first data comprises:

4. A method according to claim 3, wherein calculating a maximum value of the parallelism characteristic index of each first data in the first data set comprises:

[formula not recoverable from the source text]; wherein [symbol] represents the modulus of the vector corresponding to the [symbol] dynamic feature data; [symbol] represents the modulus of the vector corresponding to the [symbol] static feature data; [symbol] represents, for the corresponding dynamic feature set [symbol], the maximum weight among the feature weights of all the static feature data in the corresponding static feature set; [symbol] represents the [definition lost in the source]; and [symbol] represents the corresponding maximum value.

5. The method of claim 4, wherein calculating a minimum value of the parallelism characteristic index for each of the first data in the first data set comprises:

wherein [symbol] represents the corresponding minimum value; and [symbol] represents the minimum weight among the feature weights of all the static feature data in the corresponding static feature set.

6. The method of claim 1, wherein extracting the parallelism characteristic interval for each first data in the first dataset to obtain the parallelism requirement type for each first data, further comprises:

7. The method of claim 2, wherein matching corresponding instruction sets from a type-instruction database based on the parallelism demand type, results in an instruction sequence, comprising:

8. The method of claim 3, wherein determining the final tag for the parallel data based on the first slope difference for the left link segment and the second slope difference for the right link segment where the same parallel data is located comprises:

9. The method of claim 3, wherein constructing corresponding static feature sets and dynamic feature sets based on slope analysis results comprises:

10. The method of claim 3, wherein obtaining a maximum value and a minimum value of the parallelism characteristic index for each first data in the first data set based on the static characteristic set, the dynamic slope trend graph, and the first slope trend graph comprises: