CN112069269A

CN112069269A - Big data and multidimensional feature-based data tracing method and big data cloud server

Info

Publication number: CN112069269A
Application number: CN202010877961.6A
Authority: CN
Inventors: 黄天红
Original assignee: Individual
Current assignee: Yancheng Moyu Big Data Information Technology Co.,Ltd.
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-12-11
Anticipated expiration: 2040-08-27
Also published as: CN112818068A; CN112069269B; CN112818067A

Abstract

The application relates to a data tracing method based on big data and multidimensional features, and to a big data cloud server. The method first performs multidimensional feature identification on the data to be traced to obtain a multidimensional data feature queue. It then performs data environment parameter clustering on the multidimensional data feature queue to obtain feature distribution information, and performs feature correlation identification and data interaction defect identification on the feature distribution information to obtain a correlation data feature set and a defect data feature set, respectively. Next, it extracts index values from the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain a target index value that includes an index category. Finally, it queries a preset database for the target pairing data corresponding to the data to be traced according to the target index value and its index category, and traces the data to be traced according to the target pairing data to obtain the original service data. In this way, the multidimensional data features of the data to be traced are taken into account, and complete and accurate tracing of the data to be traced can be achieved.

Description

Big data and multidimensional feature-based data tracing method and big data cloud server

Technical Field

The application relates to the technical field of big data analysis, in particular to a big data and multidimensional feature-based data tracing method and a big data cloud server.

Background

With the development of science and technology, the era of big data has arrived. Big data drawn from different sources, such as the mobile internet, the internet of things, social networks, digital homes and e-commerce, can be converted, analyzed and optimized, and the results can be fed back into applications to improve user experience, creating considerable commercial, economic and social value. Used reasonably and effectively, big data can create greater competitiveness, value and wealth, so that the value of the data is maximized.

However, as data services continue to grow in scale, data volume is surging, which puts heavy storage pressure on big data servers. To relieve this pressure, service data needs to be compressed before it is stored. Some data is inevitably lost during compressed storage, so when the compressed service data is later reused, it must be traced back to its source.

Disclosure of Invention

The application provides a big data and multi-dimensional feature-based data tracing method and a big data cloud server, so that data can be traced to obtain complete original service data.

According to a first aspect of the embodiments of the present invention, there is provided a data tracing method based on big data and multidimensional features, including:

carrying out multi-dimensional feature recognition on data to be traced according to a preset data feature recognition model to obtain a multi-dimensional data feature queue corresponding to the data to be traced;

performing data environment parameter clustering on the multidimensional data feature queue to obtain feature distribution information of the data to be traced;

performing feature correlation identification on the feature distribution information of the data to be traced to obtain a correlation data feature set corresponding to the data to be traced;

performing data interaction defect identification on the feature distribution information of the data to be traced to obtain a defect data feature set corresponding to the data to be traced;

extracting index values from the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain a target index value including an index category;

and querying a preset database for target pairing data corresponding to the data to be traced according to the target index value and its index category, and tracing the data to be traced according to the target pairing data to obtain original service data corresponding to the data to be traced.

On the basis of the first aspect, before performing multi-dimensional feature recognition on the data to be traced according to the preset data feature recognition model, the method further includes: extracting a service data label from the data to be traced to obtain a service processing label of the data to be traced;

the performing multi-dimensional feature recognition on the data to be traced according to the preset data feature recognition model to obtain the multidimensional data feature queue corresponding to the data to be traced includes: performing traversal matching of data features and service processing labels on the service processing label of the data to be traced according to a pre-stored label information set in the preset data feature recognition model to obtain the multidimensional data feature queue corresponding to the data to be traced.

On the basis of the first aspect, the performing traversal matching of the data features and the service processing labels on the service processing labels of the data to be traced according to the pre-stored label information set in the preset data feature identification model to obtain the multidimensional data feature queue corresponding to the data to be traced includes:

according to a pre-stored label information set in the preset data characteristic identification model and a service processing label of the data to be traced, determining a label mapping path for converting the service processing label into the pre-stored label information set;

and performing path node segmentation mapping on the service processing label of the data to be traced according to the label mapping path, and determining a multidimensional data feature queue corresponding to the data to be traced based on label description information obtained after the path node segmentation mapping.

On the basis of the first aspect, the target clustering model for clustering the data environment parameters comprises a clustering driving thread and a clustering correction thread;

the clustering of data environment parameters to the multidimensional data feature queue to obtain the feature distribution information of the data to be traced includes: performing data environment parameter clustering based on characteristic dimension quantity on the multidimensional data characteristic queue through the clustering driving thread to obtain characteristic distribution information of the data to be traced;

the performing feature correlation identification on the feature distribution information of the data to be traced to obtain a correlation data feature set corresponding to the data to be traced includes: performing, through the cluster correction thread, cluster set correction based on cluster set concentration screening on the feature distribution information of the data to be traced to obtain the correlation data feature set corresponding to the data to be traced;

the performing data interaction defect identification on the feature distribution information of the data to be traced to obtain a defect data feature set corresponding to the data to be traced includes: performing, through the cluster correction thread, cluster set correction based on the time-sequence change of a defect curve on the feature distribution information of the data to be traced to obtain the defect data feature set corresponding to the data to be traced.

On the basis of the first aspect, the cluster correction thread includes a plurality of correction paths having a progressive relationship; the performing, through the cluster correction thread, cluster set correction based on cluster set concentration screening on the feature distribution information of the data to be traced to obtain the correlation data feature set corresponding to the data to be traced includes:

performing characteristic distribution interval correction on the characteristic distribution information of the data to be traced through a first correction path in the plurality of correction paths with progressive relations; and transmitting the correction output information of the first correction path to a next correction path determined based on the progressive relation, continuing to perform feature distribution interval correction and correction output information output in the next correction path determined based on the progressive relation until the correction output information is output to a last correction path, mapping the correction output information output by the last correction path to a clustering feature list, and determining a correlation data feature set corresponding to the data to be traced based on a concentration weight queue corresponding to the clustering concentration of the correction output information output by the last correction path in the clustering feature list.

On the basis of the first aspect, the cluster driving thread includes a plurality of driving functions with driving interference; the performing, through the cluster driving thread, data environment parameter clustering based on the number of feature dimensions on the multidimensional data feature queue to obtain the feature distribution information of the data to be traced includes:

extracting environmental feature data from the multidimensional data feature queue through the driving function with the largest interference factor among the plurality of driving functions with driving interference;

loading the current environmental feature data extraction result of the driving function with the largest interference factor into the driving function with the second-largest interference factor, and continuing, in descending order of interference factor, to extract environmental feature data and cascade-load the current extraction result into the next driving function, until the current extraction result has been cascade-loaded into the driving function with the smallest interference factor among the plurality of driving functions with driving interference;

and taking the environmental feature data having the target number of dimensions in the current environmental feature data extraction result output by the driving function with the smallest interference factor as the feature distribution information of the data to be traced.
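The cascade described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not define what a driving function computes or how the "target dimension number" is applied, so each driving function is modelled here as a hypothetical `(interference_factor, extractor)` pair, and the final selection is assumed to be a simple truncation to `target_dims` items. The name `cascade_extract` and the three example extractors are invented for the sketch.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
# Hypothetical model of a driving function: (interference factor, extractor).
DrivingFunction = Tuple[float, Callable[[Features], Features]]


def cascade_extract(queue: Sequence[float],
                    functions: Sequence[DrivingFunction],
                    target_dims: int) -> Features:
    """Run the extractors from largest to smallest interference factor,
    feeding each extraction result into the next function."""
    ordered = sorted(functions, key=lambda f: f[0], reverse=True)
    result = list(queue)
    for _factor, extract in ordered:
        result = extract(result)
    # Assumption: "target dimension number" means keeping the first N items.
    return result[:target_dims]


# Hypothetical extractors, listed out of order on purpose.
functions = [
    (0.2, lambda xs: [x * 2 for x in xs]),        # smallest factor, runs last
    (0.9, lambda xs: [x for x in xs if x >= 0]),  # largest factor, runs first
    (0.5, lambda xs: sorted(xs, reverse=True)),   # runs second
]

info = cascade_extract([0.25, -1.0, 1.0, 0.5], functions, target_dims=2)
print(info)
```

Sorting by interference factor once, before the loop, keeps the cascade order independent of the order in which the driving functions happen to be registered.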

On the basis of the first aspect, when the cluster correction thread includes a plurality of correction paths having a progressive relationship and a common correction node exists between adjacent correction paths, the performing, through the cluster correction thread, cluster set correction based on cluster set concentration screening on the feature distribution information of the data to be traced to obtain the correlation data feature set corresponding to the data to be traced includes:

performing feature distribution interval correction on the feature distribution information of the data to be traced through a first correction path among the plurality of correction paths having a progressive relationship; integrating the correction output information with the current environmental feature data extraction result output by the target driving function corresponding to the target correction path with which the first correction path shares the common correction node, taking the integrated result as the correction output information of the first correction path, and outputting it to the next correction path determined based on the progressive relationship, so that feature distribution interval correction, integration and output of correction output information continue in each next correction path until the last correction path is reached;

and mapping the correction output information output by the last correction path to a clustering feature list, and determining the correlation data feature set corresponding to the data to be traced according to a concentration weight queue corresponding to the cluster concentration, in the clustering feature list, of the correction output information output by the last correction path.
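A minimal Python sketch of the correct-then-integrate loop is given below. The source does not say how correction output is "integrated" with a driving function's extraction result, so the sketch assumes an order-preserving union; the stage tuples, the helper name `run_with_integration` and the sample values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Info = List[int]
# Hypothetical stage: (correction function, extraction result of the driving
# function that shares a common correction node with this path).
Stage = Tuple[Callable[[Info], Info], Sequence[int]]


def run_with_integration(distribution: Sequence[int],
                         stages: Sequence[Stage]) -> Info:
    """Apply each correction path in order and merge its output with the
    shared extraction result before handing it to the next path."""
    info = list(distribution)
    for correct, shared_extraction in stages:
        corrected = correct(info)
        # Assumed integration rule: union that keeps first-seen order.
        info = list(dict.fromkeys(corrected + list(shared_extraction)))
    return info


stages = [
    (lambda xs: [x for x in xs if x % 2 == 0], [4, 7]),  # first correction path
    (lambda xs: [x for x in xs if x > 3], [9]),          # last correction path
]

merged = run_with_integration([1, 2, 3, 4, 5, 6], stages)
print(merged)
```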

On the basis of the first aspect, the extracting, according to the defect data feature set, index values of the multidimensional data feature queue and the correlation data feature set to obtain a target index value including an index category includes:

performing the following for each defective data feature in the defective data feature set: multiplying the defect ratio of the defect data features in the multi-dimensional data feature queue corresponding to the defect data features by the defect ratio of the defect data features in the defect data feature set to obtain a first defect ratio of the defect data features;

normalizing the defect ratio of the defect data feature in the defect data feature set, and multiplying the normalization result by the defect ratio of the corresponding defect data feature in the correlation data feature set to obtain a second defect ratio of the defect data feature;

performing weighted summation on the first defect ratio and the second defect ratio to obtain a defect index label of the defect data characteristic;

and extracting the defect index value of the defect index label of each defect data characteristic to obtain a target index value comprising the index category.
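The arithmetic of the first three steps can be illustrated with a short Python sketch. Several details are assumptions, because the source leaves them open: normalization is taken to be division by the sum of the ratios in the defect data feature set, the weights `w1` and `w2` default to 0.5 each, and a feature missing from the correlation data feature set is given a ratio of 0. The function name and the sample ratios are hypothetical.

```python
from typing import Dict


def defect_index_labels(queue_ratio: Dict[str, float],
                        defect_ratio: Dict[str, float],
                        correlation_ratio: Dict[str, float],
                        w1: float = 0.5,
                        w2: float = 0.5) -> Dict[str, float]:
    """Compute one defect index label per defect data feature."""
    total = sum(defect_ratio.values())
    labels = {}
    for feature, ratio in defect_ratio.items():
        # First defect ratio: ratio in the feature queue times ratio in the defect set.
        first = queue_ratio[feature] * ratio
        # Second defect ratio: normalized defect-set ratio times correlation-set ratio.
        normalized = ratio / total if total else 0.0
        second = normalized * correlation_ratio.get(feature, 0.0)
        # Weighted sum of the two ratios gives the defect index label.
        labels[feature] = w1 * first + w2 * second
    return labels


labels = defect_index_labels(
    queue_ratio={"a": 0.5, "b": 0.5},
    defect_ratio={"a": 0.25, "b": 0.75},
    correlation_ratio={"a": 1.0, "b": 0.5},
)
print(labels)
```

With these sample inputs, feature `a` gets 0.5 × 0.125 + 0.5 × 0.25 = 0.1875 and feature `b` gets 0.5 × 0.375 + 0.5 × 0.375 = 0.375.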

According to a second aspect of the embodiments of the present invention, there is provided a big data cloud server, including: the system comprises a processor, a memory and a network interface, wherein the memory and the network interface are connected with the processor; the network interface is connected with a nonvolatile memory in the big data cloud server; when the processor is operated, the computer program is called from the nonvolatile memory through the network interface, and the computer program is operated through the memory so as to execute the method.

According to a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program implements the method when running in a memory of a big data cloud server.

When the big data and multidimensional feature-based data tracing method and the big data cloud server are applied, multidimensional feature identification is first performed on the data to be traced to obtain a multidimensional data feature queue. Data environment parameter clustering is then performed on the multidimensional data feature queue to obtain feature distribution information, and feature correlation identification and data interaction defect identification are performed on the feature distribution information to obtain a correlation data feature set and a defect data feature set, respectively. Next, index values are extracted from the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain a target index value including an index category. Finally, a preset database is queried for the target pairing data corresponding to the data to be traced according to the target index value and its index category, and the data to be traced is traced according to the target pairing data to obtain the original service data corresponding to the data to be traced. In this way, the multidimensional data features of the data to be traced are taken into account, and the correlation data feature set and the defect data feature set are mined further so that the target index value and its index category can be determined accurately, which enables complete and accurate tracing of the data to be traced.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

FIG. 1 is a system architecture diagram of a big data and multidimensional feature based data traceability system according to an exemplary embodiment of the present application.

FIG. 2 is a flowchart illustrating a data tracing method based on big data and multidimensional features according to an exemplary embodiment.

FIG. 3 is a block diagram illustrating an embodiment of a data tracing apparatus based on big data and multidimensional features according to an exemplary embodiment of the present application.

FIG. 4 is a hardware structure diagram of a big data cloud server in which a big data and multidimensional feature-based data tracing apparatus of the present application is located.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to a determination", depending on the context.

Through research and analysis, the inventor found that common methods for tracing business data mostly restore the data based on the compression path, without considering the features of the business data in different dimensions. As a result, the data correlation of the business data and the defects introduced during data compression are ignored, and complete tracing of the business data is difficult to achieve.

In order to solve the technical problem, embodiments of the present invention are directed to providing a big data and multidimensional feature-based data tracing method and a big data cloud server. Referring first to fig. 1, a system architecture diagram of a data tracing system 100 based on big data and multidimensional features is provided. The data tracing system 100 may include a big data cloud server 200 and a business terminal 400 connected to each other. The big data cloud server 200 collects the service data from the service terminal 400, compresses and stores the service data, and completely traces the source of the service data when the compressed and stored service data needs to be traced.

In this embodiment, the big data cloud server 200 may be applied not only to smart cities, but also to smart healthcare, smart industrial parks and the smart industrial internet. The data tracing system 100 may be applied to scenarios such as big data, cloud computing and edge computing, including but not limited to new energy vehicle system management, intelligent online office, intelligent online education, cloud game data processing, e-commerce live-streaming sales processing, cloud internet of vehicles processing, blockchain digital financial currency services, blockchain supply chain financial services, and the like, which is not limited herein. It is understood that when the system is applied to these fields, the types of service data are adjusted and further refined accordingly, and they are not listed one by one here.

On this basis, referring also to FIG. 2, a flowchart of a data tracing method based on big data and multidimensional features is provided. The method may be applied to the big data cloud server 200 in FIG. 1 and may specifically include the following steps S21 to S26.

Step S21, performing multi-dimensional feature recognition on the data to be traced according to a preset data feature recognition model to obtain a multi-dimensional data feature queue corresponding to the data to be traced.

For example, the data to be traced is compressed business data pre-stored in a big data cloud server, and the business data is acquired from a business terminal.

Step S22, performing data environment parameter clustering on the multidimensional data feature queue to obtain feature distribution information of the data to be traced.

For example, the feature distribution information may be presented in a graph, a list, or other form, and is not limited herein.

Step S23, performing feature correlation identification on the feature distribution information of the data to be traced to obtain a correlation data feature set corresponding to the data to be traced.

Step S24, performing data interaction defect identification on the feature distribution information of the data to be traced to obtain a defect data feature set corresponding to the data to be traced.

Step S25, extracting index values from the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain a target index value including an index category.

For example, there may be a plurality of target index values.

Step S26, querying a preset database for target pairing data corresponding to the data to be traced according to the target index value and its index category, and tracing the data to be traced according to the target pairing data to obtain original service data corresponding to the data to be traced.

For example, the preset database may be a relational database such as a MySQL database or a Hive database, but is not limited thereto, and the target pairing data may comprise multiple sets of data.
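The lookup in step S26 can be sketched with Python's built-in `sqlite3` module standing in for the MySQL or Hive database mentioned above. The table name `pairing`, its columns, the sample rows and the tolerance-based match on the index value are all assumptions made for illustration; the source does not describe the database schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the preset database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairing (index_category TEXT, index_value REAL, payload TEXT)"
)
conn.executemany(
    "INSERT INTO pairing VALUES (?, ?, ?)",
    [
        ("defect", 0.375, "block-17"),
        ("defect", 0.1875, "block-03"),
        ("correlation", 0.375, "block-99"),
    ],
)


def query_pairing(connection, targets, tolerance=1e-6):
    """Return the payloads matching each (index category, index value) pair."""
    rows = []
    for category, value in targets:
        cursor = connection.execute(
            "SELECT payload FROM pairing "
            "WHERE index_category = ? AND ABS(index_value - ?) <= ? "
            "ORDER BY payload",
            (category, value, tolerance),
        )
        rows.extend(row[0] for row in cursor)
    return rows


pairing_data = query_pairing(conn, [("defect", 0.375), ("defect", 0.1875)])
print(pairing_data)
```

Matching on both the category and the value keeps a `correlation` row with the same numeric index value from being returned for a `defect` query.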

It can be understood that, by performing steps S21 to S26 above, multidimensional feature identification is first performed on the data to be traced to obtain a multidimensional data feature queue. Data environment parameter clustering is then performed on the multidimensional data feature queue to obtain feature distribution information, and feature correlation identification and data interaction defect identification are performed on the feature distribution information to obtain a correlation data feature set and a defect data feature set, respectively. Next, index values are extracted from the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain a target index value including an index category. Finally, a preset database is queried for the target pairing data corresponding to the data to be traced according to the target index value and its index category, and the data to be traced is traced according to the target pairing data to obtain the original service data corresponding to the data to be traced. In this way, the multidimensional data features of the data to be traced are taken into account, and the correlation data feature set and the defect data feature set are mined further so that the target index value and its index category can be determined accurately, which enables complete and accurate tracing of the data to be traced.
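The data flow between steps S21 to S26 can be summarized in a small Python skeleton. It shows only which step consumes which output; the class name `TracingPipeline`, its field names and the stub callables in the usage example are hypothetical and do not implement the recognition, clustering or restoration logic themselves.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TracingPipeline:
    recognize: Callable[[Any], Any]            # S21: data -> feature queue
    cluster: Callable[[Any], Any]              # S22: queue -> feature distribution
    correlate: Callable[[Any], Any]            # S23: distribution -> correlation set
    find_defects: Callable[[Any], Any]         # S24: distribution -> defect set
    extract_index: Callable[[Any, Any, Any], Any]  # S25: -> target index values
    query: Callable[[Any], Any]                # S26: index values -> pairing data
    restore: Callable[[Any, Any], Any]         # S26: data + pairing -> original data

    def trace(self, data: Any) -> Any:
        queue = self.recognize(data)
        distribution = self.cluster(queue)
        # S23 and S24 both read the same feature distribution information.
        correlation_set = self.correlate(distribution)
        defect_set = self.find_defects(distribution)
        targets = self.extract_index(defect_set, queue, correlation_set)
        pairing = self.query(targets)
        return self.restore(data, pairing)


# Stub callables, used only to show that the steps are wired together.
pipeline = TracingPipeline(
    recognize=lambda d: list(d),
    cluster=lambda q: sorted(q),
    correlate=lambda dist: dist[:1],
    find_defects=lambda dist: dist[-1:],
    extract_index=lambda defects, q, corr: [("defect", v) for v in defects],
    query=lambda targets: {v: "x" for _category, v in targets},
    restore=lambda d, p: d + "".join(p[k] for k in sorted(p)),
)

restored = pipeline.trace("ba")
print(restored)
```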

In a specific implementation, before step S21, the method further includes: extracting a service data label from the data to be traced to obtain a service processing label of the data to be traced. On this basis, the performing multidimensional feature recognition on the data to be traced according to the preset data feature recognition model to obtain the multidimensional data feature queue corresponding to the data to be traced includes: performing traversal matching of data features and service processing labels on the service processing label of the data to be traced according to a pre-stored label information set in the preset data feature recognition model to obtain the multidimensional data feature queue corresponding to the data to be traced. In this way, the comprehensiveness and accuracy of the multidimensional data feature queue can be ensured based on the service processing label.

Further, on this basis, the traversal matching of data features and service processing labels, performed on the service processing label of the data to be traced according to the pre-stored label information set in the preset data feature recognition model to obtain the multidimensional data feature queue corresponding to the data to be traced, may specifically include the following steps a and b.

Step a, determining, according to the pre-stored label information set in the preset data feature recognition model and the service processing label of the data to be traced, a label mapping path for converting the service processing label into the pre-stored label information set.

Step b, performing path node segmented mapping on the service processing label of the data to be traced according to the label mapping path, and determining the multidimensional data feature queue corresponding to the data to be traced based on the label description information obtained after the path node segmented mapping.

Therefore, accurate traversal matching can be achieved based on the steps a-b, and therefore comprehensiveness and accuracy of the multidimensional data feature queue are guaranteed.
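Steps a and b can be pictured with the following Python sketch. The source does not define the label format or the contents of the pre-stored label information set, so the sketch assumes a slash-separated hierarchical label and a dictionary that maps each known node to a piece of label description information; `PRESTORED_LABELS`, `label_mapping_path` and `feature_queue` are hypothetical names.

```python
from collections import deque

# Hypothetical pre-stored label information set: path node -> description.
PRESTORED_LABELS = {
    "finance": "business domain: finance",
    "payment": "transaction type: payment",
    "refund": "operation: refund",
}


def label_mapping_path(service_label, prestored):
    """Step a: keep, in order, the label nodes found in the pre-stored set."""
    return [node for node in service_label.split("/") if node in prestored]


def feature_queue(service_label, prestored):
    """Step b: map the label node by node and queue the descriptions,
    one entry per dimension (here, the node's position in the path)."""
    path = label_mapping_path(service_label, prestored)
    return deque((depth, prestored[node]) for depth, node in enumerate(path))


queue = feature_queue("finance/payment/unknown/refund", PRESTORED_LABELS)
print(list(queue))
```

Nodes that are not in the pre-stored set, such as `unknown` here, are skipped rather than mapped, which is one possible reading of "traversal matching".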

In practical application, the target clustering model for data environment parameter clustering can comprise a clustering driving thread and a clustering correction thread.

On this basis, the performing data environment parameter clustering on the multidimensional data feature queue described in step S22 to obtain the feature distribution information of the data to be traced may, for example, include: performing, through the cluster driving thread, data environment parameter clustering based on the number of feature dimensions on the multidimensional data feature queue to obtain the feature distribution information of the data to be traced.

On this basis, the performing feature correlation identification on the feature distribution information of the data to be traced described in step S23 to obtain the correlation data feature set corresponding to the data to be traced may, for example, include: performing, through the cluster correction thread, cluster set correction based on cluster set concentration screening on the feature distribution information of the data to be traced to obtain the correlation data feature set corresponding to the data to be traced.

On this basis, the performing data interaction defect identification on the feature distribution information of the data to be traced described in step S24 to obtain the defect data feature set corresponding to the data to be traced may, for example, include: performing, through the cluster correction thread, cluster set correction based on the time-sequence change of a defect curve on the feature distribution information of the data to be traced to obtain the defect data feature set corresponding to the data to be traced.

Therefore, the confidence degrees of the feature distribution information, the relevance data feature set and the defect data feature set can be ensured, and the effectiveness of subsequent data tracing is ensured.

On the basis, the cluster correction thread comprises a plurality of correction paths with progressive relation; the cluster correction based on cluster set concentration screening is performed on the feature distribution information of the data to be traced through the cluster correction thread to obtain a relevant data feature set corresponding to the data to be traced, and the method specifically includes the following steps: performing characteristic distribution interval correction on the characteristic distribution information of the data to be traced through a first correction path in the plurality of correction paths with progressive relations; and transmitting the correction output information of the first correction path to a next correction path determined based on the progressive relation, continuing to perform feature distribution interval correction and correction output information output in the next correction path determined based on the progressive relation until the correction output information is output to a last correction path, mapping the correction output information output by the last correction path to a clustering feature list, and determining a correlation data feature set corresponding to the data to be traced based on a concentration weight queue corresponding to the clustering concentration of the correction output information output by the last correction path in the clustering feature list.

In this way, accurate statistics of the set of correlation data features can be achieved based on the progressive correction path.
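The progressive correction and the concentration-based selection can be sketched in Python as follows. The sketch makes several assumptions that the source does not state: each item of feature distribution information is a `(feature, cluster, value)` tuple, "feature distribution interval correction" is modelled as dropping items whose value falls outside an interval, cluster concentration is the share of surviving items in a cluster, and the correlation data feature set keeps the features of clusters at or above a threshold. All function names and sample values are hypothetical.

```python
from collections import Counter
from functools import reduce


def interval_path(low, high):
    """Build a correction path that keeps items whose value lies in [low, high]."""
    def correct(info):
        return [item for item in info if low <= item[2] <= high]
    return correct


def run_correction_paths(distribution, paths):
    """Feed the output of each correction path into the next one, in order."""
    return reduce(lambda info, path: path(info), paths, list(distribution))


def correlation_feature_set(corrected, min_concentration):
    """Return the selected features and the concentration weight queue."""
    counts = Counter(cluster for _feature, cluster, _value in corrected)
    total = sum(counts.values())
    weights = sorted(
        ((cluster, n / total) for cluster, n in counts.items()),
        key=lambda cw: cw[1],
        reverse=True,
    )
    kept = {cluster for cluster, weight in weights if weight >= min_concentration}
    features = [f for f, cluster, _value in corrected if cluster in kept]
    return features, weights


distribution = [
    ("f1", "A", 0.5), ("f2", "A", 0.95), ("f3", "A", 0.3),
    ("f4", "B", 0.2), ("f5", "B", 1.4), ("f6", "A", 0.4),
]
paths = [interval_path(0.0, 1.0), interval_path(0.1, 0.9)]  # progressively narrower

corrected = run_correction_paths(distribution, paths)
features, weights = correlation_feature_set(corrected, min_concentration=0.5)
print(features, weights)
```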

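Further, the cluster driving thread includes a plurality of driving functions with driving interference; the performing, through the cluster driving thread, data environment parameter clustering based on the number of feature dimensions on the multidimensional data feature queue to obtain the feature distribution information of the data to be traced includes:

extracting environmental feature data from the multidimensional data feature queue through the driving function with the largest interference factor among the plurality of driving functions with driving interference;

loading the current environmental feature data extraction result of the driving function with the largest interference factor into the driving function with the second-largest interference factor, and continuing, in descending order of interference factor, to extract environmental feature data and cascade-load the current extraction result into the next driving function, until the current extraction result has been cascade-loaded into the driving function with the smallest interference factor;

and taking the environmental feature data having the target number of dimensions in the current environmental feature data extraction result output by the driving function with the smallest interference factor as the feature distribution information of the data to be traced.

It can be understood that, when the feature distribution information of the data to be traced is obtained through the cluster driving thread in the manner described above, the features in the feature distribution information remain distinguishable and identifiable.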

In one implementation, when the cluster correction thread includes a plurality of correction paths having a progressive relationship and a common correction node exists between adjacent correction paths, performing, through the cluster correction thread, cluster correction based on cluster set concentration screening on the feature distribution information of the data to be traced, to obtain the correlation data feature set corresponding to the data to be traced, includes:

Therefore, accurate screening of the correlation data feature set can be achieved, and the noise rate of the correlation data feature set is ensured to be minimized.

In a specific embodiment, in order to implement compatibility matching between a target index value and a relational database, the index value extraction of the multidimensional data feature queue and the correlation data feature set according to the defect data feature set described in step S25 is performed to obtain a target index value including an index category, which may specifically include the contents described in the following steps S251 to S254.

Step S251, for each defect data feature in the defect data feature set, performing the following processing: multiplying the defect ratio of the defect data feature in the multidimensional data feature queue by the defect ratio of the defect data feature in the defect data feature set to obtain a first defect ratio of the defect data feature.

Step S252, performing normalization processing on the defect ratio of the defect data feature in the defect data feature set, and multiplying the normalization result by the defect ratio of the defect data feature in the correlation data feature set to obtain a second defect ratio of the defect data feature.

Step S253, performing weighted summation on the first defect ratio and the second defect ratio to obtain a defect index tag of the defect data feature.

Step S254, performing defect index value extraction on the defect index label of each defect data feature to obtain a target index value including an index category.

When the contents described in the above steps S251 to S254 are applied, compatibility matching of the target index value with the relational database can be achieved.
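As a non-limiting illustration, the arithmetic of steps S251 to S254 can be sketched as follows. The specification does not state the normalization method, the weights of the weighted summation, or how an index category is derived from the defect index label. The sketch therefore assumes normalization by the sum of the ratios over the defect data feature set, equal weights by default, and a simple threshold that assigns a hypothetical "high" or "low" category. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DefectFeature:
    name: str
    ratio_in_queue: float            # defect ratio in the multidimensional data feature queue
    ratio_in_defect_set: float       # defect ratio in the defect data feature set
    ratio_in_correlation_set: float  # defect ratio in the correlation data feature set


def defect_index_labels(
    features: List[DefectFeature], w_first: float = 0.5, w_second: float = 0.5
) -> Dict[str, float]:
    """Compute the defect index label of every defect data feature (S251 to S253)."""
    total = sum(f.ratio_in_defect_set for f in features)
    labels: Dict[str, float] = {}
    for f in features:
        # S251: first defect ratio
        first = f.ratio_in_queue * f.ratio_in_defect_set
        # S252: normalize (assumption: divide by the sum), then scale
        normalized = f.ratio_in_defect_set / total if total else 0.0
        second = normalized * f.ratio_in_correlation_set
        # S253: weighted summation (assumption: weights supplied by the caller)
        labels[f.name] = w_first * first + w_second * second
    return labels


def extract_target_index(
    labels: Dict[str, float], threshold: float
) -> Dict[str, Tuple[float, str]]:
    """S254 (assumed form): pair each label with a threshold-based index category."""
    return {
        name: (label, "high" if label >= threshold else "low")
        for name, label in labels.items()
    }


if __name__ == "__main__":
    features = [
        DefectFeature("a", 0.5, 0.25, 0.5),
        DefectFeature("b", 0.25, 0.75, 0.5),
    ]
    print(extract_target_index(defect_index_labels(features), threshold=0.2))
```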

In an alternative embodiment, in order to ensure the accuracy and integrity of the target paired data, the querying, in the preset database, of the target paired data corresponding to the data to be traced according to the target index value and the index category thereof, described in step S26, may specifically include the following contents described in step S2611 to step S2615.

Step S2611, generating an index value list corresponding to the target index value and an index category list corresponding to the index category, where the index value list and the index category list each include a plurality of list data sets with different index significance coefficients.

Step S2612, determining an index query statement of the target index value in any list data set of the index value list, and determining the list data set having the maximum index significance coefficient in the index category list as a target list data set.

Step S2613, writing the index query statement to the target list data set according to the relevance data feature set and the defect data feature set to obtain an index matching statement corresponding to the index query statement in the target list data set, and constructing an index matching path between the target index value and the index category according to the word vector similarity between the index query statement and the index matching statement.

Step S2614, obtaining an associated index statement in the target list data set by using the index matching statement as a reference statement, writing the associated index statement to the list data set where the index query statement is located based on an inverted index matching path corresponding to the index matching path, obtaining a database call statement corresponding to the associated index statement in the list data set where the index query statement is located, and determining call path information of the database call statement as data extraction information.

Step S2615, obtaining query timing information generated when the index query statement is written into the target list data set; traversing, in the index category list, the data identifiers corresponding to the data extraction information according to the delay restoration confidence between the database call statement and the time delay weights corresponding to the plurality of timing nodes in the query timing information, until the traceability evaluation value of the list data set in which the obtained data identifier is located is consistent with the traceability evaluation value of the data extraction information in the index value list; and querying, from the preset database based on the index matching statement, the stored data corresponding to the data identifier as the target pairing data corresponding to the data to be traced.

In this way, the accuracy and integrity of the target paired data can be ensured through the contents described in the above steps S2611 to S2615.
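As a non-limiting illustration, two of the operations named above can be sketched in isolation: selecting the list data set with the maximum index significance coefficient (step S2612) and comparing an index query statement with candidate index matching statements by word vector similarity (step S2613). The specification does not say how statements are vectorized or which similarity measure is used. The sketch assumes precomputed vectors and cosine similarity. The dictionary layout and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_target_list_data_set(category_list: List[dict]) -> dict:
    """S2612: the list data set with the maximum index significance coefficient."""
    return max(category_list, key=lambda data_set: data_set["significance"])


def best_matching_statement(
    query_vector: Sequence[float], statements: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """S2613 (assumed form): the statement most similar to the query vector."""
    name = max(statements, key=lambda n: cosine_similarity(query_vector, statements[n]))
    return name, cosine_similarity(query_vector, statements[name])


if __name__ == "__main__":
    category_list = [
        {"significance": 0.2, "statements": {"s0": [0.0, 1.0]}},
        {"significance": 0.9, "statements": {"s1": [1.0, 0.0], "s2": [0.6, 0.8]}},
    ]
    target = pick_target_list_data_set(category_list)
    print(best_matching_statement([1.0, 0.1], target["statements"]))
```

The remaining operations of steps S2614 and S2615 (inverted index matching path, delay restoration confidence, traceability evaluation value) are not sketched because the specification does not define them in enough detail.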

In another alternative embodiment, in order to completely trace the original service data and thereby ensure that subsequent business processes proceed smoothly, tracing the data to be traced according to the target tracing data, as described in step S26, to obtain the original service data corresponding to the data to be traced may further include the contents described in the following steps S2621 to S2625.

Step S2621, determining a first tracing track curve, a second tracing track curve and a third tracing track curve of the target tracing data relative to the data to be traced; and determining a first cosine distance between a first track characteristic corresponding to the first tracing track curve and a second track characteristic corresponding to the second tracing track curve and a second cosine distance between a second track characteristic corresponding to the second tracing track curve and a third track characteristic corresponding to the third tracing track curve.

Step S2622, for the first tracing trajectory curve, performing curve smoothing on the first tracing trajectory curve according to the first cosine distance by taking the first trajectory feature as a reference to obtain a fourth tracing trajectory curve; and for the second tracing track curve, performing curve smoothing on the second tracing track curve according to the second cosine distance by taking the second track characteristic as a reference to obtain a fifth tracing track curve.

Step S2623, respectively performing curve fitting on the first tracing track curve and the second tracing track curve, the first tracing track curve and the fourth tracing track curve, the second tracing track curve and the third tracing track curve, and the second tracing track curve and the fifth tracing track curve to obtain a first fitting curve, a second fitting curve, a third fitting curve and a fourth fitting curve; a first curve dispersion between the first fitted curve and the second fitted curve and a second curve dispersion between the third fitted curve and the fourth fitted curve are determined.

Step S2624, judging whether the first curve dispersion and the second curve dispersion both correspond to a preset dispersion; if so, determining a traceability generalization coefficient of the target traceability data according to the first fitting curve and the third fitting curve, and performing traceability data integration on the first traceability trajectory curve, the second traceability trajectory curve and the third traceability trajectory curve according to the traceability generalization coefficient to obtain a data integration result; if not, respectively determining a first difference value and a second difference value between the first curve dispersion and the preset dispersion and the second curve dispersion and the preset dispersion; comparing the magnitude of the first difference value and the second difference value; when the first difference value is smaller than the second difference value, determining a traceability generalization coefficient of the target traceability data according to the first fitting curve and the second fitting curve, and performing traceability data integration on the first traceability trajectory curve, the second traceability trajectory curve and the third traceability trajectory curve according to the traceability generalization coefficient to obtain a data integration result; when the first difference value is larger than the second difference value, determining a traceability generalization coefficient of the target traceability data according to the third fitting curve and the fourth fitting curve, and performing traceability data integration on the first traceability trajectory curve, the second traceability trajectory curve and the third traceability trajectory curve according to the traceability generalization coefficient to obtain a data integration result.

Step S2625, tracing the data to be traced based on the data integration result to obtain original service data corresponding to the data to be traced.

In a specific implementation process, through the contents described in the above steps S2621 to S2625, complete tracing to the original business data can be realized to ensure smooth development of the subsequent business process.
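As a non-limiting illustration, the branch logic of step S2624 can be sketched as follows. The specification does not define curve dispersion or what it means for a dispersion to correspond to the preset dispersion. The sketch assumes that dispersion is the root-mean-square difference between two curves sampled at the same points, that a dispersion corresponds to the preset value when their absolute difference is within a tolerance, and that the difference values are absolute differences. The case of equal difference values is not addressed in the specification and is returned as `None` here. All names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def curve_dispersion(curve_a: Sequence[float], curve_b: Sequence[float]) -> float:
    """Assumed dispersion: RMS difference of two curves sampled at the same points."""
    if len(curve_a) != len(curve_b) or not curve_a:
        raise ValueError("curves must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))
    return math.sqrt(squared / len(curve_a))


def choose_fitted_curves(
    first_dispersion: float,
    second_dispersion: float,
    preset: float,
    tolerance: float = 1e-6,
) -> Optional[Tuple[str, str]]:
    """Return which pair of fitted curves determines the generalization coefficient."""
    first_diff = abs(first_dispersion - preset)
    second_diff = abs(second_dispersion - preset)
    # Both dispersions correspond to the preset: use the first and third fitted curves.
    if first_diff <= tolerance and second_diff <= tolerance:
        return ("first", "third")
    # Otherwise compare the two difference values.
    if first_diff < second_diff:
        return ("first", "second")
    if first_diff > second_diff:
        return ("third", "fourth")
    return None  # equal difference values: not specified in the source


if __name__ == "__main__":
    print(choose_fitted_curves(1.2, 1.5, preset=1.0, tolerance=0.05))
```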

Based on the same inventive concept as above, and referring to fig. 3, a data tracing apparatus 300 based on big data and multidimensional features is provided, comprising:

the queue obtaining module 310 is configured to perform multidimensional feature identification on data to be traced according to a preset data feature identification model, so as to obtain a multidimensional data feature queue corresponding to the data to be traced;

the parameter clustering module 320 is configured to perform data environment parameter clustering on the multidimensional data feature queue to obtain feature distribution information of the data to be traced;

the feature identification module 330 is configured to perform feature correlation identification on the feature distribution information of the data to be traced to obtain a correlation data feature set corresponding to the data to be traced;

the defect identification module 340 is configured to perform data interaction defect identification on the feature distribution information of the data to be traced to obtain a defect data feature set corresponding to the data to be traced;

the index extraction module 350 is configured to perform index value extraction on the multidimensional data feature queue and the correlation data feature set according to the defect data feature set, so as to obtain a target index value including an index category;

the data tracing module 360 is configured to query, in a preset database, target pairing data corresponding to the data to be traced according to the target index value and the index category thereof, and trace the data to be traced according to the target tracing data to obtain original service data corresponding to the data to be traced.

For the above description of the queue obtaining module 310, the parameter clustering module 320, the feature identifying module 330, the defect identifying module 340, the index extracting module 350 and the data tracing module 360, please refer to the detailed description of the steps and substeps of the method shown in fig. 2, which is not described herein again.
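As a non-limiting illustration, the data flow between modules 310 to 360 can be sketched as a composition of six callables. The internal behavior of each module is left to the caller, since it is defined by the method steps rather than by this sketch. The class name `DataTracingApparatus` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataTracingApparatus:
    """Wires six caller-supplied stages in the order of modules 310 to 360."""
    obtain_queue: Callable[[Any], Any]              # module 310
    cluster_parameters: Callable[[Any], Any]        # module 320
    identify_correlation: Callable[[Any], Any]      # module 330
    identify_defects: Callable[[Any], Any]          # module 340
    extract_index: Callable[[Any, Any, Any], Any]   # module 350
    trace_data: Callable[[Any, Any], Any]           # module 360

    def trace(self, data_to_trace: Any) -> Any:
        queue = self.obtain_queue(data_to_trace)
        distribution = self.cluster_parameters(queue)
        # Modules 330 and 340 both consume the feature distribution information.
        correlation_set = self.identify_correlation(distribution)
        defect_set = self.identify_defects(distribution)
        target_index = self.extract_index(queue, correlation_set, defect_set)
        return self.trace_data(data_to_trace, target_index)


if __name__ == "__main__":
    # Placeholder stages that only label their inputs, to show the wiring.
    apparatus = DataTracingApparatus(
        obtain_queue=lambda d: ("queue", d),
        cluster_parameters=lambda q: ("distribution", q),
        identify_correlation=lambda dist: ("correlation", dist),
        identify_defects=lambda dist: ("defects", dist),
        extract_index=lambda q, c, f: ("index", q, c, f),
        trace_data=lambda d, idx: ("original", d, idx),
    )
    print(apparatus.trace("record-1"))
```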

Further, on the basis of fig. 1 and fig. 2, a data tracing system based on big data and multidimensional features is further provided, which includes a big data cloud server and a service terminal that are in communication connection with each other;

the service terminal is used for:

sending a data calling request to a big data cloud server;

the big data cloud server is used for:

acquiring corresponding data to be traced according to the data call request;

inquiring target pairing data corresponding to the data to be traced in a preset database according to the target index value and the index category thereof, and tracing the data to be traced according to the target tracing data to obtain original service data corresponding to the data to be traced;

and feeding back the original service data to the service terminal.

On this basis, and referring to fig. 4, a big data cloud server 200 is provided, including: a processor 210, and a memory 220 and a network interface 230 connected to the processor 210; the network interface 230 is connected with a nonvolatile memory 240 in the big data cloud server 200; the processor 210 retrieves a computer program from the nonvolatile memory 240 via the network interface 230 and runs the computer program via the memory 220 to perform the above-described method.

Likewise, a readable storage medium applied to a computer is also provided, and the readable storage medium is burned with a computer program, and the computer program realizes the method when running in the memory 220 of the big data cloud server 200.

The technical features of the above embodiments can be combined arbitrarily, provided that the combination involves no conflict or contradiction. For reasons of space, the combinations are not described one by one; any combination of the technical features of the above embodiments nevertheless falls within the scope disclosed in the present specification.

The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

In summary, when the method, the apparatus, and the system are applied, multidimensional feature identification is first performed on the data to be traced to obtain the multidimensional data feature queue; data environment parameter clustering is then performed on the multidimensional data feature queue to obtain the feature distribution information; feature correlation identification and data interaction defect identification are respectively performed on the feature distribution information to obtain the correlation data feature set and the defect data feature set; index value extraction is then performed on the multidimensional data feature queue and the correlation data feature set according to the defect data feature set to obtain the target index value including the index category; and finally, the target pairing data corresponding to the data to be traced is queried in the preset database according to the target index value and the index category thereof, and the data to be traced is traced according to the target tracing data to obtain the original service data corresponding to the data to be traced. In this way, the multidimensional data features of the data to be traced are taken into account, and the correlation data feature set and the defect data feature set of the data to be traced are further mined to accurately determine the target index value and the index category thereof, so that complete and accurate tracing of the data to be traced is realized.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A data tracing method based on big data and multidimensional characteristics is characterized by comprising the following steps:

2. The method according to claim 1, wherein before performing multi-dimensional feature recognition on the data to be traced according to the preset data feature recognition model, the method further comprises: extracting a service data label from the data to be traced to obtain a service processing label of the data to be traced;

3. The method according to claim 2, wherein the step of performing traversal matching of data features and service processing labels on the service processing labels of the data to be traced according to a pre-stored label information set in the preset data feature recognition model to obtain a multidimensional data feature queue corresponding to the data to be traced comprises:

4. The method of claim 1, wherein the target clustering model for data environment parameter clustering comprises a cluster driving thread and a cluster correcting thread;

5. The method of claim 4, wherein the cluster correction thread comprises a plurality of correction paths having a progressive relationship; the cluster correction based on cluster set concentration screening is performed on the feature distribution information of the data to be traced through the cluster correction thread to obtain a relevant data feature set corresponding to the data to be traced, and the method comprises the following steps:

6. The method of claim 4, wherein the cluster driven threads comprise a plurality of driving functions with driving disturbances; the clustering of the data environment parameters based on the characteristic dimension quantity is performed on the multidimensional data characteristic queue through the clustering driving thread to obtain the characteristic distribution information of the data to be traced, and the clustering driving thread comprises the following steps:

7. The method according to claim 6, wherein when the cluster correction thread includes a plurality of correction paths having a progressive relationship and a common correction node exists between adjacent correction paths, the cluster correction based on cluster concentration screening is performed on the feature distribution information of the data to be traced through the cluster correction thread to obtain a related data feature set corresponding to the data to be traced, including:

8. The method of claim 1, wherein the extracting the index values of the multidimensional data feature queue and the relevant data feature set according to the defect data feature set to obtain the target index value including the index category comprises:

9. A big data cloud server, comprising:

a processor, and

a memory and a network interface connected with the processor;

the network interface is connected with a nonvolatile memory in the big data cloud server;

the processor, when running, retrieves a computer program from the non-volatile memory via the network interface and runs the computer program via the memory to perform the method of any of claims 1-8.

10. A readable storage medium applied to a computer, wherein the readable storage medium is burned with a computer program, and the computer program is used for implementing the method of any one of the above claims 1 to 8 when the computer program runs in a memory of a big data cloud server.