CN117852777B - Linking method and system for multi-source heterogeneous data asset - Google Patents

Info

Publication number
CN117852777B
Authority
CN
China
Legal status
Active
Application number
CN202410262827.3A
Other languages
Chinese (zh)
Other versions
CN117852777A
Inventor
霍绥力
张春红
张尧
Current Assignee
Beijing Huaban Zhiyuan Technology Co ltd
Original Assignee
Beijing Huaban Zhiyuan Technology Co ltd
Application filed by Beijing Huaban Zhiyuan Technology Co ltd
Priority to CN202410262827.3A
Publication of CN117852777A
Application granted
Publication of CN117852777B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for managing multi-source heterogeneous data assets, applied to the technical field of data processing. The method comprises: obtaining basic information of multi-source heterogeneous data assets and performing tertiary clustering on it to generate a data asset clustering result; invoking the data asset timestamps through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information; receiving a data access request from a user side for the data asset clustering result, the data access request comprising a target timestamp; matching the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information; and performing up-dimensional restoration on the target low-dimensional mapping information to generate a target data asset, which is sent to the user side. This solves the technical problems of large storage load and poor data asset security of multi-source heterogeneous data assets in the prior art.

Description

Linking method and system for multi-source heterogeneous data asset
Technical Field
The invention relates to the field of data processing, in particular to a method and a system for linking multi-source heterogeneous data assets.
Background
A data asset is data that can bring economic benefits to an enterprise and is typically recorded in electronic storage. In the prior art, however, data assets come from many sources and have complex structures, so their storage load is large; moreover, once the data is leaked, the data assets can be obtained directly, so their security is poor.
Therefore, multi-source heterogeneous data assets in the prior art suffer from the technical problems of large storage load and poor data asset security.
Disclosure of Invention
By providing a method and a system for managing multi-source heterogeneous data assets, the present application solves the technical problems of large storage load and poor data asset security of multi-source heterogeneous data assets in the prior art.
The present application provides a method for linking multi-source heterogeneous data assets, the method comprising: obtaining multi-source heterogeneous data asset basic information, which comprises at least a data asset source, a data asset structure, a data asset type and a data asset timestamp; performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result;
invoking the data asset timestamps through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp; receiving a data access request from a user side for the data asset clustering result, the data access request comprising a target timestamp; matching the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information; and performing up-dimensional restoration on the target low-dimensional mapping information to generate a target data asset and sending the target data asset to the user side.
The present application also provides a linking system for multi-source heterogeneous data assets, the system comprising: a data acquisition module for acquiring multi-source heterogeneous data asset basic information, which comprises at least a data asset source, a data asset structure, a data asset type and a data asset timestamp; a clustering module for performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result; a dimension reduction mapping module for invoking the data asset timestamps through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp; an access request receiving module for receiving a data access request from the user side for the data asset clustering result, the data access request comprising a target timestamp; a mapping matching module for matching the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information; and a data restoration module for performing up-dimensional restoration on the target low-dimensional mapping information, generating a target data asset and sending it to the user side.
The application also provides an electronic device, comprising:
A memory for storing executable instructions;
A processor configured to implement the method for linking multi-source heterogeneous data assets provided by the present application when executing the executable instructions stored in the memory.
The present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the linking method for multi-source heterogeneous data assets provided by the present application.
The method and system for managing multi-source heterogeneous data assets perform tertiary clustering on the obtained multi-source heterogeneous data asset basic information to generate a data asset clustering result; invoke the data asset timestamps through a data distribution mapping algorithm and traverse the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information; receive a data access request from the user side for the data asset clustering result, the request comprising a target timestamp; match the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information; and perform up-dimensional restoration on the target low-dimensional mapping information to generate a target data asset, which is sent to the user side. This achieves reduced-dimension storage of multi-source heterogeneous data assets, lowering the storage load; even if the data is leaked, its content cannot be extracted directly, which improves data security. The technical problems of large storage load and poor data asset security of multi-source heterogeneous data assets in the prior art are thereby solved.
The foregoing is only an overview of the technical solution of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the contents of the specification, and to make the above and other objects, features and advantages of the present application more readily apparent, specific embodiments of the present application are set forth below.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the following description will briefly explain the drawings of the embodiments of the present invention. It is apparent that the figures in the following description relate only to some embodiments of the invention and are not limiting of the invention.
FIG. 1 is a flow diagram of a method for linking multi-source heterogeneous data assets provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of obtaining the data asset clustering result in a method for linking multi-source heterogeneous data assets according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of matching the target low-dimensional mapping information in a method for linking multi-source heterogeneous data assets according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a linking system for multi-source heterogeneous data assets according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device of the linking system for multi-source heterogeneous data assets according to an embodiment of the present invention.
Reference numerals: data acquisition module 11, clustering module 12, dimension reduction mapping module 13, access request receiving module 14, mapping matching module 15, data restoration module 16, processor 31, memory 32, input device 33, output device 34.
Detailed Description
Examples
To make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, "some embodiments" describes a subset of all possible embodiments; it is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with one another where no conflict arises.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not imply a particular ordering. Where permitted, "first", "second" and "third" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or a server; the modules are merely illustrative, and different aspects of the system and method may use different modules.
Flowcharts are used in the present application to illustrate the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Rather, the steps may be processed in reverse order or simultaneously as needed, and other operations may be added to or removed from these processes.
As shown in fig. 1, an embodiment of the present application provides a method for linking multi-source heterogeneous data assets, the method comprising:
Obtaining multi-source heterogeneous data asset basic information, which comprises at least a data asset source, a data asset structure, a data asset type and a data asset timestamp;
Performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result;
Invoking the data asset timestamps through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp;
A data asset is data that can bring economic benefits to an enterprise and is typically recorded in electronic storage. Multi-source heterogeneous data asset basic information is obtained; it includes at least a data asset source, a data asset structure, a data asset type and a data asset timestamp, together with the specific data content. The data asset source is the specific origin of the data, such as an enterprise management system or an enterprise website; the data asset structure is the storage structure of the data; the data asset type is the specific category of the data, such as numeric or text data and other non-image data; and the data asset timestamp is the acquisition time of the data. Tertiary clustering is performed on the multi-source heterogeneous data assets in the order data asset source, data asset type, data asset structure, generating a data asset clustering result. Further, the data asset timestamps are invoked through a data distribution mapping algorithm, and the data asset clustering result is traversed to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp.
As shown in fig. 2, the method provided by the embodiment of the present application further includes:
Performing primary clustering on the multi-source heterogeneous data assets according to the data asset source to generate a data asset primary clustering result;
Traversing the data asset primary clustering result and performing secondary clustering according to the data asset type to generate a data asset secondary clustering result;
Traversing the data asset secondary clustering result and performing tertiary clustering according to the data asset structure to generate the data asset clustering result.
Performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate the data asset clustering result comprises the following. Primary clustering is performed on the multi-source heterogeneous data assets according to the data asset source, generating a data asset primary clustering result. The primary clustering result is then traversed and secondary clustering is performed according to the data asset type, generating a data asset secondary clustering result. Further, the secondary clustering result is traversed and tertiary clustering is performed according to the data asset structure, generating the data asset clustering result. The clustering thus proceeds in the order primary, secondary, tertiary. Within each cluster of the resulting data asset clustering result, the data asset source, data asset type and data asset structure are identical, which makes it convenient to process the data in each cluster and to classify and store the data by cluster.
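The three clustering passes can be sketched as nested grouping. This is a minimal sketch under stated assumptions: the `DataAsset` record, its field names and `tertiary_cluster` are hypothetical, and each pass is assumed to group assets whose attribute values are equal, since source, type and structure are categorical labels and the document does not specify a distance measure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    # Hypothetical record carrying the basic information named in the text.
    source: str       # data asset source, e.g. an enterprise management system
    asset_type: str   # data asset type, e.g. "numeric" or "text"
    structure: str    # data asset structure (storage structure)
    timestamp: str    # data asset timestamp (acquisition time)
    value: str        # specific data content (characteristic value)


def tertiary_cluster(assets):
    """Group by source, then type, then structure (assumed exact-match grouping)."""
    # Primary clustering: group by data asset source.
    primary = defaultdict(list)
    for asset in assets:
        primary[asset.source].append(asset)

    # Secondary clustering: traverse each primary cluster and group by type.
    secondary = {}
    for source, members in primary.items():
        by_type = defaultdict(list)
        for asset in members:
            by_type[asset.asset_type].append(asset)
        secondary[source] = by_type

    # Tertiary clustering: traverse each secondary cluster and group by structure.
    clusters = defaultdict(list)
    for source, by_type in secondary.items():
        for asset_type, members in by_type.items():
            for asset in members:
                clusters[(source, asset_type, asset.structure)].append(asset)
    return dict(clusters)
```

Every resulting cluster key is a (source, type, structure) triple, so all members of one cluster share the three attributes, matching the property stated above.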
The method provided by the embodiment of the application further comprises the following steps:
Obtaining a first cluster data asset characteristic value set of the data asset clustering result;
Randomly selecting a first data asset characteristic value from the first cluster data asset characteristic value set and setting it as a reference characteristic value, the reference characteristic value having a reference timestamp among the data asset timestamps;
Randomly selecting a second data asset characteristic value, different from the first data asset characteristic value, from the first cluster data asset characteristic value set and setting it as a comparison characteristic value, the comparison characteristic value having a comparison timestamp among the data asset timestamps;
Calculating a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculating a time mapping vector from the reference timestamp to the comparison timestamp;
Constructing low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector and the time mapping vector, and adding it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
Invoking the data asset timestamps through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp, comprises the following. A first cluster data asset characteristic value set of the data asset clustering result is obtained; this set holds the specific content of each data item in the cluster. A first data asset characteristic value is randomly selected from the set, that is, one data item in the set is picked at random, and it is set as the reference characteristic value, whose timestamp among the data asset timestamps is the reference timestamp. A second data asset characteristic value different from the first is then randomly selected from the set and set as the comparison characteristic value, whose timestamp is the comparison timestamp. The semantic mapping vector from the reference characteristic value to the comparison characteristic value is calculated: because the characteristic values in one cluster are data of the same category, such as Chinese characters or digits and letters, a semantic distance vector exists between them, so the comparison characteristic value can be represented by the semantic mapping vector. The time mapping vector from the reference timestamp to the comparison timestamp is calculated in the same way.
Any comparison characteristic value is thus stored as a semantic mapping vector and a time mapping vector relative to the reference characteristic value and the reference timestamp, which reduces the data storage load and improves data security. Low-dimensional mapping information of the comparison characteristic value is constructed from the reference characteristic value, the reference timestamp, the semantic mapping vector and the time mapping vector and added to the data asset low-dimensional mapping information, with the reference timestamp serving as the index timestamp. The procedure is repeated for the remaining characteristic values in the first cluster data asset characteristic value set until their semantic mapping vectors and time mapping vectors have all been obtained.
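The per-cluster dimension-reduction step can be sketched as follows. This is a minimal sketch, assuming characteristic values and timestamps are strings and that a mapping vector is the per-character code-point difference; the document does not define the semantic distance, so that choice is an assumption. `reduce_cluster`, `mapping_vector`, `LowDimEntry` and `PRESET_REF_CHAR` are hypothetical names. The sketch also keeps the reference value and reference timestamp once per cluster instead of copying them into every entry, and it iterates over all remaining values rather than drawing them one at a time at random, which covers the same set.

```python
import random
from dataclasses import dataclass

PRESET_REF_CHAR = "\0"  # hypothetical preset reference character used for padding


def mapping_vector(reference: str, comparison: str) -> list[int]:
    # Align from the head; pad the reference tail if it is shorter.
    n = len(comparison)
    aligned = reference[:n].ljust(n, PRESET_REF_CHAR)
    # Assumed semantic distance: per-character code-point difference.
    return [ord(c) - ord(r) for r, c in zip(aligned, comparison)]


@dataclass
class LowDimEntry:
    index_timestamp: str        # reference timestamp, used as the index timestamp
    semantic_vector: list[int]  # reference value -> comparison value
    time_vector: list[int]      # reference timestamp -> comparison timestamp


def reduce_cluster(members, rng=random):
    """members: (characteristic value, timestamp) pairs from one cluster.

    Returns the reference value, the reference timestamp and one
    low-dimensional entry per remaining comparison value.
    """
    ref_idx = rng.randrange(len(members))  # random choice of the reference
    ref_value, ref_ts = members[ref_idx]
    entries = []
    for i, (value, ts) in enumerate(members):
        if i == ref_idx:
            continue  # every other member is treated as a comparison value
        entries.append(LowDimEntry(
            index_timestamp=ref_ts,
            semantic_vector=mapping_vector(ref_value, value),
            time_vector=mapping_vector(ref_ts, ts),
        ))
    return ref_value, ref_ts, entries
```

Passing a seeded `random.Random` instance as `rng` makes the reference choice reproducible, which is convenient for testing.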
The method provided by the embodiment of the application further comprises the following steps:
Performing semantic unit decomposition on the reference characteristic value to obtain a reference unit sequence;
Performing semantic unit decomposition on the comparison characteristic value to obtain a comparison unit sequence;
When the first semantic unit count of the reference unit sequence is greater than or equal to the second semantic unit count of the comparison unit sequence, aligning the comparison unit sequence with the reference unit sequence from the head and performing semantic distance vector analysis to generate the semantic mapping vector;
When the first semantic unit count of the reference unit sequence is smaller than the second semantic unit count of the comparison unit sequence, aligning the comparison unit sequence with the reference unit sequence from the head, appending preset reference characters to the reference unit sequence so that it aligns with the tail of the comparison unit sequence, and performing semantic distance vector analysis to generate the semantic mapping vector.
Calculating the semantic mapping vector from the reference characteristic value to the comparison characteristic value comprises the following. Semantic unit decomposition is performed on the reference characteristic value, decomposing it into a number of single units, each being a smallest unit into which the reference characteristic value cannot be further segmented; for example, a numeric characteristic value is segmented into single digits. The sequence formed by these reference units is the reference unit sequence. The comparison characteristic value is decomposed in the same way to obtain the comparison unit sequence. When the first semantic unit count of the reference unit sequence is greater than or equal to the second semantic unit count of the comparison unit sequence, the units of the reference sequence suffice to compute semantic mapping vectors for all units of the comparison sequence, so the two sequences are aligned from the head and semantic distance vector analysis is performed to generate the semantic mapping vector.
When the first semantic unit count is smaller than the second, the units of the reference sequence cannot cover all units of the comparison sequence. The two sequences are then aligned from the head, and preset reference characters are appended after the last unit of the reference unit sequence so that it aligns with the tail of the comparison unit sequence; semantic distance vector analysis is then performed to generate the semantic mapping vector. The preset reference character is a predefined character chosen to facilitate the semantic mapping vector calculation. The time mapping vector from the reference timestamp to the comparison timestamp is calculated with the same scheme as the semantic mapping vector.
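The two alignment cases can be sketched as follows. This is a sketch under assumptions, not the patent's own algorithm: the smallest semantic unit is assumed to be a single character, the per-unit semantic distance is assumed to be the code-point difference, and `decompose`, `semantic_mapping_vector`, `time_mapping_vector` and `PRESET_REF_CHAR` are hypothetical names.

```python
PRESET_REF_CHAR = "\0"  # hypothetical preset reference character


def decompose(value: str) -> list[str]:
    # Assumed smallest semantic unit: a single character, so a number
    # such as "1024" splits into the single digits "1", "0", "2", "4".
    return list(value)


def semantic_mapping_vector(reference: str, comparison: str) -> list[int]:
    ref_units = decompose(reference)
    cmp_units = decompose(comparison)
    if len(ref_units) >= len(cmp_units):
        # Case 1: the reference has enough units; align from the head and
        # use only as many reference units as the comparison has.
        aligned = ref_units[:len(cmp_units)]
    else:
        # Case 2: align from the head, then append preset reference
        # characters so the reference tail lines up with the comparison tail.
        aligned = ref_units + [PRESET_REF_CHAR] * (len(cmp_units) - len(ref_units))
    # Assumed semantic distance per unit: the code-point difference.
    return [ord(c) - ord(r) for r, c in zip(aligned, cmp_units)]


# The time mapping vector reuses the same scheme on timestamp strings.
time_mapping_vector = semantic_mapping_vector
```

For example, `semantic_mapping_vector("12", "123")` pads the reference to `["1", "2", "\0"]` and yields `[0, 0, 51]`, while `semantic_mapping_vector("1024", "10")` uses only the first two reference units.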
Receiving a data access request from the user side for the data asset clustering result, the data access request comprising a target timestamp;
Matching the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information;
Performing up-dimensional restoration on the target low-dimensional mapping information to generate a target data asset and sending the target data asset to the user side.
A data access request from the user side for the data asset clustering result is received, the data access request comprising a target timestamp. The data asset low-dimensional mapping information is then matched according to the target timestamp and the index timestamp: once the target timestamp is acquired, it is combined with the index timestamp to locate the data to be accessed, yielding the target low-dimensional mapping information. Finally, up-dimensional restoration is performed on the target low-dimensional mapping information: the mapping is inverted by combining the semantic mapping vector in the target low-dimensional mapping information with the reference characteristic value, generating the target data asset, which is sent to the user side. In this way the multi-source heterogeneous data assets are stored in reduced-dimension form, which lowers the storage load; even if the stored data is leaked, its content cannot be extracted directly, which improves data security.
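Up-dimensional restoration is the inverse of the mapping. A minimal sketch, assuming the code-point-difference mapping with head alignment and tail padding used for illustration earlier in this description; `up_dimensional_restore` and `PRESET_REF_CHAR` are hypothetical names, and the padding character must match the one used during dimension reduction.

```python
PRESET_REF_CHAR = "\0"  # must match the character used during dimension reduction


def up_dimensional_restore(reference: str, vector: list[int]) -> str:
    """Invert the mapping: add each vector component back onto the aligned reference unit."""
    n = len(vector)  # the original value had exactly one unit per vector component
    aligned = reference[:n].ljust(n, PRESET_REF_CHAR)
    return "".join(chr(ord(r) + d) for r, d in zip(aligned, vector))
```

The same function restores the comparison timestamp from the reference timestamp and the time mapping vector. Without the reference value, a leaked vector such as `[0, 0, 1, -4]` does not by itself reveal `"1030"`, which is the security property the paragraph claims.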
As shown in fig. 3, the method provided by the embodiment of the present application further includes:
Calculating an index time mapping vector from the index timestamp to the target timestamp;
And matching the target low-dimensional mapping information from the data asset low-dimensional mapping information according to the index time mapping vector.
Matching the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain the target low-dimensional mapping information comprises: calculating the index time mapping vector from the index timestamp to the target timestamp, this index time mapping vector being the time mapping vector of the comparison timestamp that corresponds to the data the user wants to retrieve; and matching the target low-dimensional mapping information from the data asset low-dimensional mapping information according to the index time mapping vector.
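The matching step can be sketched as recomputing a time vector from each entry's index timestamp to the target timestamp and comparing it with the stored time mapping vector. This is a sketch under assumptions: the code-point-difference mapping and the names `mapping_vector`, `LowDimEntry`, `match_target` and `PRESET_REF_CHAR` are hypothetical, and exact vector equality is assumed as the match criterion.

```python
from dataclasses import dataclass

PRESET_REF_CHAR = "\0"  # hypothetical preset reference character


def mapping_vector(reference: str, comparison: str) -> list[int]:
    # Assumed scheme: head alignment, tail padding of a shorter
    # reference, and per-character code-point differences.
    n = len(comparison)
    aligned = reference[:n].ljust(n, PRESET_REF_CHAR)
    return [ord(c) - ord(r) for r, c in zip(aligned, comparison)]


@dataclass
class LowDimEntry:
    index_timestamp: str
    semantic_vector: list[int]
    time_vector: list[int]


def match_target(entries, target_timestamp: str):
    """Return entries whose stored time vector equals the index time mapping vector."""
    matches = []
    for entry in entries:
        # Index time mapping vector: index timestamp -> target timestamp.
        index_vector = mapping_vector(entry.index_timestamp, target_timestamp)
        if index_vector == entry.time_vector:
            matches.append(entry)
    return matches
```

Returning a list rather than a single entry allows for several comparison values that share the same timestamp.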
The method provided by the embodiment of the application further comprises the following steps:
Configuring a data asset handling task set, the data asset handling task set comprising a to-be-invoked data asset basic information tag;
Traversing the data asset handling task set and configuring a uniquely associated task number tag set;
Invoking the to-be-invoked data asset basic information tag, the task number tag set and the data asset timestamps through a data distribution mapping algorithm, and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp.
A data asset handling task set is configured; it contains the tasks to be executed that need to invoke data, and it comprises the to-be-invoked data asset basic information tag. The data asset handling task set is traversed and a uniquely associated task number tag set is configured, that is, each task in the set is assigned a unique task number. The to-be-invoked data asset basic information tag, the task number tag set and the data asset timestamps are then invoked through the data distribution mapping algorithm, and the data asset clustering result is traversed to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that carries an index timestamp. With the data asset handling task set configured, the corresponding data can later be retrieved efficiently by looking up the task number directly when the corresponding task is processed.
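Assigning task numbers can be sketched as a single traversal of the task set. This is a minimal sketch under assumptions: the `HandlingTask` record, the `(source, type, structure)` form of the to-be-invoked basic information tag, the `T0001`-style numbering and `assign_task_numbers` are all hypothetical, since the document does not specify a number format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingTask:
    # Hypothetical task record; asset_tag stands for the to-be-invoked data
    # asset basic information tag, assumed here to be (source, type, structure).
    name: str
    asset_tag: tuple[str, str, str]


def assign_task_numbers(tasks, prefix="T", start=1):
    """Traverse the handling task set and give each task a unique task number."""
    return {f"{prefix}{i:04d}": task for i, task in enumerate(tasks, start)}
```

In this sketch, looking up a task by its number is a single dictionary access, so later processing can go straight from a task number to the data tag it needs.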
The method provided by the embodiment of the application further comprises the following steps:
Obtaining a first cluster data asset characteristic value set of the data asset clustering result;
Randomly selecting a first data asset characteristic value from the first cluster data asset characteristic value set and setting it as a reference characteristic value, the reference characteristic value having a reference timestamp among the data asset timestamps;
Randomly selecting a second data asset characteristic value, different from the first data asset characteristic value, from the first cluster data asset characteristic value set and setting it as a comparison characteristic value, the comparison characteristic value having a comparison timestamp among the data asset timestamps;
Calculating a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculating a time mapping vector from the reference timestamp to the comparison timestamp;
Matching an associated task number set for the comparison characteristic value according to the to-be-invoked data asset basic information tag and the task number tag set;
Constructing low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector, the time mapping vector and the associated task number set, and adding it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
Retrieving the to-be-retrieved data asset basic information tag, the task number tag set and the data asset timestamp through the data distribution mapping algorithm, and traversing the data asset clustering result to perform dimension-reduction mapping so as to generate the data asset low-dimensional mapping information, comprises the following steps. A first cluster data asset characteristic value set of the data asset clustering result is obtained. A first data asset characteristic value is randomly selected from the first cluster data asset characteristic value set and set as the reference characteristic value; the reference characteristic value has a reference timestamp among the data asset timestamps. A second data asset characteristic value, different from the first data asset characteristic value, is randomly selected from the same set and set as the comparison characteristic value; the comparison characteristic value has a comparison timestamp among the data asset timestamps. A semantic mapping vector from the reference characteristic value to the comparison characteristic value is calculated, and a time mapping vector from the reference timestamp to the comparison timestamp is calculated. An associated task number set is matched to the comparison characteristic value according to the to-be-retrieved data asset basic information tag and the task number tag set; that is, the comparison characteristic value is associated with the corresponding task numbers.
Low-dimensional mapping information of the comparison characteristic value is then constructed from the reference characteristic value, the reference timestamp, the semantic mapping vector, the time mapping vector and the associated task number set, and is added to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
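The association step leaves open how a comparison characteristic value is matched to task numbers. The following is a minimal Python sketch, assuming (hypothetically) that each disposal task is represented by its task number and a set of to-be-retrieved basic information tags, and that a comparison value is associated with every task whose tags overlap the value's own basic information tags. `MappingRecord`, `associate_tasks` and `build_record` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MappingRecord:
    """Hypothetical container for one comparison value's low-dimensional mapping information."""
    reference_value: str
    index_timestamp: int          # the reference timestamp serves as the index timestamp
    semantic_vector: List[int]
    time_vector: int              # comparison timestamp minus reference timestamp (assumed)
    task_numbers: Set[str] = field(default_factory=set)


def associate_tasks(value_tags: Set[str], task_tag_sets: Dict[str, Set[str]]) -> Set[str]:
    """Return task numbers whose to-be-retrieved tags overlap the value's tags.

    Assumption: "matching" is modelled as a non-empty tag intersection.
    """
    return {number for number, tags in task_tag_sets.items() if tags & value_tags}


def build_record(reference_value: str, reference_ts: int, semantic_vector: List[int],
                 time_vector: int, value_tags: Set[str],
                 task_tag_sets: Dict[str, Set[str]]) -> MappingRecord:
    """Bundle the five components named in the text into one record."""
    return MappingRecord(
        reference_value=reference_value,
        index_timestamp=reference_ts,
        semantic_vector=list(semantic_vector),
        time_vector=time_vector,
        task_numbers=associate_tasks(value_tags, task_tag_sets),
    )
```

Storing the task numbers on the record lets a later access request be filtered by task as well as by timestamp; other matching rules (exact tag equality, weighted scores) would slot into `associate_tasks` without changing the record layout.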
According to the technical solution provided by this embodiment of the present invention, multi-source heterogeneous data asset basic information is obtained, which at least comprises a data asset source, a data asset structure, a data asset type and a data asset timestamp. Tertiary clustering is performed on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result. The data asset timestamp is retrieved through a data distribution mapping algorithm, and the data asset clustering result is traversed to perform dimension-reduction mapping, generating data asset low-dimensional mapping information that has an index timestamp. A data access request from a user side for the data asset clustering result is received, the request comprising a target timestamp. Mapping information matching is performed on the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information. Up-dimensional restoration is performed on the target low-dimensional mapping information to generate a target data asset, which is sent to the user side. This achieves dimension-reduced storage of multi-source heterogeneous data assets and reduces the storage load; moreover, because the assets are stored as low-dimensional mapping information, the data content cannot be extracted directly even if the stored data is leaked, which improves data security. The technical problems of high storage load and poor data asset security of multi-source heterogeneous data assets in the prior art are thereby solved.
Embodiment 2
Based on the same inventive concept as the linking method for multi-source heterogeneous data assets in the foregoing embodiment, the present invention also provides a linking system for multi-source heterogeneous data assets. The system may be implemented in hardware and/or software, may generally be integrated in an electronic device, and is used to perform the method provided by any embodiment of the present invention. As shown in Fig. 4, the system includes:
A data acquisition module 11, configured to obtain multi-source heterogeneous data asset basic information, wherein the multi-source heterogeneous data asset basic information at least comprises a data asset source, a data asset structure, a data asset type and a data asset timestamp;
A clustering module 12, configured to perform tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result;
A dimension-reduction mapping module 13, configured to retrieve the data asset timestamp through a data distribution mapping algorithm and traverse the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information, wherein the data asset low-dimensional mapping information has an index timestamp;
An access request receiving module 14, configured to receive a data access request from a user side for the data asset clustering result, wherein the data access request comprises a target timestamp;
A mapping matching module 15, configured to perform mapping information matching on the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information;
And a data restoration module 16, configured to perform up-dimensional restoration on the target low-dimensional mapping information, generate a target data asset and send the target data asset to the user side.
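The patent does not specify how up-dimensional restoration inverts the mapping. The following Python sketch assumes, purely for illustration, that semantic units are single characters, that the semantic mapping vector stores per-position Unicode code-point differences between the comparison value and the reference value padded with a hypothetical preset character `PAD`, and that the time mapping vector is a difference in seconds. Under those assumptions restoration is an exact inverse of the mapping.

```python
from typing import List

PAD = "#"  # hypothetical preset reference character used for padding


def restore_value(reference_value: str, semantic_vector: List[int], pad: str = PAD) -> str:
    """Rebuild the comparison value from the reference value and its semantic vector."""
    n = len(semantic_vector)
    # Pad a short reference (or truncate a long one) to the vector length,
    # mirroring the alignment assumed when the vector was built.
    aligned = (reference_value + pad * max(0, n - len(reference_value)))[:n]
    # Add each stored code-point difference back onto the aligned reference unit.
    return "".join(chr(ord(ref) + delta) for ref, delta in zip(aligned, semantic_vector))


def restore_timestamp(index_timestamp: int, time_vector: int) -> int:
    """Rebuild the comparison timestamp from the index timestamp and the time vector."""
    return index_timestamp + time_vector
```

Because the vector length equals the comparison value's length in this sketch, the restorer needs no separate length field.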
Further, the clustering module 12 is configured to:
Perform primary clustering on the multi-source heterogeneous data assets according to the data asset source to generate a data asset primary clustering result;
Traverse the data asset primary clustering result and perform secondary clustering according to the data asset type to generate a data asset secondary clustering result;
And traverse the data asset secondary clustering result and perform tertiary clustering according to the data asset structure to generate the data asset clustering result.
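The three clustering levels can be pictured as successive partitions, each applied inside the groups produced by the previous one. This is a minimal Python sketch in which assets are plain dicts with hypothetical keys `source`, `type` and `structure`; the patent does not prescribe a data layout or a grouping implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Asset = Dict[str, Any]


def group_by(items: Iterable[Asset], key: str) -> Dict[Any, List[Asset]]:
    """One clustering level: partition items by the value of a single attribute."""
    groups: Dict[Any, List[Asset]] = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return dict(groups)


def tertiary_cluster(assets: Iterable[Asset]) -> Dict[Any, Dict[Any, Dict[Any, List[Asset]]]]:
    """Return {source: {type: {structure: [asset, ...]}}}."""
    primary = group_by(assets, "source")                      # level 1: data asset source
    secondary = {src: group_by(items, "type")                 # level 2: traverse, group by type
                 for src, items in primary.items()}
    return {src: {typ: group_by(items, "structure")           # level 3: traverse, group by structure
                  for typ, items in by_type.items()}
            for src, by_type in secondary.items()}
```

Grouping by exact attribute value is the simplest reading of the text; a similarity-based clustering could replace `group_by` at any level without changing the nesting.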
Further, the dimension-reduction mapping module 13 is configured to:
Obtain a first cluster data asset characteristic value set of the data asset clustering result;
Randomly select a first data asset characteristic value from the first cluster data asset characteristic value set and set it as a reference characteristic value, wherein the reference characteristic value has a reference timestamp among the data asset timestamps;
Randomly select, from the first cluster data asset characteristic value set, a second data asset characteristic value different from the first data asset characteristic value and set it as a comparison characteristic value, wherein the comparison characteristic value has a comparison timestamp among the data asset timestamps;
Calculate a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculate a time mapping vector from the reference timestamp to the comparison timestamp;
And construct low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector and the time mapping vector, and add it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
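The steps above describe one reference/comparison pair. The following Python sketch applies them to a whole cluster. It assumes, beyond the patent text, that one randomly chosen reference serves every other value in the cluster, that the time mapping vector is a plain difference in seconds, and that the semantic mapping vector is a per-character code-point difference with a hypothetical padding character `PAD`. `reduce_cluster` is an illustrative name.

```python
import random
from typing import Dict, List, Optional, Tuple

PAD = "#"  # hypothetical preset reference character


def semantic_vector(reference: str, comparison: str, pad: str = PAD) -> List[int]:
    """Code-point differences, position by position, over the comparison's length."""
    n = len(comparison)
    aligned = (reference + pad * max(0, n - len(reference)))[:n]
    return [ord(c) - ord(r) for r, c in zip(aligned, comparison)]


def reduce_cluster(values: List[Tuple[str, int]],
                   rng: Optional[random.Random] = None) -> List[Dict]:
    """Map every non-reference (value, timestamp) pair in one cluster to a record."""
    rng = rng or random.Random()
    ref_index = rng.randrange(len(values))      # random choice of the reference value
    ref_value, ref_ts = values[ref_index]
    records = []
    for i, (value, ts) in enumerate(values):
        if i == ref_index:
            continue                             # skip the reference itself
        records.append({
            "reference_value": ref_value,
            "index_timestamp": ref_ts,          # reference timestamp = index timestamp
            "semantic_vector": semantic_vector(ref_value, value),
            "time_vector": ts - ref_ts,         # reference -> comparison time offset
        })
    return records
```

Passing a seeded `random.Random` makes the reference choice reproducible, which helps when testing; in this sketch the records can only be restored together with the stored reference value.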
Further, the dimension-reduction mapping module 13 is configured to:
Perform semantic unit decomposition on the reference characteristic value to obtain a reference unit sequence;
Perform semantic unit decomposition on the comparison characteristic value to obtain a comparison unit sequence;
When a first number of semantic units in the reference unit sequence is greater than or equal to a second number of semantic units in the comparison unit sequence, align the comparison unit sequence with the reference unit sequence position by position from the head, and perform semantic distance vector analysis to generate the semantic mapping vector;
When the first number of semantic units in the reference unit sequence is smaller than the second number of semantic units in the comparison unit sequence, align the head of the comparison unit sequence with the head of the reference unit sequence, pad the reference unit sequence with preset reference characters until it aligns with the tail of the comparison unit sequence, and perform semantic distance vector analysis to generate the semantic mapping vector.
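The two length cases can be sketched in Python as follows. The patent fixes neither what a semantic unit is, nor the distance measure, nor the preset reference character, so this sketch assumes character-level units, a Unicode code-point difference as the per-unit distance, and a hypothetical padding character `PRESET_REFERENCE_CHAR`.

```python
from typing import List

PRESET_REFERENCE_CHAR = "#"  # hypothetical; the patent does not name the preset character


def decompose(value: str) -> List[str]:
    """Semantic unit decomposition, assumed here to be character-level."""
    return list(value)


def unit_distance(ref_unit: str, cmp_unit: str) -> int:
    """Assumed semantic distance between two units: Unicode code-point difference."""
    return ord(cmp_unit) - ord(ref_unit)


def semantic_mapping_vector(reference: str, comparison: str) -> List[int]:
    ref_units = decompose(reference)     # reference unit sequence
    cmp_units = decompose(comparison)    # comparison unit sequence
    if len(ref_units) >= len(cmp_units):
        # Case 1: align both sequences from the head; zip stops at the end of the
        # (shorter or equal) comparison sequence, one entry per comparison unit.
        aligned_ref = ref_units
    else:
        # Case 2: pad the reference with preset characters so that its tail
        # lines up with the tail of the longer comparison sequence.
        aligned_ref = ref_units + [PRESET_REFERENCE_CHAR] * (len(cmp_units) - len(ref_units))
    return [unit_distance(r, c) for r, c in zip(aligned_ref, cmp_units)]
```

In both branches the vector has exactly one entry per comparison unit, so the comparison value's length is implicit in the vector itself.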
Further, the mapping matching module 15 is configured to:
Calculate an index time mapping vector from the index timestamp to the target timestamp;
And match the target low-dimensional mapping information from the data asset low-dimensional mapping information according to the index time mapping vector.
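One plausible reading of the matching rule is that a record matches when its stored time mapping vector equals the offset from its index timestamp to the requested target timestamp. The following Python sketch assumes timestamps are integer seconds and that each record is a dict with hypothetical keys `index_timestamp` and `time_vector`.

```python
from typing import Dict, List


def match_records(records: List[Dict], target_timestamp: int) -> List[Dict]:
    """Return records whose time mapping vector equals the index time mapping vector.

    Each record is assumed to store time_vector = comparison timestamp - index timestamp.
    """
    matches = []
    for record in records:
        # Index time mapping vector: from the record's index timestamp to the target.
        index_time_vector = target_timestamp - record["index_timestamp"]
        if index_time_vector == record["time_vector"]:
            matches.append(record)
    return matches
```

Exact equality suits point lookups; a range query would compare the two vectors within a tolerance instead.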
Further, the data restoration module 16 is configured to:
Configure a data asset disposal task set, wherein the data asset disposal task set comprises a to-be-retrieved data asset basic information tag;
Traverse the data asset disposal task set and configure a uniquely associated task number tag set;
And retrieve the to-be-retrieved data asset basic information tag, the task number tag set and the data asset timestamp through the data distribution mapping algorithm, and traverse the data asset clustering result to perform dimension-reduction mapping, generating the data asset low-dimensional mapping information, wherein the data asset low-dimensional mapping information has an index timestamp.
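Configuring a uniquely associated task number tag set can be as simple as numbering the disposal tasks in order. This Python sketch assumes each task is given as a set of to-be-retrieved basic information tags; the `T0001` numbering format and the inverted tag index are hypothetical conveniences, not taken from the patent.

```python
from typing import Dict, List, Set


def configure_task_numbers(disposal_tasks: List[Set[str]], prefix: str = "T") -> Dict[str, Set[str]]:
    """Assign each disposal task a unique task number: {task number: tags}."""
    return {f"{prefix}{i:04d}": set(tags) for i, tags in enumerate(disposal_tasks, start=1)}


def build_tag_index(task_numbers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the mapping so that each tag lists the task numbers that request it."""
    index: Dict[str, Set[str]] = {}
    for number, tags in task_numbers.items():
        for tag in tags:
            index.setdefault(tag, set()).add(number)
    return index
```

The inverted index makes the later association of a comparison value with task numbers a dictionary lookup per tag rather than a scan over all tasks.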
Further, the data restoration module 16 is configured to:
Obtain a first cluster data asset characteristic value set of the data asset clustering result;
Randomly select a first data asset characteristic value from the first cluster data asset characteristic value set and set it as a reference characteristic value, wherein the reference characteristic value has a reference timestamp among the data asset timestamps;
Randomly select, from the first cluster data asset characteristic value set, a second data asset characteristic value different from the first data asset characteristic value and set it as a comparison characteristic value, wherein the comparison characteristic value has a comparison timestamp among the data asset timestamps;
Calculate a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculate a time mapping vector from the reference timestamp to the comparison timestamp;
Match an associated task number set to the comparison characteristic value according to the to-be-retrieved data asset basic information tag and the task number tag set;
And construct low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector, the time mapping vector and the associated task number set, and add it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
The units and modules included above are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the protection scope of the present invention.
Embodiment 3
Fig. 5 is a schematic structural diagram of an electronic device provided in the third embodiment of the present invention, showing a block diagram of an exemplary electronic device suitable for implementing embodiments of the present invention. The electronic device shown in Fig. 5 is only an example and should not be construed as limiting the functionality or scope of use of the embodiments of the present invention. As shown in Fig. 5, the electronic device includes a processor 31, a memory 32, an input device 33 and an output device 34. The electronic device may have one or more processors 31; one processor 31 is taken as an example in Fig. 5. The processor 31, the memory 32, the input device 33 and the output device 34 may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 5.
The memory 32, as a computer-readable storage medium, stores software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the linking method for multi-source heterogeneous data assets in embodiments of the present invention. By running the software programs, instructions and modules stored in the memory 32, the processor 31 executes the various functional applications and data processing of the computer device, that is, implements the linking method for multi-source heterogeneous data assets described above.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the appended claims.

Claims (8)

1. A method for linking multi-source heterogeneous data assets, comprising:
Obtaining multi-source heterogeneous data asset basic information, wherein the multi-source heterogeneous data asset basic information at least comprises a data asset source, a data asset structure, a data asset type and a data asset timestamp;
Performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result;
Retrieving the data asset timestamp through a data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information, wherein the data asset low-dimensional mapping information has an index timestamp;
Receiving a data access request from a user side for the data asset clustering result, wherein the data access request comprises a target timestamp;
Performing mapping information matching on the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information;
Performing up-dimensional restoration on the target low-dimensional mapping information, generating a target data asset and sending the target data asset to the user side;
Wherein retrieving the data asset timestamp through the data distribution mapping algorithm and traversing the data asset clustering result to perform dimension-reduction mapping, so as to generate the data asset low-dimensional mapping information having the index timestamp, comprises:
Obtaining a first cluster data asset characteristic value set of the data asset clustering result;
Randomly selecting a first data asset characteristic value from the first cluster data asset characteristic value set and setting it as a reference characteristic value, wherein the reference characteristic value has a reference timestamp among the data asset timestamps;
Randomly selecting, from the first cluster data asset characteristic value set, a second data asset characteristic value different from the first data asset characteristic value and setting it as a comparison characteristic value, wherein the comparison characteristic value has a comparison timestamp among the data asset timestamps;
Calculating a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculating a time mapping vector from the reference timestamp to the comparison timestamp;
Constructing low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector and the time mapping vector, and adding it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp;
Wherein calculating the semantic mapping vector from the reference characteristic value to the comparison characteristic value comprises:
Performing semantic unit decomposition on the reference characteristic value to obtain a reference unit sequence;
Performing semantic unit decomposition on the comparison characteristic value to obtain a comparison unit sequence;
When a first number of semantic units in the reference unit sequence is greater than or equal to a second number of semantic units in the comparison unit sequence, aligning the comparison unit sequence with the reference unit sequence position by position from the head, and performing semantic distance vector analysis to generate the semantic mapping vector;
When the first number of semantic units in the reference unit sequence is smaller than the second number of semantic units in the comparison unit sequence, aligning the head of the comparison unit sequence with the head of the reference unit sequence, padding the reference unit sequence with preset reference characters until it aligns with the tail of the comparison unit sequence, and performing semantic distance vector analysis to generate the semantic mapping vector.
2. The method of claim 1, wherein performing tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate the data asset clustering result comprises:
Performing primary clustering on the multi-source heterogeneous data assets according to the data asset source to generate a data asset primary clustering result;
Traversing the data asset primary clustering result and performing secondary clustering according to the data asset type to generate a data asset secondary clustering result;
Traversing the data asset secondary clustering result and performing tertiary clustering according to the data asset structure to generate the data asset clustering result.
3. The method of claim 1, wherein performing mapping information matching on the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain the target low-dimensional mapping information comprises:
Calculating an index time mapping vector from the index timestamp to the target timestamp;
Matching the target low-dimensional mapping information from the data asset low-dimensional mapping information according to the index time mapping vector.
4. The method of claim 1, further comprising:
Configuring a data asset disposal task set, wherein the data asset disposal task set comprises a to-be-retrieved data asset basic information tag;
Traversing the data asset disposal task set and configuring a uniquely associated task number tag set;
Retrieving the to-be-retrieved data asset basic information tag, the task number tag set and the data asset timestamp through the data distribution mapping algorithm, and traversing the data asset clustering result to perform dimension-reduction mapping, generating the data asset low-dimensional mapping information, wherein the data asset low-dimensional mapping information has the index timestamp.
5. The method of claim 4, wherein retrieving the to-be-retrieved data asset basic information tag, the task number tag set and the data asset timestamp through the data distribution mapping algorithm, and traversing the data asset clustering result to perform dimension-reduction mapping to generate the data asset low-dimensional mapping information, comprises:
Obtaining a first cluster data asset characteristic value set of the data asset clustering result;
Randomly selecting a first data asset characteristic value from the first cluster data asset characteristic value set and setting it as a reference characteristic value, wherein the reference characteristic value has a reference timestamp among the data asset timestamps;
Randomly selecting, from the first cluster data asset characteristic value set, a second data asset characteristic value different from the first data asset characteristic value and setting it as a comparison characteristic value, wherein the comparison characteristic value has a comparison timestamp among the data asset timestamps;
Calculating a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculating a time mapping vector from the reference timestamp to the comparison timestamp;
Matching an associated task number set to the comparison characteristic value according to the to-be-retrieved data asset basic information tag and the task number tag set;
Constructing low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector, the time mapping vector and the associated task number set, and adding it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp.
6. A linking system for multi-source heterogeneous data assets, comprising:
A data acquisition module, configured to obtain multi-source heterogeneous data asset basic information, wherein the multi-source heterogeneous data asset basic information at least comprises a data asset source, a data asset structure, a data asset type and a data asset timestamp;
A clustering module, configured to perform tertiary clustering on the multi-source heterogeneous data assets according to the data asset source, the data asset type and the data asset structure to generate a data asset clustering result;
A dimension-reduction mapping module, configured to retrieve the data asset timestamp through a data distribution mapping algorithm and traverse the data asset clustering result to perform dimension-reduction mapping, generating data asset low-dimensional mapping information, wherein the data asset low-dimensional mapping information has an index timestamp;
An access request receiving module, configured to receive a data access request from a user side for the data asset clustering result, wherein the data access request comprises a target timestamp;
A mapping matching module, configured to perform mapping information matching on the data asset low-dimensional mapping information according to the target timestamp and the index timestamp to obtain target low-dimensional mapping information;
A data restoration module, configured to perform up-dimensional restoration on the target low-dimensional mapping information, generate a target data asset and send the target data asset to the user side;
Wherein the dimension-reduction mapping module is further configured to:
Obtain a first cluster data asset characteristic value set of the data asset clustering result;
Randomly select a first data asset characteristic value from the first cluster data asset characteristic value set and set it as a reference characteristic value, wherein the reference characteristic value has a reference timestamp among the data asset timestamps;
Randomly select, from the first cluster data asset characteristic value set, a second data asset characteristic value different from the first data asset characteristic value and set it as a comparison characteristic value, wherein the comparison characteristic value has a comparison timestamp among the data asset timestamps;
Calculate a semantic mapping vector from the reference characteristic value to the comparison characteristic value;
Calculate a time mapping vector from the reference timestamp to the comparison timestamp;
Construct low-dimensional mapping information of the comparison characteristic value based on the reference characteristic value, the reference timestamp, the semantic mapping vector and the time mapping vector, and add it to the data asset low-dimensional mapping information, wherein the reference timestamp is the index timestamp;
Wherein the dimension-reduction mapping module is further configured to:
Perform semantic unit decomposition on the reference characteristic value to obtain a reference unit sequence;
Perform semantic unit decomposition on the comparison characteristic value to obtain a comparison unit sequence;
When a first number of semantic units in the reference unit sequence is greater than or equal to a second number of semantic units in the comparison unit sequence, align the comparison unit sequence with the reference unit sequence position by position from the head, and perform semantic distance vector analysis to generate the semantic mapping vector;
When the first number of semantic units in the reference unit sequence is smaller than the second number of semantic units in the comparison unit sequence, align the head of the comparison unit sequence with the head of the reference unit sequence, pad the reference unit sequence with preset reference characters until it aligns with the tail of the comparison unit sequence, and perform semantic distance vector analysis to generate the semantic mapping vector.
7. An electronic device, the electronic device comprising:
A memory for storing executable instructions;
a processor for implementing the method of linking multi-source heterogeneous data assets of any one of claims 1 to 5 when executing executable instructions stored in the memory.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a linking method for multi-source heterogeneous data assets as claimed in any of claims 1-5.
Priority Applications (1)

Application Number: CN202410262827.3A; Priority Date: 2024-03-07; Filing Date: 2024-03-07; Title: Linking method and system for multi-source heterogeneous data asset; Status: Active (CN117852777B)

Publications (2)

CN117852777A, published 2024-04-09
CN117852777B, published 2024-05-24

Family ID: 90531488

Country Status (1)

Country: CN (1), CN117852777B (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant