CN114138784A - Information tracing method and device based on storage library, electronic equipment and medium - Google Patents

Info

Publication number
CN114138784A
Authority
CN
China
Prior art keywords
information
traced
source
text
repository
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111446604.5A
Other languages
Chinese (zh)
Other versions
CN114138784B (en
Inventor
蒋树杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202111446604.5A priority Critical patent/CN114138784B/en
Publication of CN114138784A publication Critical patent/CN114138784A/en
Application granted granted Critical
Publication of CN114138784B publication Critical patent/CN114138784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a repository-based information tracing method, comprising: judging whether information to be traced exists in a first repository; if it does not exist, storing the information in the first repository and obtaining its source file and the first source table storing that file; if the first source table is not a fact table, searching the upper-level files of the source file step by step until a fact table is found, and obtaining a search path; if the first source table is a fact table, storing it in a second repository and obtaining a search path; if the information exists, obtaining from the first repository a second target table storing the information to be traced, and obtaining a search path; and tracing the information according to the search path. The invention also provides a repository-based information tracing apparatus, a device and a storage medium. The invention further relates to blockchain technology: the information to be traced can be stored in a blockchain node. The invention can improve the accuracy of information tracing.

Description

Information tracing method and device based on storage library, electronic equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an information tracing method and device based on a storage library, electronic equipment and a computer readable storage medium.
Background
With the development of network communication technology, more and more information is generated, yet only a small part of the data has value, so the information must be cleaned. When information held in data tables is cleaned, records are added and removed and the characteristics of the original data become blurred; at that point the information needs to be traced.
Existing tracing approaches rely on searching with tools such as Eclipse or Sublime, which works only for files whose information is already known. They perform poorly on files with unknown information and cannot show the flow of the information or the search steps taken, so the accuracy of the tracing result is low.
Disclosure of Invention
The invention provides an information tracing method and device based on a storage library, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of information tracing.
In order to achieve the above object, the information tracing method based on the repository provided by the present invention includes:
obtaining information to be traced and judging whether the information to be traced exists in a preset first repository;
if the information to be traced does not exist in the first repository, storing the information to be traced into the first repository and taking the table in the first repository that stores the information to be traced as a first target table;
obtaining a source file corresponding to the information to be traced and a first source table storing the source file;
if the first source table is not a fact table, searching the upper-level files of the source file step by step until the second source table of the found upper-level file is a fact table, and obtaining the search path of the source file according to the second source table of the upper-level file;
if the first source table is a fact table, storing the first source table into a preset second repository, and obtaining the search path of the source file according to the first source table;
if the information to be traced exists in the first repository, obtaining from the first repository a second target table storing the information to be traced, obtaining the source file of the information to be traced and the first source table storing the source file according to the second target table, and obtaining the search path of the source file from the first source table;
and tracing the information according to the search path.
Optionally, the determining whether the to-be-traced information exists in a preset first repository includes:
if the information to be traced contains non-text information, converting the non-text information in the information to be traced into text information;
performing text word segmentation on the converted information to be traced according to a preset word segmentation algorithm to obtain word segmentation text information;
extracting keywords of the word segmentation text information through a preset keyword extraction model to obtain text keywords;
and judging whether the information to be traced exists in the first repository by searching each data table in the first repository for the text keywords.
Optionally, the performing text word segmentation on the converted information to be traced according to a preset word segmentation algorithm to obtain word segmentation text information includes:
performing text word segmentation on the converted information to be traced through various different word segmentation algorithms to obtain multiple groups of word segmentation information;
and calculating the minimum change rate of each group of word segmentation information, and selecting target word segmentation information from the groups of word segmentation information as word segmentation text information according to the minimum change rate.
Optionally, the obtaining a source file corresponding to the information to be traced and a first source table storing the source file includes:
creating an information view by using a structured query language according to the information to be traced;
inquiring all data tables on which the information view depends in a preset data dictionary, and acquiring a source file of the information to be traced based on preset log information;
and taking a data table of the source file in all data tables depended by the information view as a first source table of the information to be traced.
Optionally, before searching the upper-level file of the source file step by step if the first source table does not belong to the fact table, the method further includes:
analyzing the source file to obtain analysis data, and judging whether the analysis data are all digital data;
if the analysis data are all digital data types, determining that the first source table is a fact table;
and if the analysis data are not all digital data types, determining that the first source table is not a fact table.
Optionally, the searching for whether the text keyword exists in each data table in the first repository to determine whether the information to be traced exists in the first repository includes:
vectorizing the text keywords and the text information in the first storage library to obtain a text keyword vector and a text information vector;
calculating the similarity between the text keyword vector and the text information vector through a text similarity algorithm;
if the similarity between the text keyword vector and the text information vector is greater than or equal to a preset threshold value, determining that the information to be traced exists in the first storage library;
and if the similarity between the text keyword vector and the text information vector is smaller than the preset threshold, determining that the information to be traced does not exist in the first storage library.
Optionally, the minimum change rate of each group of word segmentation information is calculated by the following formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
where σ is the minimum change rate, N is the number of segments in the word segmentation information, μ is the average number of words per segment, and x_i is the number of words in the i-th segment.
In order to solve the above problem, the present invention further provides an information tracing apparatus based on a repository, where the apparatus includes:
the information storage position judging module is used for acquiring information to be traced and judging whether the preset first storage library has the information to be traced;
a source file obtaining module, configured to, if the information to be traced does not exist in the first storage repository, store the information to be traced into the first storage repository, use a table in the first storage repository, where the information to be traced is stored, as a first target table, and obtain a source file corresponding to the information to be traced and a first source table in which the source file is stored;
a fact table judging module, configured to search a higher-level file of the source file step by step if the first source table does not belong to a fact table until a second source table of the searched higher-level file is the fact table, obtain a search path of the source file according to the second source table of the higher-level file, store the first source table in a preset second storage repository if the first source table belongs to the fact table, and obtain the search path of the source file according to the first source table;
an information path searching module, configured to, if the information to be traced exists in the first repository, obtain a second target table storing the information to be traced from the first repository, obtain a source file of the information to be traced and a first source table storing the source file according to the second target table, and obtain a search path of the source file from the first source table;
and the information tracing module is used for tracing the information according to the search path.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the repository-based information tracing method as described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium including a storage data area and a storage program area, the storage data area storing created data, the storage program area storing a computer program; wherein the computer program, when executed by a processor, implements a repository-based information tracing method as described above.
In the embodiment of the invention, information to be traced is first obtained, and a preliminary classification is made by judging whether the information exists in a first repository of a data storage system. If it does not exist, the information to be traced is stored in the first repository, its source file and source table are obtained, and the source file is parsed to obtain analysis data. If the source table is a fact table, the source table is stored in a second repository and the search path of the source file is obtained from the source table; if the source table is not a fact table, the upper-level source tables of the source file are searched until a fact table is found. If the information to be traced already exists in the first repository, the source table and the search path of the source file are obtained directly from the target table that stores the information in the first repository. Finally, the information is traced along the search path of the source file. The information flow at each stage of the tracing process is thereby obtained, which improves the accuracy of information tracing.
Drawings
Fig. 1 is a schematic flowchart of a repository-based information tracing method according to an embodiment of the present invention;
Fig. 2 is a block diagram of a repository-based information tracing apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the internal structure of an electronic device implementing the repository-based information tracing method according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides an information tracing method based on a storage library. The executing body of the information tracing method based on the repository includes, but is not limited to, at least one of electronic devices such as a server and a terminal, which can be configured to execute the method provided by the embodiment of the present application. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. In other words, the repository-based information tracing method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of an information tracing method based on a repository according to an embodiment of the present invention. In this embodiment, the information tracing method based on the repository includes:
S1, obtaining the information to be traced.
In the embodiment of the present invention, the information to be traced is information whose source and flow direction need to be searched, for example, the information to be traced may be information of a calling program package in code writing.
Specifically, the information to be traced can be various types of information. For example, the information to be traced can be text type tracing information, image type tracing information, video type tracing information, and the like.
S2, judging whether the information to be traced exists in the preset first repository.
In an embodiment of the present invention, the first repository is a data repository in a data storage system and is used to store all information to be traced. If the information to be traced exists in the data storage system, it is stored in a field of a data table in the first repository.
In an embodiment of the present invention, the determining whether the information to be traced exists in the preset first repository includes:
if the information to be traced contains non-text information, converting the non-text information in the information to be traced into text information;
performing text word segmentation on the converted information to be traced according to a preset word segmentation algorithm to obtain word segmentation text information;
extracting keywords of the word segmentation text information through a preset keyword extraction model to obtain text keywords;
and judging whether the information to be traced exists in the first repository by searching each data table in the first repository for the text keywords.
Specifically, if the information to be traced is image information, the information to be traced is converted into the information to be traced in a text format through an image-to-text algorithm. And if the information to be traced is audio information, converting the information to be traced into the information to be traced in a text format through an audio-to-text algorithm.
Specifically, the image-to-text algorithm may be an OCR (Optical Character Recognition) algorithm, and the audio-to-text algorithm may be a CTC (Connectionist Temporal Classification) algorithm.
In the embodiment of the invention, if the information to be traced is text information, the word segmentation algorithm is directly adopted to perform text word segmentation on the information to be traced to obtain the word segmentation text information.
In the embodiment of the present invention, the word segmentation algorithm may be a mechanical word segmentation algorithm, and the mechanical word segmentation algorithm may also be referred to as a word segmentation algorithm based on character string matching. The mechanical word segmentation algorithm can be divided into a maximum matching method, a minimum matching method, a forward matching method and a reverse matching method according to different scanning directions.
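As an illustration of string-matching segmentation, the following is a minimal sketch of forward maximum matching. The function name `forward_max_match`, the toy vocabulary and the `max_len` parameter are assumptions made for this example and do not come from the patent.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy forward maximum matching over a character string.

    `vocabulary` is a hypothetical word list; `max_len` caps the length of
    the candidate tried at each position.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the vocabulary hits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(forward_max_match("abcd", {"ab", "abc", "cd", "d"}))
```

A reverse matching variant would scan from the end of the string instead; running several such variants produces the multiple groups of word segmentation information compared in the next step.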
Further, the performing text word segmentation on the converted information to be traced according to a preset word segmentation algorithm to obtain word segmentation text information includes:
performing text word segmentation on the converted information to be traced through various different word segmentation algorithms to obtain multiple groups of word segmentation information;
and calculating the minimum change rate of each group of word segmentation information, and selecting target word segmentation information from the groups of word segmentation information as word segmentation text information according to the minimum change rate.
Specifically, the minimum change rate of each group of word segmentation information is calculated by the following formula:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
where σ is the minimum change rate, N is the number of segments in the word segmentation information, μ is the average number of words per segment, and x_i is the number of words in the i-th segment.
For example, if a word segmentation result consists of 3 segments containing 3, 1 and 2 words respectively, the number of segments is 3 and the average number of words per segment is 2.
For example, if the information to be traced is "science and technology change life", a first word segmentation yields "science and technology change/life" with a minimum change rate of sqrt(((3-2)^2 + (1-2)^2 + (2-2)^2)/3) ≈ 0.8165, and a second word segmentation yields "science and technology/change/life" with a minimum change rate of sqrt(((2-2)^2 + (2-2)^2 + (2-2)^2)/3) = 0. The segmentation whose minimum change rate is smallest is selected as the target word segmentation information, so the selected word segmentation text information is "science and technology/change/life".
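The calculation in this example can be sketched as follows. The function names `change_rate` and `pick_segmentation` are hypothetical, and measuring segment length with `len()` is an assumption about how "number of words" is counted.

```python
import math


def change_rate(segment_lengths):
    """Square root of the mean squared deviation of segment lengths from their mean."""
    n = len(segment_lengths)
    mu = sum(segment_lengths) / n
    return math.sqrt(sum((x - mu) ** 2 for x in segment_lengths) / n)


def pick_segmentation(candidates):
    """Return the candidate segmentation whose change rate is smallest.

    `candidates` is a list of segmentations, each a list of segments.
    """
    return min(candidates, key=lambda segs: change_rate([len(s) for s in segs]))


if __name__ == "__main__":
    print(round(change_rate([3, 1, 2]), 4))  # lengths from the first segmentation
    print(change_rate([2, 2, 2]))            # lengths from the second segmentation
```

With segment lengths 3, 1, 2 the function reproduces the value of about 0.8165 given above, and with lengths 2, 2, 2 it returns 0, so the evenly split segmentation is selected.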
In the embodiment of the present invention, the extracting the keywords of the segmented text information through a preset keyword extraction model to obtain the text keywords includes:
removing stop words from the word segmentation text information to obtain candidate words;
and performing part-of-speech tagging on the candidate words through the keyword extraction model, and performing keyword identification on the tagged candidate words to obtain the text keywords.
Further, stop words are characters or words that are automatically filtered out before or after natural language data is processed during information retrieval, in order to save storage space and improve retrieval efficiency, for example "the", "on" and "over".
In the embodiment of the invention, the keyword extraction model can adopt the TF-IDF algorithm and the TextRank algorithm to label the part of speech of the candidate words.
In the embodiment of the invention, after part-of-speech tagging of the candidate words, a corresponding preset word segmentation mapping table is looked up according to each candidate word's part-of-speech tag, and the text keywords are identified according to that mapping table.
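A minimal sketch of stop-word removal followed by TF-IDF ranking is shown below. It omits the part-of-speech tagging and mapping-table lookup described above; the function names, the smoothed IDF variant and the reference corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "on", "over"}  # the examples named in the text


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def tfidf_keywords(doc_tokens, corpus, top_k=2):
    """Rank the remaining tokens by TF-IDF against a hypothetical reference corpus."""
    tokens = remove_stop_words(doc_tokens)
    term_counts = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF (assumption)
        scores[term] = (count / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    doc = ["the", "jar", "package", "on", "the", "server", "jar"]
    corpus = [["server", "log"], ["server", "config"]]
    print(tfidf_keywords(doc, corpus))
```

Terms that are frequent in the document but rare in the reference corpus rank highest, which is the usual rationale for using TF-IDF as a keyword score.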
In an embodiment of the present invention, the searching for whether the text keyword exists in each data table in the first repository to determine whether the information to be traced exists in the first repository includes:
vectorizing the text keywords and the text information in the first repository to obtain a text keyword vector and a text information vector, calculating the similarity between the text keyword vector and the text information vector by a text similarity algorithm (such as a cosine similarity algorithm), and taking the text information corresponding to the text information vector with the similarity larger than a preset threshold value as the searched information to be traced.
In the embodiment of the present invention, the text keyword vector and the text information vector have a one-to-many relationship.
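The similarity check can be sketched with bag-of-words vectors and cosine similarity. The vectorization scheme, the function names and the threshold value 0.5 are assumptions; the sketch uses "greater than or equal to" for the threshold comparison, matching the optional step listed earlier in the document.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count vector of `tokens` over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matching_rows(keywords, rows, threshold=0.5):
    """Return the stored rows whose similarity to the keywords meets the threshold.

    One keyword vector is compared against many row vectors (one-to-many).
    """
    vocabulary = sorted(set(keywords).union(*[set(r) for r in rows]))
    keyword_vec = bag_of_words(keywords, vocabulary)
    return [
        row for row in rows
        if cosine(keyword_vec, bag_of_words(row, vocabulary)) >= threshold
    ]


if __name__ == "__main__":
    stored = [["jar", "package", "call"], ["weather", "report"]]
    print(matching_rows(["jar", "package"], stored))
```

If the returned list is non-empty, the information to be traced is treated as already present in the first repository.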
S3, if the information to be traced does not exist in the first repository, storing the information to be traced into the first repository and using the table in the first repository storing the information to be traced as a first target table.
In an embodiment of the present invention, after the table in the first repository, in which the information to be traced is stored, is taken as a first target table, the method further includes: and taking a field storing the information to be traced in the first target table as a target field, wherein the target field is a position where the information to be traced is stored in the target table.
The target table is a data object of a data storage in a relational database management system and is composed of rows and columns.
Further, the storing the information to be traced into the first repository includes:
calling an interface of a relational database management system;
storing the information to be traced into the first repository through the relational database management system.
In the embodiment of the invention, the relational database management system (RDBMS) is a system for organizing and storing data in relational tables.
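A minimal sketch of step S3 follows, using Python's built-in `sqlite3` module as a stand-in for the relational database management system. The table name `trace_target`, the column name `trace_info` and both function names are hypothetical.

```python
import sqlite3


def open_first_repository(path=":memory:"):
    """Open a stand-in 'first repository' with one hypothetical target table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace_target ("
        "id INTEGER PRIMARY KEY, trace_info TEXT NOT NULL)"
    )
    return conn


def store_trace_info(conn, info):
    """Insert the information and return (target table, target field, row id)."""
    cur = conn.execute("INSERT INTO trace_target (trace_info) VALUES (?)", (info,))
    conn.commit()
    return "trace_target", "trace_info", cur.lastrowid


if __name__ == "__main__":
    conn = open_first_repository()
    print(store_trace_info(conn, "data read from util.jar"))
```

The returned table and column names play the roles of the first target table and the target field: they record where the information to be traced is stored.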
S4, obtaining a source file corresponding to the information to be traced and a first source table for storing the source file.
In this embodiment of the present invention, the source file is a data source file of the information to be traced, where obtaining the source file further includes obtaining a first source table storing the source file.
In the embodiment of the present invention, the first source table is a report table storing the source file.
Further, after the source file corresponding to the information to be traced and the first source table storing the source file are obtained, the method further includes:
storing the analysis data obtained by parsing the source file in the first source table as well.
For example, if data in a JAR package is called in a Java program, the called data is the information to be traced, the JAR package is the source file, and the storage table storing the JAR package is the source table.
In this embodiment of the present invention, the obtaining a source file corresponding to the information to be traced and a first source table storing the source file includes:
creating an information view by using a structured query language according to the information to be traced;
inquiring all data tables on which the information view depends in a preset data dictionary, and acquiring a source file of the information to be traced based on preset log information;
and taking the data table of the source file existing in all the data tables depended by the information view as a first source table of the source file.
Specifically, the information view is obtained by transforming the information to be traced and is used for viewing the first source table of the source file.
In this embodiment of the present invention, the data dictionary is a tool for defining and describing data items, data structures, data flows, data stores and the processing logic of data. The dependency relationship between data tables and views is queried through the data dictionary.
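The lookup in step S4 can be sketched with plain dictionaries standing in for the data dictionary and the log. In an Oracle database, view dependencies are exposed through data dictionary views such as `ALL_DEPENDENCIES`; the sketch below does not query a database, and all names in it are hypothetical.

```python
def find_first_source_table(view, view_dependencies, table_files, log_records, trace_info):
    """Return (source_file, first_source_table).

    `view_dependencies` maps a view name to the tables it depends on,
    `table_files` maps a table name to the files it stores, and `log_records`
    is a list of dicts with "info" and "source_file" keys. The table is None
    when no dependent table holds the file; both are None when the log has
    no record for the information.
    """
    # Step 1: the log tells us which file the information came from.
    source_file = next(
        (rec["source_file"] for rec in log_records if rec["info"] == trace_info),
        None,
    )
    if source_file is None:
        return None, None
    # Step 2: among the tables the view depends on, pick the one holding that file.
    for table in view_dependencies.get(view, []):
        if source_file in table_files.get(table, set()):
            return source_file, table
    return source_file, None


if __name__ == "__main__":
    deps = {"v_trace": ["t_orders", "t_packages"]}
    files = {"t_orders": {"orders.csv"}, "t_packages": {"util.jar"}}
    log = [{"info": "parse_date", "source_file": "util.jar"}]
    print(find_first_source_table("v_trace", deps, files, log, "parse_date"))
```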
In the embodiment of the invention, when the first repository is an Oracle database, the source file can be parsed in several ways, such as BeautifulSoup library parsing, XPath parsing and regular expression parsing, to obtain the analysis data.
S5, judging whether the first source table belongs to the fact table.
In the embodiment of the present invention, the fact table may also be referred to as a fact data table, and contains a large amount of digital data that can be recorded and summarized, and the fact table does not contain descriptive information, wherein the fact table may be divided into a transaction fact table, a periodic snapshot fact table, and an accumulated snapshot fact table.
In the embodiment of the present invention, by determining whether the first source table is a fact table, the source table can be classified, and the source of the information to be traced is determined.
S6, if the first source table does not belong to the fact table, searching the upper-level files of the source file step by step until the second source table of the found upper-level file is the fact table, and obtaining the search path of the source file according to the second source table of the upper-level file.
In this embodiment of the present invention, before searching the higher-level file of the source file step by step if the first source table does not belong to the fact table, the method further includes:
analyzing the source file to obtain analysis data, and judging whether the analysis data are all digital data;
if the analysis data are all digital data types, determining that the first source table is a fact table;
and if the analysis data are not all digital data types, determining that the first source table is not a fact table.
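The all-numeric test above can be sketched as follows. Treating a value as numeric when `float()` accepts it, and treating empty analysis data as "not a fact table", are assumptions of this sketch rather than rules stated in the text.

```python
def is_numeric(value):
    """True if the value can be read as a number (assumption: float() parses it)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def is_fact_table(parsed_values):
    """A source table is treated as a fact table only if every parsed value is numeric."""
    # Empty analysis data is treated as "not a fact table" (assumption).
    return bool(parsed_values) and all(is_numeric(v) for v in parsed_values)


if __name__ == "__main__":
    print(is_fact_table(["12", "3.5", 7]))
    print(is_fact_table(["12", "premium"]))
```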
Further, the upper-level files of the source file are searched step by step by locating the information to be traced in the source file and following the update log of the information to be traced.
In this embodiment of the present invention, the source file and the first source table are not necessarily a final source of the information to be traced.
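The step-by-step upward search of step S6 can be sketched as a walk along parent links until a fact table is reached. The mappings `parent_of` and `table_of` and the set `fact_tables` are hypothetical inputs; in the method described here they would be derived from the update log and the fact-table check.

```python
def trace_to_fact_table(source_file, parent_of, table_of, fact_tables):
    """Walk upward from `source_file` until a file stored in a fact table is found.

    Returns the list of (file, table) pairs visited, ending at the fact table,
    or None if the chain ends (or loops) before any fact table is reached.
    """
    path = []
    seen = set()
    current = source_file
    while current is not None and current not in seen:
        seen.add(current)  # guard against cyclic parent links
        table = table_of[current]
        path.append((current, table))
        if table in fact_tables:
            return path
        current = parent_of.get(current)
    return None


if __name__ == "__main__":
    parent_of = {"report.csv": "clean.csv", "clean.csv": "raw.csv"}
    table_of = {"report.csv": "t_report", "clean.csv": "t_clean", "raw.csv": "t_raw"}
    print(trace_to_fact_table("report.csv", parent_of, table_of, {"t_raw"}))
```

The returned list doubles as a search path: it records every file and table passed on the way to the fact table.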
S7, if the first source table belongs to the fact table, storing the first source table into a preset second storage library, and obtaining the search path of the source file according to the first source table.
In the embodiment of the present invention, the search path is a history propagation path of the source file, and is formed by the location of each storage of the first source table in the data transmission process.
Further, the lookup path of the first source table is obtained based on an update log of the information to be traced corresponding to the source file in the first source table.
In this embodiment of the present invention, the search path of the first source table includes a location where the source file is stored each time, that is, a storage path in the computer during each transfer process.
Specifically, the update log is the event record generated while the storage device runs; it records the date, time, user, action and other related operations for each storage and transfer of the information to be traced.
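Deriving a search path from such a log can be sketched as filtering the records for one source file and ordering them by time. The record layout (keys `file`, `timestamp`, `location`, `user`, `action`) and the ISO 8601 timestamps are assumptions made for illustration.

```python
from datetime import datetime


def search_path_from_log(log_records, source_file):
    """Return the storage locations of `source_file` in chronological order."""
    entries = [rec for rec in log_records if rec["file"] == source_file]
    entries.sort(key=lambda rec: datetime.fromisoformat(rec["timestamp"]))
    return [rec["location"] for rec in entries]


if __name__ == "__main__":
    log = [
        {"file": "util.jar", "timestamp": "2021-11-30T10:05:00",
         "location": "/warehouse/t_clean", "user": "etl", "action": "copy"},
        {"file": "util.jar", "timestamp": "2021-11-30T09:00:00",
         "location": "/landing/t_raw", "user": "etl", "action": "store"},
        {"file": "other.jar", "timestamp": "2021-11-30T09:30:00",
         "location": "/landing/t_misc", "user": "etl", "action": "store"},
    ]
    print(search_path_from_log(log, "util.jar"))
```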
S8, if the information to be traced exists in the first repository, obtaining a second target table storing the information to be traced from the first repository, obtaining a source file of the information to be traced and a first source table storing the source file according to the second target table, and obtaining a search path of the source file from the first source table.
In an embodiment of the present invention, the second target table storing the information to be traced is obtained from the first repository by Extract-Transform-Load (ETL), a data warehouse technique for obtaining a target table from a source; here, the source is the first repository.
In the embodiment of the present invention, if the information to be traced exists in the first repository, it indicates that a storage process of the information to be traced is recorded in the first repository. The source file and the first source table of the information to be traced can be directly obtained according to the second target table of the information to be traced stored in the first storage library, and the search path of the source file is obtained based on the first source table.
In this embodiment of the present invention, the location stored in the first source table is the search path of the source file.
S9, tracing the information according to the search path.
In the embodiment of the invention, the information tracing is used for acquiring the evolution and processing content of the information to be traced in the whole life cycle.
In the embodiment of the invention, tracing the information according to the search path yields the table that stores the source file, namely the fact table.
In another embodiment of the present invention, the search path may be further stored in a third storage library, and a Directed Acyclic Graph (DAG) is constructed based on the first storage library, the second storage library, and the third storage library, and information flow directions in an information tracing process are displayed through the directed acyclic graph.
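As a non-limiting illustration, the directed acyclic graph may be sketched with the Python standard library module `graphlib` (Python 3.9 or later). The sketch assumes each search path is a list of storage locations ordered from source to target; the function names `build_lineage_graph` and `flow_order` are hypothetical, and how the three repositories are actually combined into the graph is not specified by the embodiment.

```python
from graphlib import TopologicalSorter


def build_lineage_graph(search_paths):
    """Build a map from each node to the set of its direct upstream nodes."""
    predecessors = {}
    for path in search_paths:
        for node in path:
            predecessors.setdefault(node, set())
        # Each consecutive pair in a path is one edge: upstream -> downstream.
        for upstream, downstream in zip(path, path[1:]):
            predecessors[downstream].add(upstream)
    return predecessors


def flow_order(predecessors):
    """Return the nodes so that every upstream node precedes its downstream nodes.

    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    return list(TopologicalSorter(predecessors).static_order())
```

A topological order of this kind gives one valid reading of the information flow direction; a cycle error indicates that the recorded paths cannot form a directed acyclic graph.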
In the embodiment of the invention, the information to be traced is first obtained, and whether it exists in the first repository of the data storage system is judged as a preliminary classification. If the information does not exist, it is stored in the first repository, the source file and the source table of the information to be traced are obtained, and the source file is analyzed to obtain analysis data. If the source table belongs to the fact table, the source table is stored in the second repository and the search path of the source file is obtained according to the source table; if the source table does not belong to the fact table, the search continues upward through the source tables of higher-level files until a fact table is found. If the information to be traced already exists in the first repository, the source table of the source file and the search path of the source file are obtained directly from the target table of the information to be traced in the first repository. Finally, the information is traced according to the search path of the source file, so that the information flow direction at each stage of the tracing process can be obtained, which improves the accuracy of information tracing.
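As a non-limiting illustration, the branching summarized above may be sketched with in-memory structures. Everything named here is hypothetical: `first_repo` is a dict from information to its recorded first source table, `second_repo` is a set of fact table names, `tables` maps a table name to a dict with keys `is_fact`, `upstream` and `path`, and `resolve_source_table` stands in for obtaining the first source table. A real embodiment would use the repositories and update log instead.

```python
def trace_information(info, first_repo, second_repo, tables, resolve_source_table):
    """Return the search path for info, or None if no fact table can be reached."""
    if info in first_repo:
        # Already recorded: read the path directly from the recorded first source table.
        return tables[first_repo[info]]["path"]

    # Not recorded yet: store the information and obtain its first source table.
    first_table = resolve_source_table(info)
    first_repo[info] = first_table

    if tables[first_table]["is_fact"]:
        # The first source table is a fact table: keep it in the second repository.
        second_repo.add(first_table)
        return tables[first_table]["path"]

    # Otherwise climb to higher-level tables until a fact table is found.
    seen = {first_table}
    current = tables[first_table]["upstream"]
    while current is not None and current not in seen:
        if tables[current]["is_fact"]:
            return tables[current]["path"]
        seen.add(current)
        current = tables[current]["upstream"]
    return None
```

The sketch returns `None` when the upward search runs out of higher-level tables, a case the embodiment does not address.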
Fig. 2 is a schematic block diagram of an information tracing apparatus based on a repository according to the present invention.
The information tracing apparatus 100 based on the repository can be installed in an electronic device. According to the implemented functions, the repository-based information tracing apparatus may include an information storage location determining module 101, a source file obtaining module 102, a fact table determining module 103, an information path searching module 104, and an information tracing module 105. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
an information storage location determining module 101, configured to obtain information to be traced and determine whether the information to be traced exists in the preset first repository;
a source file obtaining module 102, configured to, if the information to be traced does not exist in the first storage library, store the information to be traced into the first storage library, use a table in the first storage library, where the information to be traced is stored, as a first target table, and obtain a source file corresponding to the information to be traced and a first source table in which the source file is stored;
a fact table determining module 103, configured to search a higher-level file of the source file step by step if the first source table does not belong to a fact table until a second source table of the searched higher-level file is the fact table, obtain a search path of the source file according to the second source table of the higher-level file, store the first source table in a preset second storage repository if the first source table belongs to the fact table, and obtain the search path of the source file according to the first source table;
an information path search module 104, configured to, if the information to be traced exists in the first repository, obtain a second target table storing the information to be traced from the first repository, obtain a source file of the information to be traced and a first source table storing the source file according to the second target table, and obtain a search path of the source file from the first source table;
and the information tracing module 105 is configured to trace the information according to the search path.
In detail, when the modules in the information tracing apparatus 100 based on a repository according to the embodiment of the present invention are used, the same technical means as the information tracing method based on a repository described in fig. 1 is adopted, and the same technical effect can be produced, which is not described herein again.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a repository-based information tracing method according to the present invention.
The electronic device may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may further include a computer program, such as a repository-based information tracing program, stored in the memory 11 and executable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), a microprocessor, a digital Processing chip, a graphics processor, a combination of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules stored in the memory 11 (for example, executing a repository-based information tracing program, etc.) and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used to store not only application software installed in the electronic device and various data, such as codes of a repository-based information tracing program, but also temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit such as a keyboard, and optionally may be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 3 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The repository-based information tracing program stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, and when running in the processor 10, can implement:
obtaining information to be traced and judging whether the preset first storage library has the information to be traced;
if the information to be traced does not exist in the first storage library, storing the information to be traced into the first storage library and taking a table in the first storage library, which stores the information to be traced, as a first target table;
obtaining a source file corresponding to the information to be traced and a first source table for storing the source file;
if the first source table does not belong to the fact table, searching the superior file of the source file step by step until the second source table of the found superior file is the fact table, and acquiring the search path of the source file according to the second source table of the superior file;
if the first source table belongs to the fact table, storing the first source table into a preset second storage library, and acquiring a search path of the source file according to the first source table;
if the information to be traced exists in the first storage library, acquiring a second target table for storing the information to be traced from the first storage library, acquiring a source file of the information to be traced and a first source table for storing the source file according to the second target table, and acquiring a search path of the source file from the first source table;
and tracing the information according to the search path.
Specifically, the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the integrated module/unit of the electronic device, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
obtaining information to be traced and judging whether the preset first storage library has the information to be traced;
if the information to be traced does not exist in the first storage library, storing the information to be traced into the first storage library and taking a table in the first storage library, which stores the information to be traced, as a first target table;
obtaining a source file corresponding to the information to be traced and a first source table for storing the source file;
if the first source table does not belong to the fact table, searching the superior file of the source file step by step until the second source table of the found superior file is the fact table, and acquiring the search path of the source file according to the second source table of the superior file;
if the first source table belongs to the fact table, storing the first source table into a preset second storage library, and acquiring a search path of the source file according to the first source table;
if the information to be traced exists in the first storage bank, acquiring a second target table for storing the information to be traced from the first storage bank, acquiring a source file of the information to be traced and a first source table for storing the source file according to the second target table, and acquiring a search path of the source file from the first source table;
and tracing the information according to the search path.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A repository-based information tracing method, the method comprising:
obtaining information to be traced and judging whether the preset first storage library has the information to be traced;
if the information to be traced does not exist in the first storage library, storing the information to be traced into the first storage library and taking a table in the first storage library, which stores the information to be traced, as a first target table;
obtaining a source file corresponding to the information to be traced and a first source table for storing the source file;
if the first source table does not belong to the fact table, searching the superior file of the source file step by step until the second source table of the found superior file is the fact table, and acquiring the search path of the source file according to the second source table of the superior file;
if the first source table belongs to the fact table, storing the first source table into a preset second storage library, and acquiring a search path of the source file according to the first source table;
if the information to be traced exists in the first storage bank, acquiring a second target table for storing the information to be traced from the first storage bank, acquiring a source file of the information to be traced and a first source table for storing the source file according to the second target table, and acquiring a search path of the source file from the first source table;
and tracing the information according to the search path.
2. The repository-based information tracing method according to claim 1, wherein said determining whether the information to be traced exists in the preset first repository comprises:
if the information to be traced contains non-text information, converting the non-text information in the information to be traced into text information;
performing text word segmentation on the converted information to be traced according to a preset word segmentation algorithm to obtain word segmentation text information;
extracting keywords of the word segmentation text information through a preset keyword extraction model to obtain text keywords;
and judging whether the information to be traced exists in the first storage library by searching whether the text keywords exist in each data table in the first storage library.
3. The information tracing method based on repository according to claim 2, wherein the performing text segmentation on the converted information to be traced according to a preset segmentation algorithm to obtain segmented text information comprises:
performing text word segmentation on the converted information to be traced through various different word segmentation algorithms to obtain multiple groups of word segmentation information;
and calculating the minimum change rate of each group of word segmentation information, and selecting target word segmentation information from the groups of word segmentation information as word segmentation text information according to the minimum change rate.
4. The repository-based information tracing method according to any one of claims 1 to 3, wherein the obtaining a source file corresponding to the information to be traced and a first source table storing the source file comprises:
creating an information view by using a structured query language according to the information to be traced;
inquiring all data tables on which the information view depends in a preset data dictionary, and acquiring a source file of the information to be traced based on preset log information;
and taking the data table of the source file existing in all the data tables depended by the information view as a first source table of the source file.
5. The repository-based information tracing method according to any one of claims 1 to 3, wherein before progressively searching for an upper level file of the source file if the first source table does not belong to a fact table, the method further comprises:
analyzing the source file to obtain analysis data, and judging whether the analysis data are all numeric data;
if the analysis data are all numeric data, determining that the first source table is a fact table;
and if the analysis data are not all numeric data, determining that the first source table is not a fact table.
6. The repository-based information tracing method according to claim 2, wherein said searching whether the text keyword exists in each data table in the first repository to determine whether the information to be traced exists in the first repository comprises:
vectorizing the text keywords and the text information in the first storage library to obtain a text keyword vector and a text information vector;
calculating the similarity between the text keyword vector and the text information vector through a text similarity algorithm;
if the similarity between the text keyword vector and the text information vector is greater than or equal to a preset threshold value, determining that the information to be traced exists in the first storage library;
and if the similarity between the text keyword vector and the text information vector is smaller than the preset threshold, determining that the information to be traced does not exist in the first storage library.
7. The repository-based information tracing method according to claim 3, wherein the minimum rate of change of each set of the participle information is calculated by the following formula:
Figure FDA0003384330570000031
where σ is the minimum rate of change, N is the number of segments of the participle information, μ is the average number of words per segment of the participle information, and x_i is the number of words of each segment in the word segmentation information.
8. An information tracing apparatus based on a repository, the apparatus comprising:
the information storage position judging module is used for acquiring information to be traced and judging whether the preset first storage library has the information to be traced;
a source file obtaining module, configured to, if the information to be traced does not exist in the first storage repository, store the information to be traced into the first storage repository, use a table in the first storage repository, where the information to be traced is stored, as a first target table, and obtain a source file corresponding to the information to be traced and a first source table in which the source file is stored;
a fact table judging module, configured to search a higher-level file of the source file step by step if the first source table does not belong to a fact table until a second source table of the searched higher-level file is the fact table, obtain a search path of the source file according to the second source table of the higher-level file, store the first source table in a preset second storage repository if the first source table belongs to the fact table, and obtain the search path of the source file according to the first source table;
an information path searching module, configured to, if the information to be traced exists in the first repository, obtain a second target table storing the information to be traced from the first repository, obtain a source file of the information to be traced and a first source table storing the source file according to the second target table, and obtain a search path of the source file from the first source table;
and the information tracing module is used for tracing the information according to the search path.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the repository-based information tracing method according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising a storage data area storing created data and a storage program area storing a computer program; wherein the computer program when executed by a processor implements the repository-based information tracing method according to any of claims 1 to 7.
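As a non-limiting illustration of the existence judgment recited in claim 6, the similarity comparison may be sketched as follows. The sketch assumes term-frequency vectors and cosine similarity, which is only one possible text similarity algorithm; the function names and the default threshold of 0.8 are hypothetical and are not fixed by the claims.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using term-frequency vectors."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_sq_a = sum(c * c for c in vec_a.values())
    norm_sq_b = sum(c * c for c in vec_b.values())
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 0.0
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


def exists_in_repository(keywords, repository_texts, threshold=0.8):
    """Return True if any stored text reaches the similarity threshold."""
    return any(
        cosine_similarity(keywords, text) >= threshold
        for text in repository_texts
    )
```

The comparison uses greater than or equal to, matching the claim wording that a similarity equal to the preset threshold counts as existing.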
CN202111446604.5A 2021-11-30 2021-11-30 Information tracing method and device based on storage library, electronic equipment and medium Active CN114138784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111446604.5A CN114138784B (en) 2021-11-30 2021-11-30 Information tracing method and device based on storage library, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN114138784A true CN114138784A (en) 2022-03-04
CN114138784B CN114138784B (en) 2024-07-02

Family

ID=80386173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111446604.5A Active CN114138784B (en) 2021-11-30 2021-11-30 Information tracing method and device based on storage library, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114138784B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319710A (en) * 2018-02-07 2018-07-24 何世容 Agricultural product based on Internet of Things are traced to the source information storage means, device and storage medium
CN110457430A (en) * 2019-07-02 2019-11-15 北京瑞卓喜投科技发展有限公司 A kind of Traceability detection method of text, device and equipment
CN111933241A (en) * 2020-08-31 2020-11-13 平安国际智慧城市科技股份有限公司 Medical data analysis method, medical data analysis device, electronic device, and storage medium
CN112363814A (en) * 2020-11-20 2021-02-12 中国平安财产保险股份有限公司 Task scheduling method and device, computer equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965388A (en) * 2022-12-30 2023-04-14 国网数字科技控股有限公司 Industrial chain financial confidential traceability method and device based on block chain and related equipment
CN115965388B (en) * 2022-12-30 2023-12-22 国网数字科技控股有限公司 Block chain-based industrial chain financial secret state tracing method, device and related equipment
CN116468032A (en) * 2023-03-07 2023-07-21 北京智慧星光信息技术有限公司 Information tracing method, device and equipment based on self-media information
CN116468032B (en) * 2023-03-07 2024-04-16 北京智慧星光信息技术股份有限公司 Information tracing method, device and equipment based on self-media information
CN116450710A (en) * 2023-06-15 2023-07-18 南京哈卢信息科技有限公司 Data analysis tracing method and system based on big data
CN116450710B (en) * 2023-06-15 2023-09-26 南京哈卢信息科技有限公司 Data analysis tracing method and system based on big data

Also Published As

Publication number Publication date
CN114138784B (en) 2024-07-02

CN115186188A (en) Product recommendation method, device and equipment based on behavior analysis and storage medium
CN115438048A (en) Table searching method, device, equipment and storage medium
CN115062023A (en) Wide table optimization method and device, electronic equipment and computer readable storage medium
CN114518993A (en) System performance monitoring method, device, equipment and medium based on business characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant