CN113254454A - Data extraction method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN113254454A
Authority
CN
China
Prior art keywords
data
dependency
relationship
identifier
storage
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110699900.XA
Other languages
Chinese (zh)
Inventor
李晴阳
姬宁
李柏润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202110699900.XA
Publication of CN113254454A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The present disclosure proposes a data extraction method and apparatus, a computer device, and a storage medium. The method comprises: determining a first identifier of first data; determining, according to the first identifier, a data dependency corresponding to the first data, wherein the data dependency comprises the first identifier and a second identifier of second data that has a dependency relationship with the first data; generating a target hierarchical relationship according to the data dependency, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data; extracting the first data according to the first identifier and extracting the second data according to the second identifier; and storing the first data and the second data according to the target hierarchical relationship. The method and apparatus effectively reduce the complexity of describing dependencies between data during data extraction, help improve data extraction efficiency, safeguard the integrity and accuracy of data extraction, and improve the data extraction effect.

Description

Data extraction method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data extraction method and apparatus, a computer device, and a storage medium.
Background
In engineering practice, large volumes of business data often need to be extracted and replicated, for example by extracting data from business scenario A and replicating it in batches to business scenario B. This is usually done by writing computer program code that contains the extraction and replication logic for all data in the business scenario, so the code may include a series of program code segments covering the data relationships of all data, data extraction and data recovery, and customized operations.
With this approach, the data relationships within the business scenario are complex to describe, which makes them difficult to change and maintain and makes data extraction costly, thereby reducing the efficiency and accuracy of data extraction.
Disclosure of Invention
The present disclosure is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the present disclosure aims to provide a data extraction method, an apparatus, a computer device, and a storage medium, which can effectively reduce the complexity of dependency description between data in a data extraction process, effectively assist in improving data extraction efficiency, ensure the integrity and accuracy of data extraction, and improve data extraction effect.
In order to achieve the above object, an embodiment of the first aspect of the present disclosure provides a data extraction method, including: determining a first identifier of first data; determining, according to the first identifier, a data dependency corresponding to the first data, wherein the data dependency comprises the first identifier and a second identifier of second data that has a dependency relationship with the first data; generating a target hierarchical relationship according to the data dependency, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data; extracting the first data according to the first identifier and extracting the second data according to the second identifier; and storing the first data and the second data according to the target hierarchical relationship.
In the data extraction method provided in the embodiment of the first aspect of the present disclosure, a first identifier of first data is determined, and a data dependency corresponding to the first data is determined according to the first identifier, where the data dependency includes the first identifier and a second identifier of second data that has a dependency relationship with the first data. A target hierarchical relationship, which describes a storage relationship between the first data and the second data, is generated according to the data dependency; the first data is extracted according to the first identifier and the second data is extracted according to the second identifier; and the first data and the second data are stored according to the target hierarchical relationship. As a result, the complexity of describing dependencies between data is effectively reduced during data extraction, data extraction efficiency is improved, the integrity and accuracy of data extraction are safeguarded, and the data extraction effect is improved.
In order to achieve the above object, an embodiment of a second aspect of the present disclosure provides a data extraction apparatus, including: a first determining module, configured to determine a first identifier of first data; a second determining module, configured to determine, according to the first identifier, a data dependency corresponding to the first data, wherein the data dependency comprises the first identifier and a second identifier of second data that has a dependency relationship with the first data; a generating module, configured to generate a target hierarchical relationship according to the data dependency, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data; an extraction module, configured to extract the first data according to the first identifier and extract the second data according to the second identifier; and a storage module, configured to store the first data and the second data according to the target hierarchical relationship.
In the data extraction apparatus provided in the embodiment of the second aspect of the present disclosure, a first identifier of first data is determined, and a data dependency corresponding to the first data is determined according to the first identifier, where the data dependency includes the first identifier and a second identifier of second data that has a dependency relationship with the first data. A target hierarchical relationship, which describes a storage relationship between the first data and the second data, is generated according to the data dependency; the first data is extracted according to the first identifier and the second data is extracted according to the second identifier; and the first data and the second data are stored according to the target hierarchical relationship. As a result, the complexity of describing dependencies between data is effectively reduced during data extraction, data extraction efficiency is improved, the integrity and accuracy of data extraction are safeguarded, and the data extraction effect is improved.
An embodiment of a third aspect of the present disclosure provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the data extraction method as set forth in the embodiment of the first aspect of the present disclosure is implemented.
A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data extraction method as set forth in the first aspect of the present disclosure.
An embodiment of a fifth aspect of the present disclosure provides a computer program product, wherein when instructions in the computer program product are executed by a processor, the data extraction method as set forth in the embodiment of the first aspect of the present disclosure is performed.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a data extraction method according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram of data dependencies between data sources in an embodiment of the disclosure;
FIG. 3 is a schematic diagram of data dependencies within a data source in an embodiment of the disclosure;
FIG. 4 is a schematic diagram of a data extraction flow in an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of a data extraction device in an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of a data extraction method according to another embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a data extraction apparatus according to an embodiment of the disclosure;
FIG. 8 is a schematic structural diagram of a data extraction apparatus according to another embodiment of the present disclosure;
FIG. 9 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of illustrating the present disclosure and should not be construed as limiting the same. On the contrary, the embodiments of the disclosure include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Fig. 1 is a schematic flow chart of a data extraction method according to an embodiment of the present disclosure.
It should be noted that the main execution body of the data extraction method of this embodiment is a data extraction device, the device may be implemented by software and/or hardware, the device may be configured in an electronic device, and the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiments of the present disclosure address technical problems that arise when the data of a business module is extracted and replicated. In this application scenario, the data may be exported and stored to a local and/or cloud server so that it can be completely replicated to various environments (for example, another business within the same environment, other online environments, or the privately deployed local environments of other companies).
As shown in fig. 1, the data extraction method includes:
S101: determining a first identifier of first data.
The data currently to be extracted may be referred to as first data. The first data may be a field in a data table, all of the data in a data table, or data in business module A of a business scenario. When a data extraction task is executed, the first data may be extracted into a local storage medium of the data extraction device, which is not limited herein.
The identifier used to describe the first data may be referred to as a first identifier. The first identifier may be, for example, an identity (ID) number of the first data in the data table, which is not limited herein.
For example, a data extraction request may be received, a first identifier of first data may be parsed from the data extraction request, and then a subsequent step may be triggered.
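As a minimal sketch of this step, the snippet below parses a first identifier out of an extraction request. The JSON request shape, the `first_id` key, and the dotted `source.table.field` identifier format are all assumptions made for illustration; the disclosure does not specify a request format.

```python
import json


def parse_first_identifier(request_body: str) -> str:
    """Return the first identifier carried by a (hypothetical) JSON request."""
    payload = json.loads(request_body)
    first_id = payload.get("first_id")
    # Reject requests that do not name the data to extract.
    if not isinstance(first_id, str) or not first_id:
        raise ValueError("extraction request must carry a non-empty 'first_id'")
    return first_id


# Hypothetical request asking for field 2 of data table B in data source A.
request_body = '{"first_id": "A.B.field2"}'
first_id = parse_first_identifier(request_body)
```

Once the identifier has been parsed, the subsequent steps can be triggered with it.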
S102: determining, according to the first identifier, a data dependency corresponding to the first data, wherein the data dependency comprises the first identifier and a second identifier of second data that has a dependency relationship with the first data.
After the first identifier of the first data is determined, a data dependency corresponding to the first data may be determined according to the first identifier. The data dependency may be used to describe an association between the first data and other data. The other data may be, for example, data in the same data table as the first data, or data in the same data source as the first data. A data source may be understood as a data storage tool, and different data storage tools correspond to different data source information.
The data that the data dependency describes as having a dependency relationship with the first data may be referred to as second data, and the identifier used to describe the second data may be referred to as a second identifier. The second identifier may be, for example, an identity (ID) number of the second data in the data table, which is not limited herein.
FIG. 2 is a schematic diagram of data dependencies among data sources in an embodiment of the present disclosure. The diagram includes a data source A, a data source B, and a data source C (the data source C may specifically be a document-type data source). A data extraction request may be received via a data extraction entry, and a first identifier (the identifier of field 2 in data table B in data source A) of first data (field 2 in data table B in data source A) may be parsed from the request. Second data may then be determined based on the data dependencies, which the arrows in FIG. 2 indicate. The second data may be, for example, field 1 in data table A in data source B and field 1 in document A in data source C, which is not limited herein.
As shown in FIG. 2, data may generally be persisted to different data sources (the corresponding data source types may include, for example, relational storage, document storage, object storage, and middleware storage). Each data source has its own internal dependency patterns, and relatively complex data dependencies may also exist between different data sources. The data dependencies may generally be configured in a configuration file of the business module, and the configuration file may be prepared in the development and production stage of the business module, which is not limited herein.
Thus, the data dependency corresponding to the first data is determined to be the data dependency between the first data (field 2 in data table B in data source A) and the second data (field 1 in data table A in data source B, and field 1 in document A in data source C).
FIG. 3 is a schematic diagram of data dependencies within a data source in the embodiment of the present disclosure. It illustrates the data dependencies of a relational database, where each data dependency involves many different data records and some additional data.
For example, when determining the data dependency corresponding to the first data according to the first identifier, a configuration file corresponding to the business module may be read. The configuration file may be prepared in the development and production stage of the business module and may record, for example, call dependencies among multiple pieces of data of the business module, or data dependencies in any other form. The data dependency corresponding to the first data can then be parsed from the configuration file according to the first identifier of the first data.
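The lookup described above might be sketched as follows. The configuration layout (a JSON list of `from`/`to` pairs) and the dotted `source.container.field` identifiers are hypothetical; the entries mirror the FIG. 2 example, in which field 2 of data table B in data source A depends on data in data sources B and C.

```python
import json

# Hypothetical configuration file content for a business module.
CONFIG_TEXT = """
{
  "dependencies": [
    {"from": "A.B.field2", "to": "B.A.field1"},
    {"from": "A.B.field2", "to": "C.docA.field1"},
    {"from": "A.B.field9", "to": "A.B.field8"}
  ]
}
"""


def resolve_dependency(first_id: str, config_text: str) -> dict:
    """Build the data dependency (first identifier plus second identifiers)."""
    config = json.loads(config_text)
    # Keep only the entries whose source side is the requested data.
    second_ids = [
        entry["to"] for entry in config["dependencies"] if entry["from"] == first_id
    ]
    return {"first_id": first_id, "second_ids": second_ids}


dependency = resolve_dependency("A.B.field2", CONFIG_TEXT)
```

Keeping the dependencies in configuration rather than in program code is what allows them to be changed without rewriting the extraction logic.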
S103: generating a target hierarchical relationship according to the data dependency, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data.
After determining the data dependency corresponding to the first data according to the first identifier, a target hierarchical relationship may be generated according to the data dependency, where the target hierarchical relationship describes a storage relationship between the first data and the second data.
The storage relationship can be used as a reference when the first data and the second data are extracted and stored in a local and/or cloud server, so that the storage logic between the extracted first data and the extracted second data has a hierarchical relationship, and the subsequent replication processing logic of the whole data is facilitated.
After the target hierarchical relationship is generated according to the data dependency relationship, not only can the storage logic between the extracted first data and the extracted second data have the hierarchical relationship, but also the second data having the dependency relationship with the first data can be extracted by referring to the data dependency relationship in the actual data extraction process, so that the efficiency of data extraction can be effectively guaranteed.
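One possible shape for a target hierarchical relationship is a nested mapping from data source to container (table or document) to field. This is only an illustrative assumption, as is the dotted identifier format; the disclosure does not prescribe a concrete data structure.

```python
def build_storage_layout(first_id: str, second_ids: list) -> dict:
    """Nest identifiers as source -> container -> field, with empty slots."""
    layout = {}
    for identifier in [first_id, *second_ids]:
        source, container, field = identifier.split(".")
        # Each slot is filled in later, when the data is actually extracted.
        layout.setdefault(source, {}).setdefault(container, {})[field] = None
    return layout


layout = build_storage_layout("A.B.field2", ["B.A.field1", "C.docA.field1"])
```

Because the layout mirrors how the data sits in its data sources, it can serve as the reference both for storing the extracted data and for later recovery.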
S104: extracting the first data according to the first identifier and extracting the second data according to the second identifier.
After the target hierarchical relationship, which describes the storage relationship between the first data and the second data, is generated according to the data dependency, the first data and the second data may be extracted with reference to the data dependency.
Optionally, first data source information corresponding to the first identifier may be determined, so that the data corresponding to the first identifier may be extracted from the first data source indicated by the first data source information and used as the first data.
Optionally, the data source information includes: the type of the data source and/or the storage location information corresponding to the data source.
The first data source information may be a data source type of a data source (first data source) to which the first data belongs and/or storage location information corresponding to the data source.
Therefore, the first data can be extracted, using a data extraction mode adapted to the data source type of the first data source, from the storage location indicated by the storage location information corresponding to the first data source. Referring to the first data source information when extracting the first data effectively improves data extraction efficiency and safeguards the accuracy of extracting the first data.
Optionally, the data dependency further includes second data source information corresponding to the second identifier. In this case, extracting the first data according to the first identifier and extracting the second data according to the second identifier may comprise: while the first data is extracted according to the first identifier, extracting the data corresponding to the second identifier from a second data source indicated by the second data source information and using it as the second data.
Optionally, the data source information includes: the type of the data source and/or the storage location information corresponding to the data source.
The second data source information may be a data source type of a data source (second data source) to which the second data belongs and/or storage location information corresponding to the data source.
Therefore, the second data can be extracted, using a data extraction mode adapted to the data source type of the second data source, from the storage location indicated by the storage location information corresponding to the second data source. Referring to the second data source information when extracting the second data effectively improves the efficiency of extracting the second data and safeguards its accuracy. Because the second data is extracted while the first data is extracted according to the first identifier, data that has a data dependency is extracted and stored together, which helps keep the extraction of dependent data synchronized and further improves data extraction efficiency.
In the embodiment of the disclosure, a data source adapter may further be preconfigured for each data source. The data source adapter may be used to encapsulate the data of the corresponding data source, so that when the first data and the second data are extracted, the corresponding data source adapter is called to assist in data extraction, which can greatly simplify data extraction and recovery.
In the embodiment of the disclosure, the first data source information and the second data source information may be the same or different. When they are the same, the first data and the second data belong to the same data source, and the data dependency between them is a data dependency within a data source. When they are different, the first data and the second data belong to different data sources, and the data dependency between them is a data dependency between different data sources. The embodiment of the disclosure therefore supports efficient extraction both of data that has data dependencies between different data sources and of data that has data dependencies within the same data source, so the data extraction manner is flexible and can be effectively adapted to the requirements of different business scenarios.
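The adapter idea can be sketched with one small class per data source type behind a common interface. The class names, the in-memory stand-ins for a relational store and a document store, and the dotted identifiers are all hypothetical; a real adapter would wrap an actual database client.

```python
from abc import ABC, abstractmethod


class DataSourceAdapter(ABC):
    """Common extraction interface, one implementation per data source type."""

    @abstractmethod
    def extract(self, container: str, field: str):
        ...


class RelationalAdapter(DataSourceAdapter):
    def __init__(self, tables: dict):
        self.tables = tables  # table name -> list of row dicts

    def extract(self, container: str, field: str):
        # Read one column across all rows of the table.
        return [row[field] for row in self.tables[container]]


class DocumentAdapter(DataSourceAdapter):
    def __init__(self, documents: dict):
        self.documents = documents  # document name -> dict of fields

    def extract(self, container: str, field: str):
        return self.documents[container][field]


def extract(identifier: str, adapters: dict):
    """Route an identifier to the adapter of the data source it names."""
    source, container, field = identifier.split(".")
    return adapters[source].extract(container, field)


adapters = {
    "A": RelationalAdapter({"B": [{"field2": 10}, {"field2": 20}]}),
    "C": DocumentAdapter({"docA": {"field1": "hello"}}),
}
first_data = extract("A.B.field2", adapters)
second_data = extract("C.docA.field1", adapters)
```

The calling code stays the same whether the first data and the second data live in the same data source or in different ones; only the adapter that is selected changes.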
S105: storing the first data and the second data according to the target hierarchical relationship.
After the first data is extracted according to the first identifier and the second data is extracted according to the second identifier, the first data and the second data can be stored according to the target hierarchical relationship.
That is to say, the embodiment of the present disclosure supports reading data into a memory in sequence with reference to the data dependency and the data source adapters while retaining the original dependency values. The retained dependency values may be embodied as the target hierarchical relationship; that is, the first data and the second data are stored with reference to the target hierarchical relationship generated according to the data dependency. The target hierarchical relationship may be used as a reference when the first data and the second data are recovered. The extracted first data and second data may then be exported and persisted to a local file or cloud file by using the data source adapter.
FIG. 4 is a schematic diagram of a data extraction flow in the embodiment of the present disclosure. When a data extraction task is executed, a configuration file may be read to obtain the data dependency for the task, and a target hierarchical relationship is then created according to the data dependency. Corresponding data records are read from each data source adapter according to the data dependency, and the extracted data is stored in a memory according to the target hierarchical relationship. The data in the memory is then persisted to a local file or cloud file, and the data dependency, the configuration file, the target hierarchical relationship, and the like may be stored for data recovery, which completes the data extraction task. Once all the data has been completely extracted into the memory, it may either be used immediately to create the business in the current environment through data recovery, or be exported to a file for cross-platform recovery, which is not limited herein.
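The flow can be sketched end to end as follows. Plain dictionaries stand in for the configuration file and the data sources, and a JSON file in a temporary directory stands in for the local file; the names, the dotted identifier format, and the bundle layout are illustrative assumptions rather than the format used in the disclosure.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical configuration: first identifier -> dependent identifiers.
CONFIG = {"dependencies": {"A.B.field2": ["B.A.field1"]}}
# In-memory stand-ins for two data sources.
SOURCES = {
    "A": {"B": {"field2": [1, 2]}},
    "B": {"A": {"field1": ["x", "y"]}},
}


def run_extraction(first_id: str, config: dict, sources: dict, out_path: Path) -> dict:
    # 1. Read the data dependency from the configuration.
    second_ids = config["dependencies"].get(first_id, [])
    # 2-3. Read each record and place it in memory following the
    #      source -> table -> field hierarchy.
    data = {}
    for identifier in [first_id, *second_ids]:
        source, table, field = identifier.split(".")
        value = sources[source][table][field]
        data.setdefault(source, {}).setdefault(table, {})[field] = value
    # 4. Persist the data together with the dependency for later recovery.
    bundle = {
        "dependency": {"first_id": first_id, "second_ids": second_ids},
        "data": data,
    }
    out_path.write_text(json.dumps(bundle, indent=2), encoding="utf-8")
    return bundle


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "extract.json"
    bundle = run_extraction("A.B.field2", CONFIG, SOURCES, path)
    # A recovery step would start from the persisted file.
    restored = json.loads(path.read_text(encoding="utf-8"))
```

Storing the dependency alongside the data means the exported file carries enough information to rebuild the same relationships in another environment.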
In this embodiment, a first identifier of first data is determined, and a data dependency corresponding to the first data is determined according to the first identifier, where the data dependency includes the first identifier and a second identifier of second data that has a dependency relationship with the first data. A target hierarchical relationship, which describes a storage relationship between the first data and the second data, is generated according to the data dependency; the first data is extracted according to the first identifier and the second data is extracted according to the second identifier; and the first data and the second data are stored according to the target hierarchical relationship. As a result, the complexity of describing dependencies between data is effectively reduced during data extraction, data extraction efficiency is improved, the integrity and accuracy of data extraction are safeguarded, and the data extraction effect is improved.
For the data extraction method described in the embodiment of the present disclosure, an architecture of a data extraction device is also provided. FIG. 5 is a schematic structural diagram of the data extraction device in the embodiment of the present disclosure.
Fig. 6 is a schematic flow chart of a data extraction method according to another embodiment of the present disclosure.
As shown in fig. 6, the data extraction method includes:
S601: determining a first identifier of first data.
S602: determining, according to the first identifier, a data dependency corresponding to the first data, wherein the data dependency comprises the first identifier and a second identifier of second data that has a dependency relationship with the first data.
For description of S601-S602, reference may be made to the above embodiments, which are not described herein again.
S603: determining a dependency type between the first identifier and the second identifier.
S604: generating a target hierarchical relationship according to the dependency type.
The type of data dependency between the first data and the second data may be referred to as a dependency type, and the dependency type may be, for example, a dependency of a field type, and/or a dependency of a data table type, and/or a dependency of a data source type.
The dependency relationship of the field type may indicate that the first data and the second data belong to the same data source and belong to the same data table in the data source, and the dependency relationship of the field type exists between the first data and the second data in the data table.
The dependency relationship of the data table type may indicate that the first data and the second data belong to the same data source but to different data tables in that data source. For example, if the first data belongs to data table A and the second data belongs to data table B, the first data and the second data have a dependency relationship of the data table type.
The dependency relationship of the data source types may indicate that the first data and the second data belong to different data sources, and if the first data belong to the data source a and the second data belong to the data source B, the first data and the second data have the dependency relationship of the data source types.
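The three dependency types can be distinguished by comparing where the two pieces of data live. The sketch below assumes, purely for illustration, that identifiers take the dotted form `source.table.field`; the enum and function names are hypothetical.

```python
from enum import Enum


class DependencyType(Enum):
    FIELD = "field"              # same data source, same data table
    DATA_TABLE = "data_table"    # same data source, different data tables
    DATA_SOURCE = "data_source"  # different data sources


def classify_dependency(first_id: str, second_id: str) -> DependencyType:
    first_source, first_table, _ = first_id.split(".")
    second_source, second_table, _ = second_id.split(".")
    if first_source != second_source:
        return DependencyType.DATA_SOURCE
    if first_table != second_table:
        return DependencyType.DATA_TABLE
    return DependencyType.FIELD


same_table = classify_dependency("A.B.field2", "A.B.field3")
cross_table = classify_dependency("A.B.field2", "A.D.field1")
cross_source = classify_dependency("A.B.field2", "B.A.field1")
```

The data source check comes first because two tables in different data sources may share a name.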
Therefore, in the embodiment of the disclosure, the target hierarchical relationship can be generated according to the dependency relationship type, and describes the storage relationship between the first data and the second data, so that the extracted first data and the extracted second data can be stored with reference to the data dependency relationship type, the storage relationship can effectively represent the data dependency relationship, and the organization form of the extracted data is close to the form of the data in the corresponding data source, thereby effectively facilitating the recovery of the subsequent data.
Optionally, in some embodiments, the dependency type includes: a dependency relationship of the field type, and/or a dependency relationship of the data table type, and/or a dependency relationship of the data source type. That is, the dependency type may be any one of the above three types, or a combination of two or three of them.
In this case, generating the target hierarchical relationship according to the dependency type comprises:
generating a storage relationship of the field hierarchy according to the dependency relationship of the field type; and/or
generating a storage relationship of the data table hierarchy according to the dependency relationship of the data table type; and/or
generating a storage relationship of the data source hierarchy according to the dependency relationship of the data source type, wherein the storage relationship of the field hierarchy, and/or the storage relationship of the data table hierarchy, and/or the storage relationship of the data source hierarchy is used as the target hierarchical relationship.
The field-level storage relationship can embody a field-type dependency between data, the data-table-level storage relationship can embody a data-table-type dependency between data, and the data-source-level storage relationship can embody a data-source-type dependency between data. For an explanation of the field-type, data-table-type, and/or data-source-type dependency, refer to the examples above; the storage relationship that characterizes each dependency type may be understood with reference to the same examples, without limitation.
Therefore, the data dependency obtained by parsing with reference to the configuration file can be embodied in the storage logic. The target hierarchical relationship can be any one of the storage relationship of the field hierarchy, the storage relationship of the data table hierarchy, and the storage relationship of the data source hierarchy, or a combination of two or three of them, so that the data dependency among the extracted data can be accurately represented.
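As a rough illustration of the mapping from dependency types to storage levels, the following sketch returns the storage levels that together form the target hierarchical relationship. The function name, the string labels, and the outermost-first ordering are assumptions made for illustration; the disclosure does not prescribe a concrete representation.

```python
# Assumed ordering of storage levels, outermost first (an illustrative choice).
LEVEL_ORDER = ("data_source", "data_table", "field")


def generate_target_hierarchy(dependency_types):
    """Map a collection of dependency types to the storage levels used as
    the target hierarchical relationship.

    Each dependency type contributes the storage level of the same name, so
    the result may contain one, two, or all three levels.
    """
    unknown = set(dependency_types) - set(LEVEL_ORDER)
    if unknown:
        raise ValueError(f"unknown dependency type(s): {sorted(unknown)}")
    # Keep only the requested levels, ordered from outermost to innermost.
    return [level for level in LEVEL_ORDER if level in dependency_types]
```

For example, a combination of the field type and the data source type would produce a two-level relationship, while a single data table type would produce a one-level relationship.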
S605: extract the first data according to the first identifier, and extract the second data according to the second identifier.
S606: store the first data and the second data according to the target hierarchical relationship.
For the description of S605-S606, reference may be made to the above embodiments, which are not described herein again.
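Steps S605 and S606 can be sketched as follows, assuming (purely for illustration) that each data source is an in-memory nested dictionary keyed by data table and identifier, and that the extracted data is stored in a nested dictionary mirroring the data source, data table, and field levels. The function names and the `(source, table, identifier)` triple are hypothetical.

```python
def store_by_hierarchy(storage, source, table, identifier, value):
    """Store value under storage[source][table][identifier], creating the
    intermediate levels on demand."""
    storage.setdefault(source, {}).setdefault(table, {})[identifier] = value
    return storage


def extract_and_store(sources, first, second):
    """Extract the first and second data and store them hierarchically.

    `sources` maps source name -> table name -> identifier -> data.
    `first` and `second` are (source, table, identifier) triples.
    """
    storage = {}
    for source, table, identifier in (first, second):
        value = sources[source][table][identifier]  # extract by identifier
        store_by_hierarchy(storage, source, table, identifier, value)
    return storage
```

Because the stored structure keeps the same nesting as the sources, the extracted data remains organized close to its form in the corresponding data source, which is the property the description relies on for later recovery.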
In this embodiment, a first identifier of first data is determined, and a data dependency corresponding to the first data is determined according to the first identifier, where the data dependency includes the first identifier and a second identifier of second data that has a dependency relationship with the first data. A target hierarchical relationship, which describes a storage relationship between the first data and the second data, is generated according to the data dependency. The first data is extracted according to the first identifier, the second data is extracted according to the second identifier, and the first data and the second data are stored according to the target hierarchical relationship. As a result, the complexity of describing dependencies between data during data extraction is effectively reduced, which helps improve data extraction efficiency, ensures the integrity and accuracy of data extraction, and improves the data extraction effect.
Fig. 7 is a schematic structural diagram of a data extraction apparatus according to an embodiment of the disclosure.
As shown in fig. 7, the data extracting apparatus 70 includes:
a first determining module 701, configured to determine a first identifier of first data;
a second determining module 702, configured to determine, according to the first identifier, a data dependency corresponding to the first data, where the data dependency includes: the first identification and a second identification of second data which has a dependency relationship with the first data;
a generating module 703, configured to generate a target hierarchical relationship according to the data dependency relationship, where the target hierarchical relationship describes a storage relationship between the first data and the second data;
an extracting module 704, configured to extract first data according to the first identifier and extract second data according to the second identifier; and
a storage module 705, configured to store the first data and the second data according to the target hierarchical relationship.
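The cooperation of the five modules listed above can be sketched as a small class that wires together five callables, one per module. This is only an illustrative composition; the class name, the callable signatures, and the dictionary shape of the data dependency are assumptions, not part of the disclosure.

```python
class DataExtractionApparatus:
    """Hypothetical composition of the five modules as plain callables."""

    def __init__(self, determine_identifier, determine_dependency,
                 generate_hierarchy, extract, store):
        self.determine_identifier = determine_identifier  # first determining module
        self.determine_dependency = determine_dependency  # second determining module
        self.generate_hierarchy = generate_hierarchy      # generating module
        self.extract = extract                            # extracting module
        self.store = store                                # storage module

    def run(self, request):
        first_id = self.determine_identifier(request)
        # Assumed shape: {"first": first identifier, "second": second identifier}.
        dependency = self.determine_dependency(first_id)
        hierarchy = self.generate_hierarchy(dependency)
        first_data = self.extract(dependency["first"])
        second_data = self.extract(dependency["second"])
        return self.store(first_data, second_data, hierarchy)
```

Passing the modules in as callables keeps each responsibility replaceable, which loosely mirrors the statement that units may exist separately or be integrated into one module.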
In some embodiments of the present disclosure, as shown in fig. 8, the data extraction apparatus 70 further includes:
a third determining module 706, configured to determine, after the first identifier of the first data is determined, first data source information corresponding to the first identifier;
wherein the extracting module 704 is specifically configured to:
extract data corresponding to the first identifier from the first data source indicated by the first data source information, and use the extracted data as the first data.
In some embodiments of the present disclosure, the data dependency further comprises: second data source information corresponding to the second identifier, wherein the extracting module 704 is specifically configured to:
while extracting the first data according to the first identifier, extract data corresponding to the second identifier from a second data source indicated by the second data source information, and use the extracted data as the second data.
In some embodiments of the present disclosure, the generating module 703 comprises:
a determining submodule 7031, configured to determine a dependency type between the first identifier and the second identifier; and
a generating submodule 7032, configured to generate the target hierarchical relationship according to the dependency type.
In some embodiments of the present disclosure, the dependency type includes: a dependency relationship of the field type, and/or a dependency relationship of the data table type, and/or a dependency relationship of the data source type, wherein the generating submodule 7032 is specifically configured to:
generate a storage relationship of the field hierarchy according to the dependency relationship of the field type; and/or
generate a storage relationship of the data table hierarchy according to the dependency relationship of the data table type; and/or
generate a storage relationship of the data source hierarchy according to the dependency relationship of the data source type, wherein the storage relationship of the field hierarchy, and/or the storage relationship of the data table hierarchy, and/or the storage relationship of the data source hierarchy is used as the target hierarchical relationship.
In some embodiments of the present disclosure, the data source information comprises: the type of the data source and/or the storage location information corresponding to the data source.
In some embodiments of the present disclosure, the first data source information and the second data source information are the same or different.
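The data source information described in the two preceding paragraphs can be modelled, as one possible sketch, by a small record holding the type of the data source and/or its storage location. The class name `DataSourceInfo`, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSourceInfo:
    """Hypothetical record for data source information."""
    source_type: Optional[str] = None       # type of the data source (illustrative)
    storage_location: Optional[str] = None  # storage location information (illustrative)

    def __post_init__(self):
        # "and/or" in the description: at least one of the two must be given.
        if self.source_type is None and self.storage_location is None:
            raise ValueError("source_type and/or storage_location is required")


def same_data_source(first: DataSourceInfo, second: DataSourceInfo) -> bool:
    """The first and second data source information may be the same or different."""
    return first == second
```

Comparing two records with `same_data_source` is one simple way to tell whether the first data and the second data would be extracted from the same data source or from different ones.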
Corresponding to the data extraction method provided in the embodiments of figs. 1 to 6, the present disclosure also provides a data extraction apparatus. Since the data extraction apparatus provided in the embodiments of the present disclosure corresponds to the data extraction method provided in the embodiments of figs. 1 to 6, the implementations of the data extraction method are also applicable to the data extraction apparatus and are not described in detail again here.
In this embodiment, a first identifier of first data is determined, and a data dependency corresponding to the first data is determined according to the first identifier, where the data dependency includes the first identifier and a second identifier of second data that has a dependency relationship with the first data. A target hierarchical relationship, which describes a storage relationship between the first data and the second data, is generated according to the data dependency. The first data is extracted according to the first identifier, the second data is extracted according to the second identifier, and the first data and the second data are stored according to the target hierarchical relationship. As a result, the complexity of describing dependencies between data during data extraction is effectively reduced, which helps improve data extraction efficiency, ensures the integrity and accuracy of data extraction, and improves the data extraction effect.
In order to implement the foregoing embodiments, the present disclosure also provides a computer device, including: a memory, a processor, and a computer program that is stored on the memory and can run on the processor, wherein the processor, when executing the program, implements the data extraction method provided by the foregoing embodiments of the present disclosure.
In order to achieve the above embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the data extraction method as proposed by the aforementioned embodiments of the present disclosure.
In order to implement the foregoing embodiments, the present disclosure also provides a computer program product; when instructions in the computer program product are executed by a processor, the data extraction method set forth in the foregoing embodiments of the present disclosure is performed.
FIG. 9 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present disclosure. The computer device 12 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in FIG. 9, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, to name a few.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, and commonly referred to as a "hard drive").
Although not shown in FIG. 9, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, for example, implementing the data extraction method mentioned in the foregoing embodiments, by executing a program stored in the system memory 28.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
It should be noted that, in the description of the present disclosure, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be performed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure, and that changes, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present disclosure.

Claims (10)

1. A method of data extraction, the method comprising:
determining a first identifier of first data;
determining a data dependency corresponding to the first data according to the first identifier, wherein the data dependency comprises: the first identifier and a second identifier of second data which has a dependency relationship with the first data;
generating a target hierarchical relationship according to the data dependency, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data;
extracting the first data according to the first identifier and extracting the second data according to the second identifier; and
storing the first data and the second data according to the target hierarchical relationship.
2. The method of claim 1, wherein after the determining the first identifier of the first data, the method further comprises:
determining first data source information corresponding to the first identifier;
wherein the extracting the first data according to the first identifier comprises:
extracting data corresponding to the first identifier from a first data source indicated by the first data source information, and using the extracted data as the first data.
3. The method of claim 2, wherein the data dependency further comprises: second data source information corresponding to the second identifier, wherein the extracting the first data according to the first identifier and the extracting the second data according to the second identifier comprises:
while extracting the first data according to the first identifier, extracting data corresponding to the second identifier from a second data source indicated by the second data source information, and using the extracted data as the second data.
4. The method of claim 2, wherein generating a target hierarchy relationship from the data dependencies comprises:
determining a dependency type between the first identifier and the second identifier; and
generating the target hierarchical relationship according to the dependency type.
5. The method of claim 4, wherein the dependency type comprises: a dependency relationship of a field type, and/or a dependency relationship of a data table type, and/or a dependency relationship of a data source type, wherein the generating the target hierarchical relationship according to the dependency type comprises:
generating a storage relationship of a field hierarchy according to the dependency relationship of the field type; and/or
generating a storage relationship of a data table hierarchy according to the dependency relationship of the data table type; and/or
generating a storage relationship of a data source hierarchy according to the dependency relationship of the data source type, wherein the storage relationship of the field hierarchy, and/or the storage relationship of the data table hierarchy, and/or the storage relationship of the data source hierarchy is used as the target hierarchical relationship.
6. The method of claim 3, wherein the data source information comprises: the type of the data source and/or the storage position information corresponding to the data source.
7. The method of claim 3, wherein the first data source information and the second data source information are the same or different.
8. A data extraction apparatus, characterized in that the apparatus comprises:
a first determining module, configured to determine a first identifier of first data;
a second determining module, configured to determine, according to the first identifier, a data dependency corresponding to the first data, where the data dependency includes: the first identification and a second identification of second data which has a dependency relationship with the first data;
a generating module, configured to generate a target hierarchical relationship according to the data dependency relationship, wherein the target hierarchical relationship describes a storage relationship between the first data and the second data;
an extracting module, configured to extract the first data according to the first identifier and extract the second data according to the second identifier; and
a storage module, configured to store the first data and the second data according to the target hierarchical relationship.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon computer instructions, the computer instructions being used to cause a computer to perform the method of any one of claims 1-7.
CN202110699900.XA 2021-06-23 2021-06-23 Data extraction method and device, computer equipment and storage medium Pending CN113254454A (en)

Publications (1)

Publication Number Publication Date
CN113254454A true CN113254454A (en) 2021-08-13

Family

ID=77189345

Country Status (1)

Country Link
CN (1) CN113254454A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination