CN111913963B

CN111913963B - Method and system for storing interface data on demand

Info

Publication number: CN111913963B
Application number: CN202010753684.8A
Authority: CN
Inventors: 易超; 任彦民; 张舒汇; 贺赞贤
Original assignee: Beijing Shulide Technology Co ltd
Current assignee: Beijing Shulide Technology Co ltd
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2023-12-26
Anticipated expiration: 2040-07-30
Also published as: CN111913963A

Abstract

The application provides a method and a system for storing interface data on demand, and relates to the technical field of data mining. The method aims to mine data relevant to actual application requirements from different data interfaces and to integrate the mined data into a target data table that meets those requirements. The method comprises: first determining a plurality of source data interfaces; parsing the plurality of source data interfaces by using a preset knowledge rule map to obtain the primary key of each source data interface; generating at least one recommended table mode for the interface data of each of the plurality of source data interfaces; merging the recommended table modes that have the same primary key to obtain a plurality of merged recommended table modes; determining a target recommended table mode among the plurality of recommended table modes according to a received determination operation; and generating, according to a received modification operation, a structured statement for executing an atomic operation on the target recommended table mode, and modifying the target recommended table mode with the structured statement to obtain the target data table.

Description

Method and system for storing interface data on demand

Technical Field

The present application relates to the field of data mining technologies, and in particular, to a method and system for on-demand storage of interface data.

Background

With the development of technologies such as cloud computing, big data, and artificial intelligence, it has become a consensus that data is a key asset. The key to realizing the value of data is fusing and mining it. In the existing internet ecosystem, data reside in isolated web application business systems, or data islands. Data interfaces serve as the pipeline and foundation for data circulation among web application business systems, and provide effective support for fusing and connecting data.

However, the interfaces of different web application business systems can only be called independently, as can interfaces that return data of different types. How to fuse the scattered data fragments acquired from different interfaces and retain them as a complete and continuous data set is therefore a problem to be solved.

In the prior art, fragment data acquired from different web application business systems cannot be filtered and fused in a unified way to form a data warehouse that meets actual analysis requirements. Relevant data analysis is therefore difficult to carry out, and the data can only be identified by manual screening, which is inefficient and labor-intensive. Moreover, because the data acquired at an interface are fragmented, continuous data can only be formed by retaining the data continuously, so that they can then be fused into data meeting the actual analysis requirements. Data retention is a continuous process comprising several stages; once the retention process is interrupted for external reasons, data are lost, and serious data accidents may even result.

Disclosure of Invention

The present application provides a method and a system for storing interface data on demand. According to the application requirements of the data, recommended table modes are generated for the acquired interface data; the source data interfaces are parsed by using a preset knowledge rule map to obtain their primary keys; and the recommended table modes having the same primary key are merged, thereby fusing the fragment data acquired from different source data interfaces. Meanwhile, a retention task sequence is established according to the order of the primary keys, data are retained according to the retention task sequence, and corresponding logs are recorded, so that retained data are not lost when the system is interrupted.

A first aspect of an embodiment of the present application provides a method for storing interface data on demand, the method including:

determining a plurality of source data interfaces according to a received storage operation; parsing the plurality of source data interfaces by using a preset knowledge rule map to obtain the primary key of each of the plurality of source data interfaces; generating, for the interface data of each of the plurality of source data interfaces, at least one recommended table mode according to the primary key of that source data interface; merging the recommended table modes that have the same primary key to obtain a plurality of merged recommended table modes; determining a target recommended table mode among the plurality of recommended table modes according to a received determination operation; generating, according to a received modification operation, a structured statement for executing an atomic operation on the target recommended table mode, wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation; and modifying the target recommended table mode by using the structured statement to obtain a target data table.
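The last two steps can be illustrated with a minimal sketch. It assumes that the structured statement is a SQL `ALTER TABLE` statement and that the adjust column operation is a rename; the source states neither. The function name `atomic_op_to_sql` and the dictionary layout of an operation are hypothetical.

```python
def atomic_op_to_sql(table, op):
    """Translate one atomic column operation into a SQL statement.

    Assumption: the "structured statement" is SQL DDL; the source does
    not name a dialect.
    """
    column = op["column"]
    if op["kind"] == "add":
        return f'ALTER TABLE "{table}" ADD COLUMN "{column}" {op["type"]}'
    if op["kind"] == "delete":
        return f'ALTER TABLE "{table}" DROP COLUMN "{column}"'
    if op["kind"] == "adjust":
        # "Adjust" is read here as a rename; the source does not define it further.
        return f'ALTER TABLE "{table}" RENAME COLUMN "{column}" TO "{op["new_name"]}"'
    raise ValueError(f"unknown atomic operation: {op['kind']}")


# Hypothetical modification operations received from the user.
operations = [
    {"kind": "add", "column": "age", "type": "INTEGER"},
    {"kind": "delete", "column": "remark"},
    {"kind": "adjust", "column": "no", "new_name": "student_number"},
]
statements = [atomic_op_to_sql("student_scores", op) for op in operations]
```

Keeping one statement per atomic operation means each modification can be logged and retried individually.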

Optionally, after determining the target recommended table mode among the plurality of recommended table modes according to the received determination operation, the method further comprises: acquiring attribute values of the column attributes in the target recommended table mode from the interface data by using a preset column extraction program; inserting the attribute values into the corresponding column attributes of the target recommended table mode to obtain column attributes with data; and performing a Cartesian product on the plurality of column attributes with data in the target recommended table mode to obtain an intermediate table. In this case, modifying the target recommended table mode by using the structured statement to obtain a target data table comprises: modifying the intermediate table by using the structured statement to obtain a modified intermediate table; and screening the row tuples in the modified intermediate table by using a preset row extraction program to obtain the target data table.
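The Cartesian product and the subsequent row screening can be sketched as follows. This is an illustration only: it assumes that each cell carries the primary key value it was extracted with, and that the row extraction program keeps the rows whose cells all share one primary key. The function names and the sample data are hypothetical.

```python
from itertools import product


def build_intermediate_table(columns):
    """Cartesian product of the column attributes that already hold data."""
    names = list(columns)
    return [dict(zip(names, combo))
            for combo in product(*(columns[name] for name in names))]


def screen_rows(rows):
    """Keep the row tuples whose cells all belong to the same primary key."""
    return [row for row in rows
            if len({key for key, _value in row.values()}) == 1]


# Each cell is (primary key value, attribute value); the layout is assumed.
columns = {
    "name": [(1, "Ann"), (2, "Bob")],
    "math": [(1, 90), (2, 75)],
}
intermediate = build_intermediate_table(columns)
target = screen_rows(intermediate)
```

The product of two columns with two values each yields four candidate rows, of which the screening keeps the two consistent ones.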

Optionally, after parsing the plurality of source data interfaces by using the preset knowledge rule map to obtain the primary key of each of the plurality of source data interfaces, the method further includes: establishing a retention task sequence for the plurality of source data interfaces; sequentially determining, according to the retention task sequence, a target interface for data retention; forming a record log of data calls according to the retention sequence number of the target interface in the retention task sequence; calling the interface data of the target interface and retaining the interface data in an original library; when retaining the interface data in the original library fails, scanning the record log of data calls to obtain the retention sequence number; and, according to the retention sequence number, calling the interface data of the target interface again and retaining the interface data in the original library. In this case, acquiring attribute values of the column attributes in the target recommended table mode from the interface data by using a preset column extraction program comprises: acquiring the attribute values of the column attributes in the target recommended table mode from the interface data in the original library by using the preset column extraction program.
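The retention steps can be sketched as a resumable loop. The sketch assumes an in-memory list as the record log of data calls and a dictionary as the original library; the names `retain`, `fetch`, `raw_store`, and `log` are hypothetical.

```python
def retain(interfaces, fetch, raw_store, log):
    """Retain interface data in task-sequence order, resuming after a failure.

    The sequence number is logged before each call, so the last log entry
    always names the call that may have been interrupted.
    """
    start = log[-1] if log else 0  # scan the log for the retention sequence number
    for seq in range(start, len(interfaces)):
        if not log or log[-1] != seq:
            log.append(seq)
        raw_store[seq] = fetch(interfaces[seq])  # may raise if interrupted
```

If `fetch` raises for the interface at sequence number 1, the log ends with 1 and only sequence number 0 is stored; calling `retain` again with the same `raw_store` and `log` restarts at sequence number 1 instead of at the beginning.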

Optionally, inserting the attribute values into the corresponding column attributes of the target recommended table mode includes: generating an insertion task sequence according to the primary key sequence of the target recommended table mode; sequentially determining, according to the insertion task sequence, target positions for data insertion in the target recommended table mode; forming a record log of data insertion according to the insertion sequence number of each target position in the insertion task sequence, the record log of data insertion comprising the value of the primary key sequence corresponding to the target position, the column attribute corresponding to the target position, and the attribute value; and, when inserting an attribute value into the corresponding column attribute of the target recommended table mode fails, scanning the record log of data insertion and inserting the attribute value again according to the value of the primary key sequence and the column attribute corresponding to the target position.
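A minimal sketch of the logged insertion follows. It assumes that each log record stores the primary key value, the column attribute, and the attribute value, as the text describes, plus a `done` flag that is an addition of this sketch. The function names and the `write` callback are hypothetical.

```python
def insert_with_log(table, entries, log, write):
    """Insert (key, column, value) entries in primary key order, logging each first."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    for seq, (key, column, value) in enumerate(ordered):
        record = {"seq": seq, "key": key, "column": column,
                  "value": value, "done": False}
        log.append(record)
        try:
            write(table, key, column, value)
            record["done"] = True
        except OSError:
            pass  # keep the record marked as not done for a later retry


def retry_from_log(table, log, write):
    """Scan the insertion log and redo every insertion that did not complete."""
    for record in log:
        if not record["done"]:
            write(table, record["key"], record["column"], record["value"])
            record["done"] = True
```

Because the log record holds the target position and the value, a retry does not need to query the interface again.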

Optionally, generating a structured statement for executing an atomic operation on the target recommended table mode according to the received modification operation includes: generating the atomic operation according to the received modification operation, and forming a log record of column modification for the type of the atomic operation; and generating, according to the atomic operation, the structured statement for executing the atomic operation on the target recommended table mode. Before screening the row tuples in the modified intermediate table, the method further comprises: forming a log record of row modification. When modifying the intermediate table by using the structured statement fails, the modified intermediate table is deleted according to the log record of column modification, and the intermediate table is modified again by using the structured statement; or, when screening the row tuples in the modified intermediate table fails, the target data table is deleted according to the log record of row modification, and the row tuples in the modified intermediate table are screened again by using the preset row extraction program.

Optionally, generating at least one recommended table mode according to the primary key of each of the plurality of source data interfaces, for the interface data of each of the plurality of source data interfaces, includes: generating a metadata mode of a target source data interface according to the hierarchical structure tree of the target source data interface, the target source data interface being any one of the plurality of source data interfaces; traversing all nodes of the metadata mode along a planned path; determining a non-leaf node that comprises a plurality of different non-leaf nodes as the name of a first recommended table mode; determining, along the planned path, the first-layer child nodes of that non-leaf node; determining those first-layer child nodes as the tuples of the first recommended table mode; determining a non-leaf node that comprises a plurality of different leaf nodes as the name of a second recommended table mode; determining, along the planned path, the first-layer child nodes of that non-leaf node; and determining those first-layer child nodes as the column attributes of the second recommended table mode.
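The traversal can be sketched on a metadata mode represented as nested dictionaries. The sketch assumes that the planned path is a pre-order depth-first traversal and that "a plurality" means at least two; both are assumptions, and the function name and sample schema are hypothetical.

```python
def recommend_tables(name, node, out=None):
    """Collect recommended table modes from a nested-dict metadata mode."""
    out = [] if out is None else out
    if not isinstance(node, dict):
        return out
    leaves = [k for k, v in node.items() if not isinstance(v, dict)]
    branches = [k for k, v in node.items() if isinstance(v, dict)]
    if len(branches) >= 2:
        # First kind: the non-leaf children become the tuples of the table.
        out.append({"table": name, "tuples": branches})
    if len(leaves) >= 2:
        # Second kind: the leaf children become the column attributes.
        out.append({"table": name, "columns": leaves})
    for child in branches:  # pre-order traversal (assumed planned path)
        recommend_tables(child, node[child], out)
    return out


# Hypothetical metadata mode; leaf values are type names.
schema = {
    "student": {"number": "int", "math": "int", "chinese": "int"},
    "teacher": {"number": "int", "name": "str"},
}
tables = recommend_tables("class", schema)
```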

Optionally, the method further comprises: acquiring a plurality of data bodies from different data interfaces of a plurality of application programs; labeling each of the plurality of data bodies to obtain a plurality of meta tag sets, each corresponding to a single data body; acquiring a reference primary key from each of the plurality of data bodies to obtain a plurality of meta primary key sets, each corresponding to a single data body; obtaining a plurality of meta structure description information sets, each corresponding to a single data body, according to the structure description of the interface where each data body is located and the structure description of the application program where each data body is located; acquiring, from the meta tag sets, the meta primary key sets, and the meta structure description information sets, the meta tags, meta reference primary keys, and meta structure description information corresponding to the same data interface; establishing a knowledge meta rule for each data interface according to its meta tags, meta reference primary key, and meta structure description information, to obtain a plurality of knowledge meta rules; for each of the plurality of knowledge meta rules, searching the plurality of knowledge meta rules to obtain its similar knowledge meta rules and its parent knowledge meta rule; and establishing a similarity connection between each knowledge meta rule and its similar knowledge meta rules, and a connection between each knowledge meta rule and its parent knowledge meta rule, to form the preset knowledge rule map.
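One possible in-memory shape for such a map is sketched below. The source does not say how similar and parent rules are detected; the sketch assumes, purely for illustration, that a rule whose tag set is a proper subset of another rule's tag set is its parent, and that rules with identical structure descriptions are similar. The class `MetaRule`, its fields, and the sample rules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetaRule:
    interface: str
    tags: frozenset
    primary_key: str
    structure: str
    similar: set = field(default_factory=set)
    parent: Optional[str] = None


def build_rule_map(rules):
    """Link each knowledge meta rule to its similar rules and its parent rule."""
    rule_map = {rule.interface: rule for rule in rules}
    for rule in rules:
        for other in rules:
            if other is rule:
                continue
            if other.tags < rule.tags:
                # Assumed test: the more general rule contains the current one.
                rule.parent = other.interface
            elif other.structure == rule.structure:
                rule.similar.add(other.interface)
    return rule_map


rules = [
    MetaRule("action", frozenset({"attendance", "action"}), "movie_id", "movie/attendance"),
    MetaRule("literary", frozenset({"attendance", "literary"}), "movie_id", "movie/attendance"),
    MetaRule("company", frozenset({"expense"}), "dept_id", "company/expense"),
    MetaRule("sales", frozenset({"expense", "sales"}), "dept_id", "company/sales/expense"),
]
rule_map = build_rule_map(rules)
```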

Optionally, parsing the plurality of source data interfaces by using the preset knowledge rule map to obtain the primary key of each of the plurality of source data interfaces includes: sequentially determining each of the plurality of source data interfaces as a target source data interface; obtaining a tag of the target source data interface; parsing the target source data interface to obtain a target structure description of the target source data interface; searching the knowledge rule map for a target meta tag matching the tag and for target meta structure description information matching the structure description; determining the knowledge meta rule to which both the target meta tag and the target meta structure description information correspond as a target knowledge meta rule; and determining the primary key of the target knowledge meta rule as the primary key of the target source data interface.
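The lookup can be sketched as a search for the one rule that matches both the tag and the structure description. Exact equality is used as the matching test, which is an assumption; the rule layout and the function name are hypothetical.

```python
def find_primary_key(rules, tag, structure):
    """Return the primary key of the rule matching both tag and structure."""
    for rule in rules:
        if tag in rule["tags"] and rule["structure"] == structure:
            return rule["primary_key"]
    return None  # no knowledge meta rule matches this interface


# Hypothetical knowledge meta rules.
rules = [
    {"tags": {"student", "score"}, "structure": "class/student", "primary_key": "student_number"},
    {"tags": {"movie", "attendance"}, "structure": "movie/attendance", "primary_key": "movie_id"},
]
key = find_primary_key(rules, "score", "class/student")
```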

Optionally, the method further comprises:

constructing a mapping model based on the mapping language; obtaining a plurality of sample interfaces and generating a plurality of sample metadata modes according to the plurality of sample interfaces; determining a sample column attribute to be added in the sample metadata schema, and an attribute value corresponding to the sample column attribute to be added; collecting a sample example data set based on the sample column attributes; training the mapping model according to the knowledge rule by using the sample example data set, the sample metadata mode, the sample column attribute and the attribute value of the sample column attribute; and determining the mapping model which is trained for a plurality of times as the preset column extraction program.

Optionally, the method further comprises:

determining a plurality of sample row tuples in the sample metadata schema; wherein all attribute values contained in each of the plurality of sample row tuples correspond to the same primary key; inserting the attribute value of the sample column attribute to be added into the sample column attribute to be added to obtain a column attribute of which the sample has the attribute value; obtaining a sample recommendation table with attribute values according to the sample metadata mode; performing Cartesian product on the column attribute with the attribute value of the sample and the sample recommendation table with the attribute value to obtain a sample intermediate table; obtaining a first atomic rule which constrains attribute values in the intermediate table based on a numerical rule according to the sample example data set; obtaining a second atomic rule based on attribute values of the intermediate table constrained by non-leaf ancestor nodes according to the sample example data set; combining the first atomic rule and the second atomic rule to obtain a predicate combination; screening the sample intermediate table by utilizing the predicate combinations to obtain sample target data; verifying the sample target data by using the plurality of sample row tuples, and adjusting the predicate combination according to a verification result; and determining the predicate combination subjected to multiple adjustments as the preset row tuple extraction program.
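The predicate combination can be sketched with plain functions. The concrete form of the two atomic rules is assumed here: a numeric range check for the first and a shared-ancestor check for the second. The column names and sample rows are hypothetical.

```python
def numeric_rule(column, low, high):
    """First atomic rule (assumed form): the value must lie in [low, high]."""
    return lambda row: low <= row[column] <= high


def ancestor_rule(ancestor_columns):
    """Second atomic rule (assumed form): all listed cells share one ancestor."""
    return lambda row: len({row[c] for c in ancestor_columns}) == 1


def combine(*rules):
    """Predicate combination: a row passes only if every atomic rule holds."""
    return lambda row: all(rule(row) for rule in rules)


def verify(predicate, intermediate, sample_rows):
    """Check the screened rows against the known-good sample row tuples."""
    return [row for row in intermediate if predicate(row)] == sample_rows


intermediate = [
    {"name_parent": 1, "math_parent": 1, "math": 90},
    {"name_parent": 1, "math_parent": 2, "math": 75},
    {"name_parent": 2, "math_parent": 2, "math": 175},
]
predicate = combine(numeric_rule("math", 0, 100),
                    ancestor_rule(["name_parent", "math_parent"]))
ok = verify(predicate, intermediate, [intermediate[0]])
```

When `verify` returns false, the bounds or the column list would be adjusted and the check repeated, which corresponds to the adjustment step in the text.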

A second aspect of embodiments of the present application provides a system for on-demand storage of interface data, the system comprising: a primary key discovery module, a recommendation table generation module, and an intermediate table mapping module;

the primary key discovery module is used for determining a plurality of source data interfaces according to a received storage operation; the primary key discovery module is further used for parsing the plurality of source data interfaces by using a preset knowledge rule map to obtain the primary key of each of the plurality of source data interfaces; the recommendation table generation module is used for generating, for the interface data of each of the plurality of source data interfaces, at least one recommended table mode according to the primary key of that source data interface; the intermediate table mapping module is used for merging the recommended table modes that have the same primary key to obtain a plurality of merged recommended table modes; the intermediate table mapping module is further used for determining a target recommended table mode among the plurality of recommended table modes according to a received determination operation; the intermediate table mapping module is further used for generating, according to a received modification operation, a structured statement for executing an atomic operation on the target recommended table mode, wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation; and the intermediate table mapping module is further used for modifying the target recommended table mode by using the structured statement to obtain a target data table.

Optionally, the intermediate table mapping module is further used for acquiring attribute values of the column attributes in the target recommended table mode from the interface data by using a preset column extraction program; the intermediate table mapping module is further used for inserting the attribute values into the corresponding column attributes of the target recommended table mode to obtain column attributes with data; the intermediate table mapping module is further used for performing a Cartesian product on the plurality of column attributes with data in the target recommended table mode to obtain an intermediate table; and the intermediate table mapping module is further used for: modifying the intermediate table by using the structured statement to obtain a modified intermediate table; and screening the row tuples in the modified intermediate table by using a preset row extraction program to obtain the target data table.

A third aspect of the embodiments of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described in the first aspect of the present application.

A fourth aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method described in the first aspect of the present application when the processor executes the computer program.

According to the method, after a data recommendation table listing the column attributes of all data interfaces is generated and displayed according to the metadata model of the data interfaces, a structured statement is generated according to the user's operation instruction, and operations such as deleting columns and modifying columns are executed on the data recommendation table, yielding a data recommendation table that preliminarily meets the application requirements. A preset column extraction program is then used to mine candidate column attributes from the data interfaces again, so that column attributes meeting the application requirements can be obtained without manually browsing the many data interfaces of multiple business systems. The column attributes that meet the application requirements, together with their attribute values, are selected directly from the candidates mined by the preset column extraction program and added to the data recommendation table, yielding a data recommendation table that further meets the application requirements. The preset column extraction program is then used to mine the attribute values corresponding to the column attributes to obtain a data recommendation table with attribute values, and a Cartesian product is performed on the column attributes with attribute values and the data recommendation table with attribute values to obtain an intermediate table, so that the final target data table can be formed from an intermediate table with complete data. Finally, the intermediate table is screened by using a preset row extraction program to obtain the target data table, so that every attribute value of a row tuple in the target data table corresponds to the same subject, which satisfies the integrity of the data subject and the application requirements of the data.

Drawings

FIG. 1 is a schematic diagram of an implementation environment involved in a method for on-demand storage of interface data provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a system architecture in which interface data is stored on demand;

FIG. 3 is a metadata schema corresponding to the data body shown in Table 1;

FIG. 4 is a schematic diagram of the knowledge rule maintenance sub-module;

FIG. 5 is a flowchart of steps for on-demand storage of interface data according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a data metadata schema in an embodiment of the present application;

FIG. 7 is a flowchart illustrating steps for obtaining a preset row tuple extraction procedure according to an embodiment of the present application;

FIG. 8 is a flow chart of persisting data in an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

With the development of technologies such as cloud computing, big data, and artificial intelligence, platforms release large amounts of data every day, and the information released by different platforms differs to some extent, so it is difficult to obtain true and useful information from such a large amount of data.

In view of the foregoing, the present application proposes a method and system for on-demand storage of interface data. Recommended table modes are generated for the required data; the source data interfaces from which the required data are acquired are parsed by using a preset knowledge rule map to obtain their primary keys; and the recommended table modes having the same primary key are merged, thereby fusing the fragment data acquired from different source data interfaces. Meanwhile, a retention task sequence is established according to the attributes of the primary key, data are retained according to the retention task sequence, and corresponding logs are recorded, so that retained data are not lost when the system is interrupted.
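The merging of recommended table modes that share a primary key can be sketched as follows. It assumes that a recommended table mode can be reduced to a primary key name plus an ordered list of column attributes; this layout, the function name, and the sample data are hypothetical.

```python
def merge_by_primary_key(modes):
    """Merge recommended table modes that have the same primary key."""
    merged = {}
    for mode in modes:
        key = mode["primary_key"]
        target = merged.setdefault(key, {"primary_key": key, "columns": []})
        for column in mode["columns"]:
            if column not in target["columns"]:  # keep each column once
                target["columns"].append(column)
    return list(merged.values())


# Fragments from three hypothetical source data interfaces.
modes = [
    {"primary_key": "student_number", "columns": ["student_number", "math"]},
    {"primary_key": "movie_id", "columns": ["movie_id", "attendance"]},
    {"primary_key": "student_number", "columns": ["student_number", "chinese"]},
]
merged = merge_by_primary_key(modes)
```

The two student fragments collapse into one mode with the columns of both, while the movie fragment stays separate.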

Fig. 1 is a schematic diagram of an implementation environment related to a data processing method according to an embodiment of the present application. As shown in fig. 1, the implementation environment may include: a data server 110 and at least one business system 120.

The data server 110 may be a single server, a server cluster formed by a plurality of servers, or a cloud computing service center. The business system 120 may be the back end of an application, such as the back end of Douyin, Weibo, or Baidu. The data server 110 has an electronic device and a storage device that integrate the method for on-demand storage of interface data proposed herein.

A connection may be established between the data server 110 and the business system 120 through a wired or wireless network, and a user may call a data interface in the business system 120 by using the data server 110. Specifically, the user may obtain the data interfaces at the back end of the business system 120 through the data server 110 and select the relevant one. For example, if the data application needs to analyze summer precipitation across the country, the relevant data interface is the one in the business system that stores and provides climate data. The data server 110 may generate a structured statement according to the user's call instruction and obtain the relevant data from the relevant data interface.

The method for storing interface data on demand is applied to a system for storing interface data on demand. Fig. 2 is a schematic diagram of this system. As shown in fig. 2, the system mainly includes: a primary key discovery module 201, an interface data persistence module 202, a recommendation table generation module 203, an intermediate table mapping module 204, a target data persistence module 205, and a retention log module 206.

After determining the form of the target data table according to the application requirements of the data, a user uses the system for storing interface data on demand; specifically, the primary key discovery module is used to call the source data interfaces of a plurality of service platforms and to obtain the primary keys of those source data interfaces.

The primary key discovery module parses the plurality of source data interfaces by using the knowledge rule map and thereby obtains the primary key of each source data interface. The primary key discovery module integrates a pre-constructed knowledge rule map, which is constructed as follows:

acquiring a plurality of data bodies from different data interfaces of a plurality of application programs; labeling each of the plurality of data bodies to obtain a plurality of meta tag sets, each corresponding to a single data body; acquiring a reference primary key from each of the plurality of data bodies to obtain a plurality of meta primary key sets, each corresponding to a single data body; obtaining a plurality of meta structure description information sets, each corresponding to a single data body, according to the structure description of the interface where each data body is located and the structure description of the application program where each data body is located; acquiring, from the meta tag sets, the meta primary key sets, and the meta structure description information sets, the meta tags, meta reference primary keys, and meta structure description information corresponding to the same data interface; establishing a knowledge meta rule for each data interface according to its meta tags, meta reference primary key, and meta structure description information, to obtain a plurality of knowledge meta rules; for each of the plurality of knowledge meta rules, searching the plurality of knowledge meta rules to obtain its similar knowledge meta rules and its parent knowledge meta rule; and establishing a similarity connection between each knowledge meta rule and its similar knowledge meta rules, and a connection between each knowledge meta rule and its parent knowledge meta rule, to form the preset knowledge rule map.

The data body refers to various associated data which are gathered together to form a data association structure. Table 1 shows an embodiment of a data body.

TABLE 1

The tag is a property of the data body. Taking the data body of Table 1 as an example, the manually marked tag can be the score of poetry. Assuming the data body is movie attendance statistics, the tag may be "action movie". Meta tags are the tags manually assigned to data obtained from a data interface.

The data structure of the data interface in the present application is a hierarchical tree, such as JSON, XML, and other data formats. In general, the data body may be generated from a hierarchical tree of data interfaces, or may be generated from an interface metadata schema derived from the hierarchical tree. As shown in fig. 3, fig. 3 is a metadata schema corresponding to the data body shown in table 1.
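As an illustration of deriving a metadata schema from a hierarchical tree, the sketch below reduces a JSON document to its structure by replacing each leaf value with its type name. Representing a list by the schema of its first element is a simplification assumed here, and the function name and sample document are hypothetical.

```python
import json


def metadata_schema(value):
    """Reduce a parsed JSON value to its hierarchical structure."""
    if isinstance(value, dict):
        return {key: metadata_schema(child) for key, child in value.items()}
    if isinstance(value, list):
        # Assumption: all list elements share the structure of the first one.
        return [metadata_schema(value[0])] if value else []
    return type(value).__name__  # leaf node: keep only the type name


# Hypothetical interface response.
document = json.loads('{"students": [{"number": 1, "math": 90.5, "name": "Ann"}]}')
schema = metadata_schema(document)
```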

The reference primary key is a primary key manually obtained from the data body. A primary key is an attribute whose value identifies different subjects. As shown in Table 1, the student is the data subject on which the data body focuses. An attribute represents an information element that the data subject possesses. The attributes of the data subject student include: mathematics score, Chinese score, and student number. The student number can be used as the unique identifier of different data subjects, so the student number is the primary key of the data body shown in Table 1. The meta primary key is a primary key manually specified from the data of the data interface.

The structural description of the interface is mainly a description about the hierarchical tree of interfaces. The structural description of the system is information describing the system, for example: parameters of the interface, call form of the interface, protocol, etc. The meta-structure description information is a structure description obtained manually according to the actual parameters of the interface and the actual parameters of the system.

Similar knowledge element rules are rules for interfaces whose interface types are the same or interdependent, for example the knowledge element rule of the interface that counts action movie attendance data and the knowledge element rule of the interface that counts literary movie attendance data. A parent knowledge element rule is a rule that contains the current knowledge element rule. For example, the knowledge element rule of the interface that counts company expense data is the parent knowledge element rule of the knowledge element rule of the interface that counts sales department expense data.

The primary key discovery module analyzes the primary key of a data interface by using the knowledge rule map as follows: sequentially determining each source data interface in the plurality of source data interfaces as a target source data interface; obtaining a tag of the target source data interface; analyzing the target source data interface to obtain a target structure description of the target source data interface; searching the knowledge rule map for a target meta tag matching the tag and for target meta-structure description information matching the structure description; determining the knowledge element rule to which both the target meta tag and the target meta-structure description information belong as the target knowledge element rule; and determining the primary key of the target knowledge element rule as the primary key of the target source data interface.
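The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the class `KnowledgeElementRule`, the function `find_primary_key`, the tuple encoding of a structure description and the use of exact equality as the "matching" criterion are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class KnowledgeElementRule:
    """Hypothetical record for one knowledge element rule."""
    meta_tag: str
    meta_structure: Tuple[str, ...]  # assumed encoding of a structure description
    primary_key: str


def find_primary_key(rules: Sequence[KnowledgeElementRule],
                     tag: str,
                     structure: Tuple[str, ...]) -> Optional[str]:
    """Return the primary key of the rule whose meta tag and meta-structure
    description both match the target source data interface, if any."""
    for rule in rules:
        # Assumption: "matching" is modelled as exact equality.
        if rule.meta_tag == tag and rule.meta_structure == structure:
            return rule.primary_key
    return None


rule_map = [
    KnowledgeElementRule("student scores", ("number", "name", "math score"), "number"),
    KnowledgeElementRule("movie attendance", ("film_id", "attendance"), "film_id"),
]
found = find_primary_key(rule_map, "student scores", ("number", "name", "math score"))
missing = find_primary_key(rule_map, "weather", ("temperature",))
```

When no rule matches, the function returns `None`, which corresponds to the case where no primary key can be recommended from the map.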

The primary key discovery module comprises a knowledge rule maintenance sub-module. FIG. 4 is a schematic structural diagram of the knowledge rule maintenance sub-module. The knowledge rule tag retrieval module is used for executing the step of searching the knowledge rule map for the target meta tag matching the tag and for the target meta-structure description information matching the structure description.

The tags entered by the user correspond to the application requirements of the target data table. In the above example, the steps performed by the primary key discovery module constitute an embodiment of determining the plurality of source data interfaces according to the received storage operation.

Application requirements refer to the data that is required when a certain class of data is actually analyzed. For example, for the data body of Table 1, suppose the user intends to analyze the academic level of the students. According to this requirement, it is determined that the target data table needs to list attributes such as the Chinese score, the math score and the student name, so as to better meet the application requirement (analyzing the academic level of the students).

The knowledge rule structure retrieval module is used for retrieving knowledge element rules which are consistent with the structure description input by the user in the knowledge graph.

The target meta-structure description is the structure description in a knowledge element rule of the knowledge graph whose meta tag is the same as the tag entered by the user. For example, suppose the knowledge graph contains a knowledge element rule whose tag is "number of newly confirmed cases"; if the structure description of that rule is identical to the structure description input by the user, the rule is determined to be the target knowledge element rule.

The knowledge rule recommending module is used for recommending the primary key in the target knowledge element rule to the user.

The knowledge rule expansion module is used for adding a new knowledge element rule to the knowledge rule map. Adding a new knowledge element rule means establishing a similar connection between the new knowledge element rule and its similar knowledge element rules in the knowledge graph, and establishing an inclusion connection between the new knowledge element rule and its parent knowledge element rule in the knowledge graph.

If the knowledge rule maintenance sub-module cannot recommend a primary key for the target source data interface from the knowledge rule map, an MD5 value may be calculated from the parameters of the interface and the attributes and attribute values of its hierarchical tree and used as the primary key of the current interface; alternatively, the primary key of another data interface of the same type may be used as the primary key of the current interface. In this way, through the steps executed by the primary key discovery module, the data interface is analyzed and the primary key of the data body in the data interface is obtained.
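A minimal sketch of the MD5 fallback is given below. The function name `md5_primary_key` and the choice to serialize the parameters and the record as key-sorted JSON before hashing are assumptions made for illustration; the application only states that an MD5 value is calculated from the parameters, attributes and attribute values.

```python
import hashlib
import json


def md5_primary_key(parameters: dict, record: dict) -> str:
    """Derive a surrogate primary key from interface parameters and the
    attributes/attribute values of one record of the hierarchical tree."""
    # Assumption: key-sorted JSON gives a stable serialization, so the same
    # data always hashes to the same key regardless of field order.
    payload = json.dumps(
        {"parameters": parameters, "record": record},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


key_a = md5_primary_key({"city": "Beijing"}, {"temperature": 10, "humidity": 40})
key_b = md5_primary_key({"city": "Beijing"}, {"humidity": 40, "temperature": 10})
key_c = md5_primary_key({"city": "Beijing"}, {"temperature": 11, "humidity": 40})
```

Here `key_a` and `key_b` are identical because only the field order differs, while `key_c` differs because an attribute value changed.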

FIG. 5 is a flowchart of the steps for on-demand storage of interface data according to an embodiment of the present application. As shown in FIG. 5, the method comprises:

Step S501: determining a plurality of source data interfaces according to the received storage operation;

Step S502: analyzing the plurality of source data interfaces by using a preset knowledge rule map to obtain the primary key of each source data interface in the plurality of source data interfaces;

Step S503: generating, for the interface data of each source data interface in the plurality of source data interfaces, at least one recommended table schema according to the primary key of that source data interface;

Another embodiment of the present application proposes a method of generating a recommendation table:

generating a metadata mode of a target source data interface according to a hierarchical structure tree of the target source data interface; the target source data interface is any source data interface in the plurality of source data interfaces;

FIG. 6 is a schematic structural diagram of a metadata schema in an embodiment of the present application. The interface primary key is first taken as a child node of a non-leaf node in the metadata schema, and the metadata schema is then generated according to how the root node and the leaf nodes are connected and how all nodes are distributed across the levels. Specifically, each node of the hierarchical tree may be traversed from top to bottom and from left to right; the position in the metadata schema of the data recorded by each node is determined by the node's position in the hierarchical tree, and the metadata schema is generated once the positions of all nodes in the hierarchical tree have been listed in sequence.

Traversing all nodes of the metadata schema according to the planned path; determining a non-leaf node comprising a plurality of different non-leaf nodes as the name of the first recommended table schema. The planned path is the path from top to bottom and from left to right, starting from the root node of the metadata schema. Taking the metadata schema shown in FIG. 3 as an example, the data server 110 first reads the data in the root node "user set", and then reads the data in the non-leaf node "user one". A metadata schema has only one root node, which is the user set in FIG. 3.

Determining, according to the planned path, the first-layer child nodes of the non-leaf node comprising a plurality of different non-leaf nodes. Leaf nodes are the end nodes of the metadata schema, such as name, number and age. Non-leaf nodes are all nodes of the metadata schema other than end nodes, such as user one and friends. A child node of a node is the next node in the direction from the root node to the leaf nodes. A node is said to comprise non-leaf nodes when its child nodes are not leaf nodes. For example, the user set contains the non-leaf nodes user one, user two and user three; that is, the user set is a non-leaf node comprising a plurality of different non-leaf nodes.

Determining a first level child node of the non-leaf node comprising a plurality of different non-leaf nodes as a tuple of the first recommended table pattern; determining a non-leaf node comprising a plurality of different leaf nodes as the name of the second recommended table mode; determining a first layer of child nodes of the non-leaf nodes comprising a plurality of different leaf nodes according to the planned path; determining the first level child node of the non-leaf node comprising a plurality of different leaf nodes as a column attribute of the second recommended table mode.

The first-layer child nodes of a node are the child nodes directly connected to it. Taking the user set node in FIG. 6 as an example, the first-layer child nodes of the user set are user one, user two and user three, and the second-layer child nodes are number, name and so on.

Taking FIG. 6 as an example, the non-leaf node "user set" comprises the non-leaf nodes user one, user two and user three. User one, user two and user three are each taken as one tuple. Each of them has its own attributes and corresponding attribute values.

Tuples refer to attribute values in a data table that are associated with the same body. Table 2 is one tuple in table 1.

1	Zhang San	80	61

TABLE 2

A node is said to comprise leaf nodes when its child nodes are leaf nodes. For example, the friend relationship node contains the leaf nodes friend number and year; that is, the friend relationship node is a non-leaf node comprising a plurality of different leaf nodes. Taking FIG. 6 as an example, the first data table and the second data table are recommendation tables named friend number and user set, respectively. According to the above embodiment, the data recommendation tables generated from the metadata schema shown in FIG. 3 are shown in tables 3 and 4.

TABLE 2

TABLE 3

Tables 2 and 3 are the data recommendation tables that the data server 110 generates, named after the current node, when its traversal reaches a non-leaf node comprising a plurality of different leaf nodes. After obtaining the table name, the data server 110 obtains the child nodes of the current node and determines them as column attributes.

An attribute represents an information element owned by a subject. Taking Table 1 as an example, the math score and the Chinese score are both column attributes of the subject. It follows that determining child nodes as column attributes means determining, from the metadata schema of the data interface, that friend number and year can be used to describe the friend subject.
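The generation of recommended table schemas from a metadata schema can be sketched as below. This is a simplified illustration, not the procedure of the application: the nested-dictionary encoding of the metadata schema, the function `recommend_tables`, the sample values, and the reading of "a plurality" as "at least two" are all assumptions.

```python
def recommend_tables(name, node, out=None):
    """Walk a metadata schema top-down, left-to-right and collect
    recommended table schemas.

    A node holding several non-leaf children yields a table whose tuples
    are those children; a node holding several leaf children yields a
    table whose column attributes are those leaves.
    """
    if out is None:
        out = []
    leaves = [key for key, child in node.items() if not isinstance(child, dict)]
    branches = [key for key, child in node.items() if isinstance(child, dict)]
    if len(branches) >= 2:  # assumption: "plurality" means at least two
        out.append({"table": name, "tuples": branches})
    if len(leaves) >= 2:
        out.append({"table": name, "columns": leaves})
    for child in branches:  # dict order stands in for left-to-right order
        recommend_tables(child, node[child], out)
    return out


# Hypothetical metadata schema loosely modelled on the user set example.
schema = {
    "user one": {"number": 1, "name": "A",
                 "friend relationship": {"friend number": 3, "year": 2}},
    "user two": {"number": 2, "name": "B",
                 "friend relationship": {"friend number": 1, "year": 5}},
}
tables = recommend_tables("user set", schema)
```

In this sketch the root node yields a table named "user set" whose tuples are the users, and each friend relationship node yields a table whose column attributes are friend number and year.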

Step S504: merging the data table schemas with the same primary key to obtain a plurality of merged recommended table schemas. For example, the user may store the interface data as needed according to the system-generated recommended table schemas displayed by the server 110: once tables 2 and 3 have been generated, they may be merged on the primary key "number".
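Merging table schemas that share a primary key might look like the following sketch. The dictionary layout of a schema, the function `merge_by_primary_key` and the sample tables are hypothetical; the application does not prescribe how merged schemas are represented.

```python
def merge_by_primary_key(schemas):
    """Group recommended table schemas by primary key and union their columns."""
    merged = {}
    for schema in schemas:
        key = schema["primary_key"]
        target = merged.setdefault(
            key, {"primary_key": key, "tables": [], "columns": []}
        )
        target["tables"].append(schema["table"])
        for column in schema["columns"]:
            if column not in target["columns"]:  # keep first-seen column order
                target["columns"].append(column)
    return list(merged.values())


schemas = [
    {"table": "scores", "primary_key": "number", "columns": ["number", "math score"]},
    {"table": "students", "primary_key": "number", "columns": ["number", "name"]},
    {"table": "films", "primary_key": "film_id", "columns": ["film_id", "attendance"]},
]
merged_schemas = merge_by_primary_key(schemas)
```

The two schemas keyed on "number" collapse into one merged schema with the columns number, math score and name, while the schema keyed on "film_id" is left on its own.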

Step S505: determining a target recommended table schema among the plurality of recommended table schemas according to the received determining operation; that is, according to the user's selection instruction, the recommended table schema closest to the data application requirement is determined among the plurality of recommended table schemas.

Step S506: generating a structured statement for executing an atomic operation on the target recommended table schema according to the received modification operation; wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation;

Step S507: modifying the target recommended table schema by using the structured statement to obtain a target data table.
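As an illustration of step S506, the sketch below turns the three atomic operations into statements. It assumes that the "structured statement" is an SQL `ALTER TABLE` statement and that "adjust column" means renaming a column; neither assumption is confirmed by the application, and the function `build_statement` and the operation dictionaries are hypothetical.

```python
def build_statement(table: str, operation: dict) -> str:
    """Translate one atomic operation into an SQL-style statement."""
    kind = operation["op"]
    column = operation["column"]
    if kind == "delete":
        return f'ALTER TABLE "{table}" DROP COLUMN "{column}"'
    if kind == "add":
        column_type = operation["type"]
        return f'ALTER TABLE "{table}" ADD COLUMN "{column}" {column_type}'
    if kind == "adjust":
        # Assumption: adjusting a column is modelled as renaming it.
        new_name = operation["new_name"]
        return f'ALTER TABLE "{table}" RENAME COLUMN "{column}" TO "{new_name}"'
    raise ValueError(f"unsupported atomic operation: {kind}")


statements = [
    build_statement("user_set", {"op": "delete", "column": "age"}),
    build_statement("user_set", {"op": "add", "column": "chinese_score", "type": "INTEGER"}),
    build_statement("user_set", {"op": "adjust", "column": "name", "new_name": "student_name"}),
]
```

Keeping each modification as a single statement mirrors the idea that every user operation maps to one atomic change of the table schema.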

Another embodiment of the present application also proposes a method of adding a column attribute. The intermediate table mapping module performs the following operations: acquiring attribute values of the column attributes in the target recommended table schema from the interface data by using a preset column extraction program; inserting the attribute values into the corresponding column attributes of the target recommended table schema to obtain column attributes with data; and performing a Cartesian product on the plurality of column attributes with data in the target recommended table schema to obtain an intermediate table. An intermediate table with complete data is thus obtained from the recommendation table whose added column attributes carry attribute values. The intermediate table obtained by the Cartesian product of the column attributes name and math score listed in Table 1 is shown in Table 5:

TABLE 5

The target data table obtained after the preset row tuple extraction program screens the intermediate table shown in table 5 for tuples conforming to the knowledge rules is shown in table 6.

Name	Math score
Zhang San	80
Li Si	85
Wang Wu	90

TABLE 6
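The relationship between the intermediate table and the target data table can be sketched as follows. The sketch assumes each attribute value carries the student number it came from (numbers 2 and 3 for Li Si and Wang Wu are assumed), and it uses a simple "same source" check in place of the trained row tuple extraction program.

```python
from itertools import product

# Column attributes with data; each value is tagged with its (assumed) student number.
names = [(1, "Zhang San"), (2, "Li Si"), (3, "Wang Wu")]
math_scores = [(1, 80), (2, 85), (3, 90)]

# Cartesian product of the two column attributes: 3 x 3 = 9 candidate rows.
intermediate_table = list(product(names, math_scores))

# Stand-in for the row extraction program: keep rows whose values
# come from the same student number.
target_table = [
    (name, score)
    for (name_id, name), (score_id, score) in intermediate_table
    if name_id == score_id
]
```

The intermediate table is deliberately redundant: it contains every pairing of a name with a score, and only the screening step restores the rows in which both values belong to the same subject.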

Another embodiment of the present application specifically describes a method for obtaining a preset column extraction procedure:

constructing a mapping model based on a mapping language. First, a mapping language SML is introduced and a mapping model is constructed; samples for training the mapping model are then obtained, and the mapping model is trained and verified, yielding a model that, given the metadata schema of a data interface, produces column attributes and attribute values consistent with the example data. This model is used as the column extraction program. SML is used to process or represent the definitions of related data items in a mapping.

Obtaining a plurality of sample interfaces and generating a plurality of sample metadata schemas according to the plurality of sample interfaces. A large number of data interfaces of existing service systems may be obtained as sample interfaces; alternatively, a service system may be developed in-house and its data interfaces used as sample interfaces.

Determining a sample column attribute to be added in the sample metadata schema, and an attribute value corresponding to the sample column attribute to be added. An example data set of the sample column attributes may be obtained from a database or from the internet. For example, a data interface for weather news is first obtained from the back end of a news service system and determined to be a sample data interface, and the column attributes describing the weather are obtained from the hierarchical tree of that data interface: temperature, humidity, wind level, precipitation, etc. Sufficient attribute values are collected for each column attribute. For example, for the temperature column attribute, integers and some decimals in the range from minus 30 ℃ to 30 ℃ are collected as attribute values; for the precipitation column attribute, attribute values such as 10 mm per 24 hours and 20 mm per 24 hours are collected.

Collecting a sample example data set based on the sample column attributes; training the mapping model according to the knowledge rule by using the sample example data set, the sample metadata mode, the sample column attribute and the attribute value of the sample column attribute; and determining the mapping model which is trained for a plurality of times as the preset column extraction program.

The specific process of training the mapping model according to the knowledge rule by using the sample example data set, the sample metadata mode, the sample column attribute and the attribute value of the sample column attribute is as follows:

sequentially acquiring data of all leaf nodes of the sample metadata mode according to a planning path by using the mapping model; determining leaf nodes corresponding to the numerical values in the sample example data set as prediction target attributes;

the sample example data set is first input into a mapping model.

For example, suppose the sample example data set of the temperature sample column attribute of a weather subject is [10, 12, 11.5, 13, 13.2, 14, 14.6, …]. The mapping model finds ten leaf nodes in the sample metadata schema of the weather data interface, and the values of some of the ten leaf nodes appear in the sample example data set. Assuming that the values of leaf node 1 and leaf node 2 are 10 and 11.5 respectively, the node attributes of leaf node 1 and leaf node 2 are taken as prediction target attributes.

Determining the numerical value of the node with the same attribute name as the predicted target attribute in the sample metadata mode as a predicted target numerical value;

The predicted target attribute is the column attribute to be added that is extracted from the data interface during training of the mapping model and still needs to be verified. The predicted target value is the attribute value of that column attribute to be added, likewise extracted from the data interface during training and still to be verified.

It is then judged whether the attributes of the other eight of the ten leaf nodes are the same as the prediction target attribute. Assuming that the attributes of node 4 and node 5 are the same as those of node 1 and node 2, in other words the same as the prediction target attribute, the value 16 recorded by node 4 and the value 18 recorded by node 5 are taken as predicted target values.
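The two-stage selection in this example can be sketched as below. The function `predict_targets`, the flat list of `(attribute, value)` leaves and the humidity leaf are hypothetical; the values 10, 11.5, 16 and 18 follow the example in the text.

```python
def predict_targets(leaves, example_values):
    """Pick prediction target attributes and predicted target values.

    Stage 1: attributes of leaves whose value occurs in the example data set.
    Stage 2: values of the remaining leaves that share one of those attributes.
    """
    examples = set(example_values)
    target_attributes = {attr for attr, value in leaves if value in examples}
    target_values = [
        value
        for attr, value in leaves
        if attr in target_attributes and value not in examples
    ]
    return target_attributes, target_values


example_data_set = [10, 12, 11.5, 13, 13.2, 14, 14.6]
leaves = [
    ("temperature", 10),    # leaf node 1, value found in the example set
    ("temperature", 11.5),  # leaf node 2, value found in the example set
    ("humidity", 40),       # hypothetical unrelated leaf
    ("temperature", 16),    # leaf node 4
    ("temperature", 18),    # leaf node 5
]
attributes, values = predict_targets(leaves, example_data_set)
```

With these inputs the only prediction target attribute is temperature, and the predicted target values are 16 and 18, as in the example.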

The sample data interface may also be searched for a parameter name that is the same as the attribute name of the predicted target attribute, and the interface parameter corresponding to that parameter name is determined as the predicted target value. The predicted target attribute and the predicted target value are then verified by using the sample column attribute and the attribute value of the sample column attribute.

Judging whether the sample column attribute is equal to the predicted target attribute, and calculating the degree of difference between them; judging whether the attribute value of the sample column attribute is equal to the predicted target value, and calculating the degree of difference between them; and adjusting the parameters of the mapping model according to these two degrees of difference. Training of the adjusted mapping model continues according to the above flow until the mapping model can accurately obtain a predicted target attribute identical to the sample column attribute and a predicted target value identical to the attribute value of the sample column attribute, at which point the mapping model is determined as the preset column extraction program.

Different rounds of training of the mapping model may use sample example data sets, sample metadata schemas, sample column attributes and attribute values of the sample column attributes acquired from different sample data interfaces. For example, after a previous round of training of the mapping model using training data related to air temperature samples obtained from the weather data interface (the sample example data set [10, 12, 11.5, 13, 13.2, 14, 14.6, …], the sample metadata schema, etc.), the weather data interface may be used as a sample data interface for the next round of training the mapping model.

Another embodiment of the present application specifically describes a method for obtaining a preset row extraction program. FIG. 7 is a flowchart of the steps for obtaining the preset row tuple extraction program according to an embodiment of the present application.

According to the metadata schema of each data interface, a data recommendation table listing the column attributes of that data interface is generated. After the plurality of data recommendation tables are displayed, structured statements for operations such as deletion and modification are generated according to the user's operation instructions, yielding a data recommendation table that preliminarily meets the application requirements. The preset column extraction program then mines candidate column attributes from the data interfaces again, and the column attributes that meet the application requirements, together with their attribute values, are selected directly from those candidates and added to the data recommendation table, so that a data recommendation table that further meets the application requirements is obtained without the user having to browse multiple service systems to collect column attributes manually. The preset column extraction program is then used to mine the attribute values corresponding to the column attributes, giving a data recommendation table with attribute values; the Cartesian product of the column attributes with attribute values and the data recommendation table with attribute values yields an intermediate table, so that the final target data table is formed from an intermediate table with complete data. Finally, the intermediate table is screened by the preset row extraction program to obtain the target data table, so that all attribute values of each row tuple in the target data table correspond to the same subject; this preserves the integrity of the data subject and meets the application requirements of the data.

The column extraction program is obtained by training a mapping model constructed with a mapping language, and therefore has an inherent advantage in processing the mapping relation from the intermediate table to the target data table. Meanwhile, a large amount of sample interface data is collected from actual service systems, sample example data are collected for each sample data interface, and the sample metadata schema, the sample column attributes and the attribute values of the sample column attributes are obtained.

Step S701: determining a plurality of sample row tuples in the sample metadata schema; wherein all attribute values contained in each of the plurality of sample row tuples correspond to the same primary key; all nodes of the sample metadata schema may be traversed along the planned path, determining a first tier of children nodes of the non-leaf nodes comprising a plurality of different non-leaf nodes as a plurality of sample row tuples.

Step S702: inserting the attribute values of the sample column attribute to be added into the sample column attribute to be added, to obtain a sample column attribute with attribute values. Taking Table 5 as an example, the column attribute Chinese score is added on the basis of Table 5, and 61, 82 and 70 are inserted into it to obtain a column attribute with attribute values: Chinese score [61, 82, 70].

Step S703: obtaining a sample recommendation table with attribute values according to the sample metadata mode;

the method comprises traversing each node of a hierarchical tree of a sample data interface according to a path from top to bottom and from left to right, determining the position of data recorded by the node in a metadata mode according to the position of the node in the hierarchical tree, sequentially listing the positions of all nodes in the hierarchical tree, generating a sample recommendation table, and adding the numerical value recorded by the leaf node corresponding to the column attribute into the sample recommendation table to obtain the sample recommendation table with the attribute value.

Step S704: performing Cartesian product on the column attribute with the attribute value of the sample and the sample recommendation table with the attribute value to obtain a sample intermediate table; step S705: obtaining a first atomic rule which constrains attribute values in the intermediate table based on a numerical rule according to the sample example data set;

A numerical rule is learned from the sample example data set. For example, when the sample example data set of the temperature sample column attribute is [10, 12, 11.5, 13, 13.2, 14, 14.6, …], the numerical rule is the range 1-20 ℃; for a height sample column attribute, the numerical rule is the range 60 cm-200 cm.
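One simple way to derive such a numerical rule is sketched below. The application does not say how the range is learned; taking the minimum and maximum of the examples, optionally widened by a hypothetical `margin`, is an assumption, and it yields a tighter range than the 1-20 ℃ quoted in the text unless a margin is supplied.

```python
def learn_range_rule(examples, margin=0.0):
    """Build a first-atomic-rule style check from example values.

    Assumption: the rule is the closed interval spanned by the examples,
    widened on both sides by `margin`.
    """
    low = min(examples) - margin
    high = max(examples) + margin

    def rule(value):
        return low <= value <= high

    rule.bounds = (low, high)  # exposed for inspection
    return rule


temperature_rule = learn_range_rule([10, 12, 11.5, 13, 13.2, 14, 14.6])
accepted = temperature_rule(12.5)
rejected = temperature_rule(35)
```

A value such as 35 falls outside the learned interval and would be excluded from the intermediate table by this rule.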

Constraining the attribute values of the intermediate table based on the numerical rule means excluding data in the intermediate table that does not conform to the properties of the column attribute.

Step S706: obtaining, according to the sample example data set, a second atomic rule that constrains the attribute values of the intermediate table based on non-leaf ancestor nodes;

Constraining the intermediate table based on non-leaf ancestor nodes means that the attribute values of the different attributes in any tuple are located under the same ancestor node. As shown in FIG. 2, year 2 and friend number 3 have the same ancestor node, the friend relationship node, which corresponds to 1. In other words, constraining the intermediate table based on non-leaf ancestor nodes means that the attribute values of the different column attributes in each tuple of the intermediate table correspond to the same primary key in the sample metadata schema.
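The ancestor constraint can be sketched as follows. The nested-dictionary metadata schema, the functions `leaf_paths` and `same_ancestor`, and the values for user two are hypothetical; the sketch only checks the nearest non-leaf ancestor, which is one possible reading of "the same ancestor node".

```python
from itertools import product


def leaf_paths(node, prefix=()):
    """Yield (path_to_parent, attribute, value) for every leaf of the schema."""
    for key, child in node.items():
        if isinstance(child, dict):
            yield from leaf_paths(child, prefix + (key,))
        else:
            yield prefix, key, child


def same_ancestor(row):
    """Second-atomic-rule style check: all cells share one parent path."""
    return len({path for path, _attr, _value in row}) == 1


schema = {
    "user one": {"number": 1, "name": "wave",
                 "friend relationship": {"friend number": 3, "year": 2}},
    "user two": {"number": 2, "name": "B",
                 "friend relationship": {"friend number": 1, "year": 5}},
}
cells = list(leaf_paths(schema))
friend_numbers = [cell for cell in cells if cell[1] == "friend number"]
years = [cell for cell in cells if cell[1] == "year"]
rows = list(product(friend_numbers, years))  # redundant intermediate rows
kept = [(a[2], b[2]) for a, b in rows if same_ancestor((a, b))]
```

Of the four candidate rows, only the pairings (3, 2) and (1, 5) survive, because their values sit under the same friend relationship node.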

There may be more than one first atomic rule. Taking the weather data interface as an example, in addition to the rule that the attribute value of the temperature column attribute lies in the range 1-20 ℃, there is also an atomic rule that the attribute value of the precipitation attribute cannot be negative, and so on.

There may also be more than one second atomic rule. In addition to year 2 and friend number 3 having the same ancestor node, the friend relationship node, number 1 and name wave also have the same ancestor node, the user one node. Different first atomic rules and second atomic rules are combined arbitrarily to obtain predicate combinations.

Step S707: combining the first atomic rule and the second atomic rule to obtain a predicate combination. One predicate is formed by grouping one or more first atomic rules with one or more second atomic rules. A predicate combination contains multiple predicates.

For example, the following three atomic rules can be combined into one predicate: year 2 and friend number 3 share the ancestor node friend relationship; number 1 and name wave share the ancestor node user one; and the attribute value of the temperature column attribute lies in the range 1-20 ℃. The rule that year 2 and friend number 3 share the ancestor node friend relationship and the rule that the attribute value of the precipitation attribute is not negative can be combined into another predicate.

Step S708: screening the sample intermediate table by utilizing the predicate combinations to obtain sample target data; and the sample target data is a target data table obtained by screening the sample intermediate table by using a predicate combination which is not trained yet. Step S709: verifying the sample target data by using the plurality of sample row tuples, and adjusting the predicate combination according to a verification result; step S710: and determining the predicate combination subjected to multiple adjustments as the preset row tuple extraction program.
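Steps S707 to S709 can be sketched as below. The helper names `make_predicate`, `screen` and `mismatch`, the row layout, the 0-100 score range and the deliberately invalid value 185 are hypothetical; the sketch also assumes that a row is kept only when it satisfies every predicate in the combination, which the application does not state explicitly.

```python
from itertools import product


def make_predicate(*atomic_rules):
    """A predicate holds for a row when all of its atomic rules hold."""
    return lambda row: all(rule(row) for rule in atomic_rules)


def screen(rows, predicate_combination):
    """Keep rows that satisfy every predicate (assumed semantics)."""
    return [row for row in rows if all(p(row) for p in predicate_combination)]


def mismatch(screened, sample_row_tuples):
    """Tuples that differ between the screened result and the verified samples."""
    got = {(row["name"], row["score"]) for row in screened}
    return got ^ set(sample_row_tuples)


names = [(1, "Zhang San"), (2, "Li Si")]
scores = [(1, 80), (2, 185)]  # 185 is a deliberately invalid value
rows = [
    {"name": name, "name_src": name_src, "score": score, "score_src": score_src}
    for (name_src, name), (score_src, score) in product(names, scores)
]

in_range = lambda row: 0 <= row["score"] <= 100              # first atomic rule
same_source = lambda row: row["name_src"] == row["score_src"]  # second atomic rule
combination = [make_predicate(in_range, same_source)]

screened = screen(rows, combination)
difference = mismatch(screened, {("Zhang San", 80), ("Li Si", 85)})
```

The non-empty `difference` plays the role of the verification result: it is the signal according to which the atomic rules in the predicate combination would be adjusted before the next round of screening.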

A sample row tuple is an accurate tuple derived from the sample interface data and verified manually. According to the degree of difference between the sample row tuples and the sample target data, the atomic rules in the predicate combination are adjusted, and the adjusted predicate combination is again used to screen tuples from the sample intermediate table, until the predicate combination can accurately obtain the row tuples that conform to the sample example data.

The knowledge rules are: sequentially acquiring data of all leaf nodes in a sample metadata mode by using a mapping model according to a planning path; leaf nodes corresponding to values located in the sample example data set are determined as column attributes to be added. And determining the numerical value of the node which is the same as the attribute name of the column attribute to be added in the sample metadata mode as the attribute value of the column attribute to be added.

According to the present application, the first atomic rule is obtained by learning the numerical rules in the example data, the second atomic rule is obtained by learning from the example data together with the connection relations between the nodes in the metadata schema of the data interface, and the two are combined into a predicate combination. The first atomic rule screens for tuples that conform to the numerical rules; the second atomic rule screens for tuples that conform to the metadata schema of the data interface, so that the entity integrity of the tuples is satisfied. The row extraction program obtained from the predicates after multiple rounds of adjustment and verification can screen the tuples that meet the application requirements out of the redundant intermediate table, thereby obtaining the target data table that meets the application requirements.

After the intermediate table mapping module obtains the intermediate table, the redundant intermediate table is adjusted according to the received column deleting operation and column adjusting operation, and then a row extraction program is used for screening row tuples in the intermediate table to obtain the target data table. Modifying the intermediate table by using the structured statement to obtain a modified intermediate table; and screening the row tuples in the modified intermediate table by using a preset row extraction program to obtain the target data table.

Determining a plurality of sample row tuples in a sample metadata schema; wherein, all attribute values contained in each of the plurality of sample row tuples correspond to the same primary key; inserting the attribute value of the sample column attribute to be increased into the sample column attribute to be increased to obtain a column attribute with the attribute value of the sample; obtaining a sample recommendation table with attribute values according to the sample metadata mode; performing Cartesian product on the column attribute of the sample with the attribute value and the sample recommendation table with the attribute value to obtain a sample intermediate table; obtaining a first atomic rule which constrains attribute values in the intermediate table based on a numerical rule according to the sample example data set; obtaining a second atomic rule based on attribute values of the intermediate table constrained by non-leaf ancestor nodes according to the sample example data set; combining the first atomic rule and the second atomic rule to obtain a predicate combination; screening the sample intermediate table by utilizing the predicate combinations to obtain sample target data; verifying the sample target data by using the plurality of sample row tuples, and adjusting the predicate combination according to a verification result; and determining the predicate combination subjected to multiple adjustments as the preset row tuple extraction program.

According to the method, the plurality of source data interfaces are first analyzed to obtain the primary key of each source data interface; the recommended table modes of the source data interfaces are then generated according to the primary keys; a structured statement is generated according to the received operation on the recommended table mode; and the recommended table mode is modified directly, so that a target data table meeting the data application requirements is obtained. Modifying the recommended table mode includes using a column extraction program to continue mining, from the plurality of source data interfaces, the column attributes and the attribute values corresponding to the column attributes, so that the user does not need to add them manually and the efficiency of data mining is improved. Modifying the recommended table mode further includes performing a Cartesian product on the column attributes with attribute values to obtain an intermediate table, which helps ensure the integrity of the data acquired from the plurality of source data interfaces, and screening the row tuples in the intermediate table by using a row extraction program to obtain a target data table formed by row tuples arranged in order of the primary key.
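As a rough illustration of how a received modification operation might be turned into a structured statement, the following Python sketch maps each of the three atomic operations to one SQL `ALTER TABLE` statement. The dictionary layout of an operation, the helper names, the table and column names, and the reading of "adjust column" as a rename are all assumptions; the source does not specify the statement syntax.

```python
def quote(identifier):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def structured_statement(table, operation):
    """Translate one atomic operation (a dict) into a single SQL statement."""
    kind = operation["type"]
    if kind == "delete_column":
        return f"ALTER TABLE {quote(table)} DROP COLUMN {quote(operation['column'])}"
    if kind == "adjust_column":
        # Assumption: "adjusting" a column is modelled here as renaming it.
        return (
            f"ALTER TABLE {quote(table)} RENAME COLUMN "
            f"{quote(operation['column'])} TO {quote(operation['new_name'])}"
        )
    if kind == "add_column":
        return (
            f"ALTER TABLE {quote(table)} ADD COLUMN "
            f"{quote(operation['column'])} {operation['sql_type']}"
        )
    raise ValueError(f"unknown atomic operation: {kind}")


# Hypothetical user modifications to a recommended table named "orders".
operations = [
    {"type": "delete_column", "column": "debug_flag"},
    {"type": "adjust_column", "column": "amt", "new_name": "amount"},
    {"type": "add_column", "column": "currency", "sql_type": "TEXT"},
]
statements = [structured_statement("orders", op) for op in operations]
```

Keeping one statement per atomic operation makes it straightforward to log each operation before it runs and to redo it after a fault.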

The present application also combines the retention log module, the interface data persistence module and the target data persistence module to ensure that data is not lost when the system is interrupted. The interface data persistence module retains the data acquired from the source data interfaces in an original library. During retention, the retention log module establishes a log that records the data being retained; when the system recovers from an interruption, the log can be scanned to identify the data that was being retained before the interruption, and that data is retained again. The target data persistence module stores the generated target data table in a persistent database, and a log recording the stored data is likewise established during storage, so that the process of generating the intermediate table and the process of generating the target data table are not affected by a system interruption.

Fig. 8 is a flowchart of data retention according to an embodiment of the present application. As shown in Fig. 8, interface data retention can be roughly divided into three stages. In stage 1, interface data is obtained by calling the interface and is persisted into the original library. The method uses MongoDB as the storage database for interface data and uses the original library to persist the data of the source interfaces, so that if data is lost later, the missing data can be recovered from the original library. After stage 1 is completed, the many-to-many mapping of the data (recommended tables to intermediate tables, and intermediate tables to target data tables) is performed; this is stage 2, in which the data of the target table is mined from the original library. In stage 3, the target data generated in stage 2 is persisted into the target database; stage 3 is performed in parallel with stage 2. Data inconsistency can arise anywhere in this series of stages, from acquiring the interface data to mapping and retaining it.

For the first stage, interface data is retained as follows: establishing a retention task sequence for the plurality of source data interfaces; sequentially determining, according to the retention task sequence, a target interface for data retention; forming a record log of the data call according to the retention sequence number of the target interface in the retention task sequence; calling the interface data of the target interface, and retaining the interface data in the original library; when retaining the interface data in the original library fails, scanning the record log of the data call to acquire the retention sequence number; and calling the interface data of the target interface again according to the retention sequence number, and retaining the interface data in the original library. In this case, acquiring attribute values of column attributes in the target recommendation table mode from the interface data by using a preset column extraction program comprises: acquiring the attribute values of the column attributes in the target recommendation table mode from the interface data in the original library by using the preset column extraction program.

For the process of acquiring data from an interface and saving it to the original library, the present application first establishes a retention task sequence for the interface calls and assigns a sequence number to each call. A record log of the data call is written before the request corresponding to the sequence number is sent, and the record log is cleared after the interface data has been obtained and persisted successfully. The logging process is as follows: a. before sending the interface request, record the information related to the interface request and the sequence number of the request; b. send the request, store the data on physical disk after it is obtained, and then delete the record log of the data call. When a fault causes a problem such as a program interruption, the program is started again and scans the record logs of data calls in the system; the interface data corresponding to the sequence numbers of the retention task sequence found in those record logs is obtained again and stored in the original library.
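The logging steps a and b above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the record log and the original library are plain in-memory dictionaries (a real system would keep both on durable storage), `flaky_fetch` is a hypothetical stand-in for the interface call, and resuming the remaining tasks after recovery is an added convenience that the source does not describe.

```python
def retain_all(interfaces, fetch, call_log, raw_store, start=0):
    """Retain interface data in sequence, logging each call before it is sent."""
    for seq in range(start, len(interfaces)):
        name = interfaces[seq]
        call_log[seq] = name           # step a: record the request and its number
        raw_store[name] = fetch(name)  # step b: send the request and persist
        del call_log[seq]              # persisted, so the log entry is cleared


def recover(fetch, call_log, raw_store):
    """After a restart, redo every call that is still present in the log."""
    last = -1
    for seq in sorted(call_log):
        name = call_log[seq]
        raw_store[name] = fetch(name)
        del call_log[seq]
        last = seq
    return last


# Hypothetical interface call that fails once for "orders".
attempts = {"orders": 0}


def flaky_fetch(name):
    if name == "orders" and attempts["orders"] == 0:
        attempts["orders"] += 1
        raise ConnectionError("simulated interruption")
    return {"source": name}


interfaces = ["users", "orders", "items"]
call_log, raw_store = {}, {}
pending_after_crash = {}
try:
    retain_all(interfaces, flaky_fetch, call_log, raw_store)
except ConnectionError:
    pending_after_crash = dict(call_log)  # what a restart would find in the log
    last = recover(flaky_fetch, call_log, raw_store)
    # Assumption: the remaining tasks are resumed after the recovered one.
    retain_all(interfaces, flaky_fetch, call_log, raw_store, start=last + 1)
```

Because the log entry is written before the request and removed only after persistence, any entry found at restart identifies a call whose data may be missing from the original library.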

For the second stage, the many-to-many mapping of data is broken down into the following steps:

(1) A target data table mode is recommended from the original library. (2) The data of the original library is converted into recommended table data. (3) The user performs operations such as deletion and modification on the basis of the recommended table mode, and determines the target data table mode. (4) The recommended table data is converted into target data table data according to the sequence of the user's operations.

In the above process, logging is required for step 2 and step 4, because both steps involve data write operations; if a fault interrupts the program during step 2 or step 4, the final data may be missing or inconsistent. Logs are therefore kept separately for step 2 and step 4.

The specific method for step 2 is as follows: generating an insertion task sequence according to the primary key sequence of the target recommendation table mode; sequentially determining, according to the insertion task sequence, target positions for data insertion in the target recommendation table mode; forming a record log of data insertion according to the insertion sequence number of the target position in the insertion task sequence, wherein the record log of data insertion comprises the value of the primary key sequence corresponding to the target position, the column attribute corresponding to the target position, and the attribute value; and when inserting the attribute value into the corresponding column attribute of the target recommendation table mode fails, scanning the record log of data insertion, and inserting the attribute value into the corresponding column attribute of the target recommendation table mode according to the value of the primary key sequence corresponding to the target position and the column attribute corresponding to the target position.

In step 2, the data in the original library is converted into the recommended table mode. The data of the recommended table mode and of the intermediate tables is stored in the target database. Every table in the recommended table mode has a primary key, and a conversion insertion task sequence is generated according to the primary key sequence of the original library. When data is inserted, the record log of data insertion is written first. If the insertion succeeds, the record log of data insertion is deleted; if the insertion fails, the record log of data insertion is scanned and the task of converting the data of the recommended table into the intermediate table is redone. The logging process is as follows: A. before inserting data, record the insertion sequence, the primary key, and all column attribute values; B. when a fault is encountered, scan the record log of data insertion and reinsert the data that was not inserted successfully.
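The insertion log of steps A and B might look like the following Python sketch. The in-memory dictionaries, the `fail_at` switch used to simulate a fault, and the sample rows are assumptions for illustration only; a real implementation would keep the log and the table in durable storage.

```python
def build_insert_tasks(rows):
    """Generate the insertion task sequence in primary key order."""
    tasks = []
    for pk in sorted(rows):
        for column, value in rows[pk].items():
            tasks.append((pk, column, value))
    return tasks


def insert_value(table, pk, column, value):
    """Write one attribute value; repeating the same write is harmless."""
    table.setdefault(pk, {})[column] = value


def run_insertions(tasks, table, insert_log, fail_at=None):
    for seq, (pk, column, value) in enumerate(tasks):
        # Step A: log the sequence number, primary key, column and value first.
        insert_log[seq] = {"pk": pk, "column": column, "value": value}
        if seq == fail_at:
            raise RuntimeError("simulated fault before the write completed")
        insert_value(table, pk, column, value)
        del insert_log[seq]  # the insertion succeeded, so its log is deleted


def redo_from_log(table, insert_log):
    """Step B: after a fault, reinsert everything still recorded in the log."""
    for seq in sorted(insert_log):
        entry = insert_log[seq]
        insert_value(table, entry["pk"], entry["column"], entry["value"])
        del insert_log[seq]


# Hypothetical data extracted from the original library, keyed by primary key.
rows = {2: {"name": "Bo"}, 1: {"name": "Ann", "city": "Oslo"}}
tasks = build_insert_tasks(rows)

table, insert_log = {}, {}
try:
    run_insertions(tasks, table, insert_log, fail_at=2)  # fault on the last task
except RuntimeError:
    redo_from_log(table, insert_log)
```

Since each log entry carries the primary key value, the column attribute, and the attribute value, the redo step needs no other state to locate the target position.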

The specific method for step 4 is as follows: generating the atomic operation according to the received modification operation, and forming a log record of column modification for the type of the atomic operation; generating, according to the atomic operation, a structured statement for executing the atomic operation on the target recommended table mode; before screening the row tuples in the intermediate table, forming a log record of row modification; when modifying the intermediate table by using the structured statement fails, deleting the modified intermediate table according to the log record of column modification, and modifying the intermediate table again by using the structured statement; or, when screening the row tuples in the modified intermediate table fails, deleting the target data table according to the log record of row modification, and screening the row tuples in the modified intermediate table again by using the preset row extraction program.

The logging process is as follows: a. record the type of the operation currently being performed (column extraction operation or row extraction operation) and generate the corresponding log record of column modification or log record of row modification, which comprises the attribute name and table name of the column attribute involved, or the primary key sequence of the row tuples involved; b. when a fault occurs and the program has been restarted, scan the log; when the scanned log is a log record of column modification, delete the unfinished intermediate table and regenerate the intermediate table; when the scanned log is a log record of row modification, delete the unfinished target data table and continue the mapping conversion process from the intermediate table at which the fault occurred.
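The recovery rule in step b can be sketched in Python as below. The layout of the log records, the table names held in the `tables` dictionary, the use of a delete-column operation as the only column modification, and the row filter are all hypothetical; the sketch only shows the branch on log type followed by deletion and regeneration of the unfinished table.

```python
def drop_column(rows, column):
    """Apply a delete-column operation to every row of a table."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def recover(log, tables, row_filter):
    """Scan the log after a restart and redo the unfinished step."""
    for record in list(log):
        if record["kind"] == "column":
            # Column modification failed: discard the unfinished modified
            # intermediate table and regenerate it from the intermediate table.
            tables.pop("modified_intermediate", None)
            tables["modified_intermediate"] = drop_column(
                tables["intermediate"], record["column"]
            )
        elif record["kind"] == "row":
            # Row screening failed: discard the unfinished target data table
            # and screen the modified intermediate table again.
            tables.pop("target", None)
            tables["target"] = [
                row for row in tables["modified_intermediate"] if row_filter(row)
            ]
        log.remove(record)


tables = {
    "intermediate": [
        {"id": 1, "amount": 50, "debug": "x"},
        {"id": 2, "amount": 200, "debug": "y"},
    ],
    # Unfinished result left behind by a fault during column modification.
    "modified_intermediate": [{"id": 1, "amount": 50}],
}
log = [{"kind": "column", "table": "intermediate", "column": "debug"}]
recover(log, tables, row_filter=lambda row: row["amount"] >= 100)

# A later fault during row screening leaves a row-modification log record.
tables["target"] = []
log.append({"kind": "row", "primary_keys": [1, 2]})
recover(log, tables, row_filter=lambda row: row["amount"] >= 100)
```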

Based on the same inventive concept, the embodiment of the application provides a system for storing interface data on demand. As shown in fig. 2, the system for on-demand storage of interface data includes: a primary key discovery module 201, an interface data persistence module 202, a recommendation table generation module 203, an intermediate table mapping module 204, a target data persistence module 205, and a retention log module 206.

The primary key discovery module is configured to determine a plurality of source data interfaces according to the received storage operation; the primary key discovery module is further configured to parse the plurality of source data interfaces by using a preset knowledge rule map to obtain the primary key of each source data interface in the plurality of source data interfaces; the recommendation table generation module is configured to generate, for the interface data of each source data interface in the plurality of source data interfaces, at least one recommended table mode according to the primary key of that source data interface; the intermediate table mapping module is configured to merge the recommended table modes with the same primary key to obtain a plurality of merged recommended table modes; the intermediate table mapping module is further configured to determine a target recommended table mode among the plurality of recommended table modes according to the received determining operation; the intermediate table mapping module is further configured to generate, according to the received modification operation, a structured statement for executing an atomic operation on the target recommended table mode, wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation; and the intermediate table mapping module is further configured to modify the target recommended table mode by using the structured statement to obtain a target data table.

Optionally, the system for on-demand storage of interface data further comprises: a retention log module and an interface data persistence module; the retention log module is configured to establish a retention task sequence for the plurality of source data interfaces; the retention log module is further configured to sequentially determine a target interface for data retention according to the retention task sequence; the retention log module is further configured to form a record log of the data call according to the retention sequence number of the target interface in the retention task sequence; the interface data persistence module is configured to call the interface data of the target interface and retain the interface data in an original library; the retention log module is further configured to scan the record log of the data call to obtain the retention sequence number when retaining the interface data in the original library fails; the interface data persistence module is further configured to call the interface data of the target interface again according to the retention sequence number and retain the interface data in the original library; and the intermediate table mapping module is configured to acquire attribute values of column attributes in the target recommendation table mode from the interface data in the original library by using a preset column extraction program.

Optionally, the system for on-demand storage of interface data further comprises: a target data persistence module; the target data persistence module is configured to generate an insertion task sequence according to the primary key sequence of the target recommendation table mode; the target data persistence module is further configured to sequentially determine target positions for data insertion in the target recommendation table mode according to the insertion task sequence; the target data persistence module is further configured to form a record log of data insertion according to the insertion sequence number of the target position in the insertion task sequence, wherein the record log of data insertion comprises the value of the primary key sequence corresponding to the target position, the column attribute corresponding to the target position, and the attribute value; and the target data persistence module is further configured to, when inserting the attribute value into the corresponding column attribute of the target recommendation table mode fails, scan the record log of data insertion and insert the attribute value into the corresponding column attribute of the target recommendation table mode according to the value of the primary key sequence corresponding to the target position and the column attribute corresponding to the target position.

Optionally, the retention log module is further configured to generate the atomic operation according to the received modification operation, and form a log record of column modification for the type of the atomic operation; the intermediate table mapping module is further configured to generate, according to the atomic operation, a structured statement for executing the atomic operation on the target recommended table mode; the retention log module is further configured to form a log record of row modification before the row tuples in the intermediate table are screened; the target data persistence module is further configured to, when modifying the intermediate table by using the structured statement fails, delete the modified intermediate table according to the log record of column modification and modify the intermediate table again by using the structured statement; or, the target data persistence module is further configured to, when screening the row tuples in the modified intermediate table fails, delete the target data table according to the log record of row modification and screen the row tuples in the modified intermediate table again by using a preset row extraction program.

Optionally, the recommendation table generation module is configured to: generate a metadata mode of a target source data interface according to a hierarchical structure tree of the target source data interface, wherein the target source data interface is any source data interface in the plurality of source data interfaces; traverse all nodes of the metadata mode according to a planned path; determine a non-leaf node comprising a plurality of different non-leaf nodes as the name of a first recommended table mode; determine, according to the planned path, the first-layer child nodes of the non-leaf node comprising a plurality of different non-leaf nodes; determine the first-layer child nodes of the non-leaf node comprising a plurality of different non-leaf nodes as tuples of the first recommended table mode; determine a non-leaf node comprising a plurality of different leaf nodes as the name of a second recommended table mode; determine, according to the planned path, the first-layer child nodes of the non-leaf node comprising a plurality of different leaf nodes; and determine the first-layer child nodes of the non-leaf node comprising a plurality of different leaf nodes as column attributes of the second recommended table mode.
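One possible reading of this traversal is sketched below in Python. The metadata mode is modelled as a nested dictionary in which dictionary values are non-leaf nodes and all other values are leaf nodes; "a plurality" is taken to mean at least two; and a depth-first walk stands in for the planned path. These are assumptions, as are the node names.

```python
def recommend_modes(name, node, first=None, second=None):
    """Walk the metadata tree and collect the two kinds of recommended table.

    first:  node name -> its non-leaf first-layer children (tuples)
    second: node name -> its leaf first-layer children (column attributes)
    """
    first = {} if first is None else first
    second = {} if second is None else second
    if isinstance(node, dict):
        non_leaf = [k for k, v in node.items() if isinstance(v, dict)]
        leaf = [k for k, v in node.items() if not isinstance(v, dict)]
        if len(non_leaf) > 1:
            first[name] = non_leaf
        if len(leaf) > 1:
            second[name] = leaf
        for child in non_leaf:
            recommend_modes(child, node[child], first, second)
    return first, second


# Hypothetical metadata mode derived from an interface's hierarchy tree.
metadata_mode = {
    "user": {"id": "int", "name": "str"},
    "order": {"id": "int", "total": "float"},
}
first_modes, second_modes = recommend_modes("response", metadata_mode)
```

Here the root node yields a first recommended table whose tuples are `user` and `order`, while each of those nodes yields a second recommended table whose column attributes are its leaf children.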

Optionally, the primary key discovery module is configured to: sequentially determine each source data interface in the plurality of source data interfaces as a target source data interface; obtain a tag of the target source data interface; parse the target source data interface to obtain a target structure description of the target source data interface; search the knowledge rule map for a target meta tag matching the tag and for target meta structure description information matching the structure description; determine the knowledge element rule that corresponds to both the target meta tag and the target meta structure description information as a target knowledge element rule; and determine the primary key of the target knowledge element rule as the primary key of the target source data interface.
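A minimal Python sketch of this lookup follows. The knowledge rule map is reduced to a flat list of rules, each holding a meta tag, a meta structure description (here simply a tuple of field names), and a reference primary key; these representations and the sample rules are assumptions, and the similarity and parent links of the real map are not modelled.

```python
# Hypothetical knowledge element rules.
KNOWLEDGE_RULES = [
    {"tag": "user", "structure": ("id", "name"), "primary_key": "id"},
    {"tag": "order", "structure": ("order_no", "total"), "primary_key": "order_no"},
]


def find_primary_key(tag, structure, rules):
    """Return the primary key of the rule matching both tag and structure."""
    by_tag = [rule for rule in rules if rule["tag"] == tag]
    by_structure = [rule for rule in rules if rule["structure"] == tuple(structure)]
    for rule in by_tag:
        if rule in by_structure:
            # The same rule matched on both criteria: the target rule.
            return rule["primary_key"]
    return None


order_key = find_primary_key("order", ["order_no", "total"], KNOWLEDGE_RULES)
mismatch = find_primary_key("user", ["order_no", "total"], KNOWLEDGE_RULES)
```

Requiring both matches to land on the same rule avoids assigning a primary key when only the tag or only the structure happens to coincide.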

Based on the same inventive concept, another embodiment of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements steps in a method for on-demand storage of interface data according to any of the embodiments of the present application.

Based on the same inventive concept, another embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the steps in the method for storing interface data according to any of the foregoing embodiments of the present application.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, each embodiment is described in a progressive or illustrative manner, and each embodiment is mainly described by the differences from other embodiments, and identical and similar parts between the embodiments are mutually referred.

It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.

Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.

The method and system for storing interface data on demand provided in the present application have been described in detail above; the description is provided only to help in understanding the method and core ideas of the present application. Those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application; in view of the above, this description should not be construed as limiting the present application.

Claims

1. A method of on-demand storage of interface data, the method comprising:

determining a plurality of source data interfaces according to the received storage operation;

analyzing the plurality of source data interfaces by using a preset knowledge rule map to obtain a main key of each source data interface in the plurality of source data interfaces;

generating at least one recommended table mode according to the main key of each source data interface in the plurality of source data interfaces for the interface data of each source data interface in the plurality of source data interfaces;

combining the recommended table modes with the same main key to obtain a plurality of combined recommended table modes;

determining a target recommendation table mode among the plurality of recommendation table modes according to the received determination operation;

generating a structured statement for executing an atomic operation on the target recommendation table mode according to the received modification operation; wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation;

modifying the target recommendation table mode by using the structured statement to obtain a target data table;

after determining the target recommendation table mode among the plurality of recommendation table modes according to the received determining operation, the method further comprises:

acquiring attribute values of column attributes in the target recommendation table mode from the interface data by using a preset column extraction program;

inserting the attribute value into the column attribute corresponding to the target recommendation table mode to obtain the column attribute with data;

performing Cartesian product on a plurality of column attributes with data in the target recommended table mode to obtain an intermediate table;

wherein modifying the target recommendation table mode by using the structured statement to obtain a target data table comprises:

modifying the intermediate table by using the structured statement to obtain a modified intermediate table;

and screening the row tuples in the modified intermediate table by using a preset row extraction program to obtain the target data table.

2. The method of claim 1, wherein after parsing the plurality of source data interfaces using a preset knowledge-rule graph to obtain a primary key for each of the plurality of source data interfaces, the method further comprises:

establishing a retention task sequence for the plurality of source data interfaces;

according to the retention task sequence, sequentially determining a target interface for data retention;

forming a record log of data call according to the retention sequence number of the target interface in the retention task sequence;

calling interface data of the target interface, and retaining the interface data in an original library;

when retaining the interface data in the original library fails, scanning the record log of the data call to acquire the retention sequence number;

according to the retention sequence number, calling the interface data of the target interface again, and retaining the interface data in the original library;

wherein acquiring attribute values of column attributes in the target recommendation table mode from the interface data by using a preset column extraction program comprises:

acquiring the attribute values of the column attributes in the target recommendation table mode from the interface data in the original library by using the preset column extraction program.

3. The method of claim 1, wherein inserting the attribute value into a column attribute corresponding to the target recommendation table schema comprises:

generating an insertion task sequence according to the main key sequence of the target recommendation table mode;

according to the insertion task sequence, sequentially determining target positions for data insertion in the target recommendation table mode;

forming a record log of data insertion according to the insertion sequence number of the target position in the insertion task sequence; the record log of the data insertion comprises a numerical value of a main key sequence corresponding to the target position, a column attribute corresponding to the target position and the attribute value;

and when inserting the attribute value into the column attribute corresponding to the target recommendation table mode fails, scanning the record log of data insertion, and inserting the attribute value into the column attribute corresponding to the target recommendation table mode according to the numerical value of the main key sequence corresponding to the target position and the column attribute corresponding to the target position.

4. The method of claim 1, wherein generating a structured statement to perform an atomic operation on the target recommended table mode based on the received modification operation comprises:

generating the atomic operation according to the received modification operation, and forming a log record of column modification aiming at the type of the atomic operation;

generating a structured statement for executing atomic operation on the target recommended table mode according to the atomic operation;

before screening the row tuples in the intermediate table, the method further comprises:

forming a log record of row modification;

when modifying the intermediate table by using the structured statement fails, deleting the modified intermediate table according to the log record of column modification, and modifying the intermediate table again by using the structured statement; or,

when screening the row tuples in the modified intermediate table fails, deleting the target data table according to the log record of row modification, and screening the row tuples in the modified intermediate table again by using a preset row extraction program.

5. The method of claim 1, wherein generating at least one recommended table mode for the interface data of each of the plurality of source data interfaces based on the main key of each of the plurality of source data interfaces, respectively, comprises:

traversing all nodes of the metadata mode according to the planned path;

determining a non-leaf node comprising a plurality of different non-leaf nodes as the name of the first recommended table schema;

determining a first layer of child nodes of the non-leaf nodes comprising a plurality of different non-leaf nodes according to the planned path;

determining a first level child node of the non-leaf node comprising a plurality of different non-leaf nodes as a tuple of the first recommended table pattern;

determining a non-leaf node comprising a plurality of different leaf nodes as the name of the second recommended table mode;

determining a first layer of child nodes of the non-leaf nodes comprising a plurality of different leaf nodes according to the planned path;

determining the first level child node of the non-leaf node comprising a plurality of different leaf nodes as a column attribute of the second recommended table mode.

6. The method according to claim 1, wherein the method further comprises:

acquiring a plurality of data bodies from different data interfaces of a plurality of application programs;

labeling each data body in the plurality of data bodies respectively to obtain a set of meta tags, each meta tag corresponding to a single data body;

acquiring a reference primary key from each data body in the plurality of data bodies respectively to obtain a set of meta reference primary keys, each meta reference primary key corresponding to a single data body;

obtaining a set of meta structure description information, each item corresponding to a single data body, according to the structure description of the interface where each data body in the plurality of data bodies is located and the structure description of the application program where each data body in the plurality of data bodies is located;

acquiring the meta tag, the meta reference primary key and the meta structure description information corresponding to the same data interface from the set of meta tags, the set of meta reference primary keys and the set of meta structure description information respectively;

establishing, for each data interface, a knowledge meta rule corresponding to that data interface according to its meta tag, meta reference primary key and meta structure description information, to obtain a plurality of knowledge meta rules;

for each knowledge meta rule in the plurality of knowledge meta rules, searching the plurality of knowledge meta rules to obtain a similar knowledge meta rule and a parent knowledge meta rule;

establishing a similarity connection between each knowledge meta rule in the plurality of knowledge meta rules and its similar knowledge meta rule, and establishing a parent connection between each knowledge meta rule in the plurality of knowledge meta rules and its parent knowledge meta rule, to form the preset knowledge rule map.
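
A minimal sketch of how such a knowledge rule map might be assembled is given below. The claim does not state how similar or parent rules are identified, so the sketch makes two loudly labeled assumptions: two rules are similar when the Jaccard overlap of their tags reaches a threshold, and a rule is the parent of another when its structure path is the other's path minus the last segment. The names MetaRule, jaccard and build_rule_map and the sample interfaces are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MetaRule:
    interface: str
    tags: frozenset
    primary_keys: tuple
    structure: tuple                      # e.g. ("shop", "orders", "detail")
    similar: list = field(default_factory=list)
    parent: Optional[str] = None

def jaccard(a, b):
    """Tag overlap in [0, 1]; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_rule_map(bodies, threshold=0.5):
    # One knowledge meta rule per data interface.
    rules = {
        b["interface"]: MetaRule(b["interface"], frozenset(b["tags"]),
                                 tuple(b["primary_keys"]), tuple(b["structure"]))
        for b in bodies
    }
    for rule in rules.values():
        for other in rules.values():
            if other is rule:
                continue
            if jaccard(rule.tags, other.tags) >= threshold:
                rule.similar.append(other.interface)       # similarity connection
            if rule.structure and other.structure == rule.structure[:-1]:
                rule.parent = other.interface               # parent connection
    return rules

bodies = [
    {"interface": "/orders", "tags": ["order", "trade"],
     "primary_keys": ["order_id"], "structure": ["shop", "orders"]},
    {"interface": "/orders/detail", "tags": ["order", "trade", "item"],
     "primary_keys": ["order_id"], "structure": ["shop", "orders", "detail"]},
    {"interface": "/users", "tags": ["user"],
     "primary_keys": ["user_id"], "structure": ["shop", "users"]},
]
rule_map = build_rule_map(bodies)
```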

7. The method according to claim 1 or 6, wherein parsing the plurality of source data interfaces by using the preset knowledge rule map to obtain a primary key of each of the plurality of source data interfaces comprises:

sequentially determining each source data interface in the plurality of source data interfaces as a target source data interface;

obtaining a label of the target source data interface;

analyzing the target source data interface to obtain a target structure description of the target source data interface;

searching the preset knowledge rule map for a target meta tag matching the tag and for target meta structure description information matching the target structure description;

determining the knowledge meta rule to which both the target meta tag and the target meta structure description information correspond as a target knowledge meta rule;

and determining the primary key of the target knowledge meta rule as the primary key of the target source data interface.
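
The lookup in this claim can be sketched as below. The sketch assumes that the rule map is a list of dictionaries, that a tag matches when the tag sets overlap, and that a structure description matches when every field of the rule appears in the interface. These matching criteria, the RULE_MAP contents and the function name find_primary_key are illustrative assumptions and are not stated in the claim.

```python
# Hypothetical, hand-written knowledge meta rules.
RULE_MAP = [
    {"tags": {"order", "trade"}, "structure": ("order_id", "amount", "items"),
     "primary_key": "order_id"},
    {"tags": {"user"}, "structure": ("user_id", "name"),
     "primary_key": "user_id"},
]

def find_primary_key(interface_tags, interface_fields, rule_map=RULE_MAP):
    """Return the primary key of the rule matched by both tag and structure."""
    tags, fields = set(interface_tags), set(interface_fields)
    tag_hits = {i for i, r in enumerate(rule_map) if r["tags"] & tags}
    struct_hits = {i for i, r in enumerate(rule_map) if set(r["structure"]) <= fields}
    both = sorted(tag_hits & struct_hits)   # the same rule must match on both
    return rule_map[both[0]]["primary_key"] if both else None

key = find_primary_key(["order"], ["order_id", "amount", "items", "ts"])
no_key = find_primary_key(["user"], ["order_id", "amount", "items"])
```

In the second call the tag points to one rule and the structure to another, so no single rule is matched by both and no primary key is returned.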

8. The method according to claim 1, wherein the method further comprises:

constructing a mapping model based on the mapping language;

obtaining a plurality of sample interfaces and generating a plurality of sample metadata schemas according to the plurality of sample interfaces;

determining a sample column attribute to be added in the sample metadata schema, and an attribute value corresponding to the sample column attribute to be added;

collecting a sample example data set based on the sample column attributes;

training the mapping model according to the knowledge rule by using the sample example data set, the sample metadata schema, the sample column attribute and the attribute value of the sample column attribute;

and determining the mapping model that has been trained a plurality of times as the preset column extraction program.

9. The method of claim 8, wherein the method further comprises:

determining a plurality of sample row tuples in the sample metadata schema; wherein all attribute values contained in each of the plurality of sample row tuples correspond to the same primary key;

inserting the attribute value of the sample column attribute to be added into the sample column attribute to be added, to obtain a sample column attribute with attribute values;

obtaining a sample recommendation table with attribute values according to the sample metadata schema;

performing a Cartesian product of the sample column attribute with attribute values and the sample recommendation table with attribute values to obtain a sample intermediate table;

obtaining, according to the sample example data set, a first atomic rule that constrains attribute values in the sample intermediate table based on a numerical rule;

obtaining, according to the sample example data set, a second atomic rule that constrains attribute values in the sample intermediate table based on non-leaf ancestor nodes;

combining the first atomic rule and the second atomic rule to obtain a predicate combination;

screening the sample intermediate table by using the predicate combination to obtain sample target data;

verifying the sample target data by using the plurality of sample row tuples, and adjusting the predicate combination according to a verification result;

and determining the predicate combination subjected to multiple adjustments as the preset row extraction program.
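
The Cartesian product and predicate screening in this claim can be illustrated with a small sketch. It assumes that each attribute value is stored as an (ancestor id, value) pair so that the non-leaf ancestor of a value is known, that the numerical rule is "amount is positive", and that the predicate combination is a conjunction of the atomic rules. The data and the function names are hypothetical.

```python
from itertools import product

def cartesian_table(columns):
    """Build the intermediate table as the Cartesian product of all columns."""
    names = list(columns)
    return [dict(zip(names, combo))
            for combo in product(*(columns[n] for n in names))]

def positive_amount(row):
    """First atomic rule (assumed): a numerical constraint on a value."""
    return row["amount"][1] > 0

def same_ancestor(row):
    """Second atomic rule (assumed): all values share one non-leaf ancestor."""
    return len({ancestor for ancestor, _ in row.values()}) == 1

def extract_rows(table, predicates):
    """Keep rows satisfying every predicate, then strip the ancestor ids."""
    kept = [row for row in table if all(p(row) for p in predicates)]
    return [{k: v for k, (_, v) in row.items()} for row in kept]

columns = {
    "order_id": [("o1", 1001), ("o2", 1002)],
    "amount": [("o1", 25.0), ("o2", -1.0)],
}
intermediate = cartesian_table(columns)                       # 2 x 2 = 4 rows
target = extract_rows(intermediate, [positive_amount, same_ancestor])

# Verification against known sample row tuples, as in the claim.
sample_row_tuples = [{"order_id": 1001, "amount": 25.0}]
verified = target == sample_row_tuples
```

If verification fails, the predicate list would be adjusted and the screening repeated; the adjustment strategy itself is not specified in the claim and is not modeled here.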

10. A system for on-demand storage of interface data, the system comprising: a primary key discovery module, a recommendation table generation module and an intermediate table mapping module;

the primary key discovery module is used for determining a plurality of source data interfaces according to the received storage operation;

the primary key discovery module is further configured to parse the plurality of source data interfaces by using a preset knowledge rule map to obtain a primary key of each source data interface in the plurality of source data interfaces;

the recommendation table generation module is used for generating, according to the primary key of each source data interface in the plurality of source data interfaces, at least one recommendation table schema for the interface data of each source data interface in the plurality of source data interfaces;

the intermediate table mapping module is used for merging the recommendation table schemas with the same primary key to obtain a plurality of merged recommendation table schemas;

the intermediate table mapping module is further used for determining a target recommendation table schema among the plurality of recommendation table schemas according to the received determining operation;

the intermediate table mapping module is further used for generating a structured statement for performing an atomic operation on the target recommendation table schema according to the received modification operation; wherein the atomic operations include a delete column operation, an adjust column operation, and an add column operation;

the intermediate table mapping module is further used for modifying the target recommendation table schema by using the structured statement to obtain a target data table;

the intermediate table mapping module is further used for acquiring attribute values of column attributes in the target recommendation table schema from the interface data by using a preset column extraction program;

the intermediate table mapping module is further used for inserting the attribute values into the corresponding column attributes of the target recommendation table schema to obtain column attributes with data;

the intermediate table mapping module is further used for performing a Cartesian product of the plurality of column attributes with data in the target recommendation table schema to obtain an intermediate table;

the intermediate table mapping module is used for: modifying the intermediate table by using the structured statement to obtain a modified intermediate table; and screening the row tuples in the modified intermediate table by using a preset row extraction program to obtain the target data table.
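
The translation of atomic operations into structured statements can be sketched as follows. The claim does not name the statement language, so the sketch assumes SQL ALTER TABLE statements and assumes that the adjust column operation is a rename. The operation dictionaries, the table name and the function name to_statements are hypothetical, and identifiers are not escaped, so the sketch is not suitable for untrusted input.

```python
def to_statements(table, operations):
    """Map each atomic operation to one SQL statement (assumed dialect)."""
    statements = []
    for op in operations:
        kind, column = op["op"], op["column"]
        if kind == "delete":
            statements.append(f"ALTER TABLE {table} DROP COLUMN {column}")
        elif kind == "adjust":
            # Assumption: "adjust" means renaming the column.
            statements.append(
                f"ALTER TABLE {table} RENAME COLUMN {column} TO {op['new_name']}")
        elif kind == "add":
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {column} {op['type']}")
        else:
            raise ValueError(f"unknown atomic operation: {kind}")
    return statements

operations = [
    {"op": "delete", "column": "tmp"},
    {"op": "adjust", "column": "amt", "new_name": "amount"},
    {"op": "add", "column": "currency", "type": "TEXT"},
]
statements = to_statements("orders", operations)
```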