CN113779002A - Data processing method and device - Google Patents
- Publication number
- CN113779002A (application number CN202011248188.3A)
- Authority
- CN
- China
- Prior art keywords
- tables
- association
- reachable
- script
- paths
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and device, and relates to the technical field of computers. A specific implementation mode of the method comprises the steps of receiving a two-dimensional table construction request, obtaining full-scale scripts of a data warehouse, extracting tables in each script and association relations between the tables, and further obtaining all reachable association paths from an initial table to a target table; performing reinforcement learning on all reachable associated paths according to a preset selection model to obtain an optimal associated path; and according to the optimal association path, acquiring corresponding fields with association relations among different tables in the path, constructing a two-dimensional table, and outputting the two-dimensional table. Therefore, the method and the device can solve the problems of low efficiency and high cost of the existing data modeling.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
Whether the task is data modeling, data analysis, data development, or deep data mining, associating tables is unavoidable when working with structured data: the required fields are obtained by associating different tables and are then used for modeling (that is, building a structured two-dimensional table model). At present, when a field is missing during construction of a two-dimensional table and must be obtained from another table, the field is generally located by researching the business model and is then associated. Here, data modeling refers to the process of establishing a two-dimensional structured model in a data warehouse, that is, the process of constructing a two-dimensional table.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
At present, when a field is missing during construction of a two-dimensional table, it must be obtained from other tables, and the corresponding field is generally found through research before the association is performed. This approach consumes considerable manpower, time, and effort, and the quality of the model data found in this way is generally not very reliable. In addition, new staff who are unfamiliar with the tables of the data warehouse must rely heavily on guidance from experienced staff, which costs the experienced staff a great deal of time. Furthermore, there is no way to know whether a given association has already been developed and put online, so a large amount of time is spent checking and verifying data, which prolongs the data development cycle.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and apparatus, which can solve the problems of low efficiency and high cost of the existing data modeling.
In order to achieve the above object, according to one aspect of the embodiments of the present invention, a data processing method is provided, including: receiving a two-dimensional table construction request, and obtaining the full set of scripts of a data warehouse to extract the tables in each script and the association relationships between the tables, so as to obtain all reachable association paths from a start table to a target table; performing reinforcement learning on all reachable association paths according to a preset selection model to obtain an optimal association path; and, according to the optimal association path, acquiring the corresponding fields that have association relationships among the different tables in the path, and constructing and outputting a two-dimensional table.
Optionally, after acquiring the full set of scripts of the data warehouse, the method includes:
calling a preset cleaning component to clean the data of the full set of scripts of the data warehouse.
Optionally, extracting the tables in each script and the association relationships between the tables includes:
extracting the tables in each script and the association relationships between the tables by regular-expression matching on keywords.
Optionally, before performing reinforcement learning on all reachable association paths according to the preset selection model, the method includes:
acquiring the number of user uses and each execution duration between every two tables, and calculating the average execution duration;
normalizing the number of user uses, and summing the normalized number of user uses and the average execution duration to obtain an association coefficient between the tables, which is then used as the reward value for reinforcement learning.
Optionally, extracting the tables in each script and the association relationships between the tables to obtain all reachable association paths from the start table to the target table includes:
calculating association paths for the fields in all tables through a Cartesian product algorithm, according to the tables in the scripts and the association relationships between the tables, so as to obtain all reachable association paths from the start table to the target table.
In addition, the invention also provides a data processing device, which comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for receiving the two-dimensional table construction request and acquiring the full set of scripts of the data warehouse, so as to extract the tables in each script and the association relationships between the tables and thereby obtain all reachable association paths from the start table to the target table; and the processing module is used for performing reinforcement learning on all reachable association paths according to a preset selection model to obtain an optimal association path, and, according to the optimal association path, acquiring the corresponding fields that have association relationships among the different tables in the path, and constructing and outputting a two-dimensional table.
Optionally, before the processing module performs reinforcement learning on all reachable association paths according to the preset selection model, the processing includes:
acquiring the number of user uses and each execution duration between every two tables, and calculating the average execution duration;
normalizing the number of user uses, and summing the normalized number of user uses and the average execution duration to obtain an association coefficient between the tables, which is then used as the reward value for reinforcement learning.
Optionally, the acquisition module extracting the tables in each script and the association relationships between the tables to obtain all reachable association paths from the start table to the target table includes:
calculating association paths for the fields in all tables through a Cartesian product algorithm, according to the tables in the scripts and the association relationships between the tables, so as to obtain all reachable association paths from the start table to the target table.
One embodiment of the above invention has the following advantages or benefits: the invention obtains the association relationships between tables from the full set of online scripts. For a field required in actual development, it is only necessary to find the tables containing that field through the metadata; the association relationships between those tables and the main table under development can then be found in other historical online task scripts, so the required field can be associated quickly and with high reliability. Meanwhile, the optimal association relationship between the tables is obtained by combining reinforcement learning.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a data processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an association path according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main flow of a data processing method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a data processing method according to a first embodiment of the present invention. As shown in FIG. 1, the data processing method includes:
Step S101: receiving a two-dimensional table construction request, and obtaining the full set of scripts of a data warehouse to extract the tables in each script and the association relationships between the tables, so as to obtain all reachable association paths from a start table to a target table.
In an embodiment, step S101 calculates the associations between all the fields extracted from the scripts, preferably by calculating the association path for each field in all tables using the Cartesian product algorithm, where each path includes the start table and the target table. That is, a connectivity graph is constructed with the tables as nodes and the association relationships as edges, and an association path from one node to another can be obtained from this graph.
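The connectivity graph described above can be illustrated with a minimal Python sketch. It is not the patented implementation: it enumerates paths with a depth-first search rather than the Cartesian product algorithm named in the text, and the table fdm.fdm_x01 and the list of associations are hypothetical values chosen so that two paths exist between the start table and the target table.

```python
from collections import defaultdict


def build_graph(associations):
    """Build an undirected adjacency map: tables are nodes, associations are edges."""
    graph = defaultdict(set)
    for left, right in associations:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def all_reachable_paths(graph, start, target):
    """Enumerate every simple path (no table visited twice) from start to target."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for neighbour in sorted(graph[node]):
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    return sorted(paths)


# Hypothetical pairwise associations; fdm.fdm_x01 does not appear in the source text.
associations = [
    ("adm.adm_s03", "adm.adm_s14"),
    ("adm.adm_s14", "app.app_cmo"),
    ("adm.adm_s03", "fdm.fdm_x01"),
    ("fdm.fdm_x01", "app.app_cmo"),
]
graph = build_graph(associations)
paths = all_reachable_paths(graph, "adm.adm_s03", "app.app_cmo")
for path in paths:
    print(" -> ".join(path))
```

Restricting the search to simple paths keeps the enumeration finite even when the association graph contains cycles.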
In some embodiments, after the full set of scripts of the data warehouse is obtained, a preset cleaning component can be called to clean the data of those scripts. For example, the full set of SQL scripts of the data warehouse is acquired, and the desensitization component is called to clean the data.
As another embodiment, extracting the tables in each script and the association relationships between the tables specifically includes extracting them by regular-expression matching on keywords. For example, the tables in each script and the association relationships between them are extracted by matching keywords such as on and the database names fdm and adm, where on is the keyword used in SQL when tables are associated with each other, and fdm and adm are database names in the data warehouse.
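As a rough illustration of keyword-based regular matching, the following Python sketch pulls table names and join pairs out of one SQL script. The patterns are assumptions, not the patent's actual expressions: table names are assumed to have the form database.table with the database being fdm, adm or app, and every table joined with an on clause is paired with the table in the FROM clause, which is a simplification for chained joins.

```python
import re

# Assumption: tables are written as "<database>.<table>" with database fdm, adm or app.
TABLE_RE = re.compile(r"\b((?:fdm|adm|app)\.\w+)\b", re.IGNORECASE)
# Assumption: the main table follows the FROM keyword.
FROM_RE = re.compile(r"\bfrom\s+((?:fdm|adm|app)\.\w+)", re.IGNORECASE)
# Assumption: an association is written "JOIN <table> [AS] [alias] ON ...".
JOIN_RE = re.compile(
    r"\bjoin\s+((?:fdm|adm|app)\.\w+)(?:\s+(?:as\s+)?\w+)?\s+on\b", re.IGNORECASE
)


def extract_associations(script):
    """Return the tables found in a script and the (main table, joined table) pairs."""
    tables = sorted({name.lower() for name in TABLE_RE.findall(script)})
    base = FROM_RE.search(script)
    pairs = []
    if base:
        for joined in JOIN_RE.findall(script):
            pairs.append((base.group(1).lower(), joined.lower()))
    return tables, pairs


# Hypothetical script; the column names are invented for illustration.
script = """
SELECT a.id, b.amount
FROM adm.adm_s03 a
JOIN adm.adm_s14 b ON a.id = b.id
"""
tables, pairs = extract_associations(script)
print(tables)
print(pairs)
```

A production extractor would more likely use a SQL parser, since regular expressions do not handle subqueries, comments, or quoted identifiers reliably.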
It should be noted that, once the pairwise association relationships among all tables have been obtained, the reachable association paths from the start table to the target table can be derived from this data. For example, as shown in FIG. 2, there are two association paths from the start table adm.adm_s03 to the target table app.app_cmo.
Step S102: performing reinforcement learning on all reachable association paths according to a preset selection model to obtain the optimal association path.
In some embodiments, before performing reinforcement learning on all reachable association paths according to the preset selection model, the method includes:
acquiring the number of user uses and each execution duration between every two associated tables, and calculating the average execution duration; then normalizing the number of user uses and summing the normalized number of user uses and the average execution duration to obtain an association coefficient between the tables, which is used as the reward value for reinforcement learning. For example, as shown in FIG. 2, the number of times the association between table adm.adm_s03 and table adm.adm_s14 is used across all user scripts is 6, which is normalized to 0.06, and the average execution duration across all users is 0.04, so the final summed score is 0.1.
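The arithmetic of this example can be reproduced with a short Python sketch. The text does not state how the usage count is normalized, so dividing by a fixed scale of 100 is an assumption made only to match the 6 to 0.06 example; the two individual durations are likewise hypothetical values chosen to average to 0.04.

```python
def association_coefficient(use_count, durations, count_scale=100.0):
    """Sum the normalized usage count and the average execution duration.

    count_scale is a hypothetical normalization constant; the source only
    shows that a count of 6 is normalized to 0.06.
    """
    average_duration = sum(durations) / len(durations)
    normalized_count = use_count / count_scale
    return normalized_count + average_duration


# 6 uses between the two tables; durations are assumed to be already scaled.
coefficient = association_coefficient(6, [0.03, 0.05])
print(round(coefficient, 4))
```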
As another embodiment, before performing reinforcement learning on all reachable association paths according to the preset selection model, the method includes:
taking each table in the association path as an agent for reinforcement learning with the selection model (in a reinforcement learning algorithm, the agent is the entity that issues actions, applying different actions in different environments), and taking the association relationships between tables as the environment for reinforcement learning with the selection model. Starting with the start table as the agent, an action from the action space of the reinforcement learning model is executed and the execution result is passed to the environment, so that the environment returns the association coefficient between this agent and each of the other agents associated with it. The agent optimally associated with the current agent is then selected from those associated agents according to the association coefficients: the agent with the largest association coefficient is the optimally associated one. This process is repeated until the target agent (that is, the target table) is reached, which yields the optimal association path from the start table to the target table. Here, the action space is the process of selecting one agent from n agents and executing the association operation to reach another agent.
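The selection loop in this paragraph, in which the associated table with the largest association coefficient is chosen repeatedly until the target table is reached, can be sketched in Python as follows. This is only an illustration of the greedy step; the table fdm.fdm_x01 and all coefficients other than 0.1 are hypothetical.

```python
def best_association_path(coefficients, start, target):
    """Follow the largest association coefficient from start until target is reached.

    coefficients maps each table to {associated table: association coefficient}.
    """
    path = [start]
    current = start
    while current != target:
        candidates = {
            table: value
            for table, value in coefficients.get(current, {}).items()
            if table not in path
        }
        if not candidates:
            raise ValueError(f"no unvisited table is associated with {current}")
        current = max(candidates, key=candidates.get)
        path.append(current)
    return path


# Hypothetical coefficients returned by the environment for each agent.
coefficients = {
    "adm.adm_s03": {"adm.adm_s14": 0.1, "fdm.fdm_x01": 0.07},
    "adm.adm_s14": {"app.app_cmo": 0.2},
    "fdm.fdm_x01": {"app.app_cmo": 0.05},
}
best_path = best_association_path(coefficients, "adm.adm_s03", "app.app_cmo")
print(" -> ".join(best_path))
```

A purely greedy walk can run into a dead end on some graphs, which is one reason a learned value estimate that accounts for later steps is useful.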
In addition, in the reinforcement learning process of the present invention, the state is the state of a node (that is, a table). The nodes on the paths from all tables to the target table are represented as a sequence, for example (0,0,0,1,0,0 …), in which the current node is set to 1 and all other nodes are set to 0; this sequence is used as the state.
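This state representation is a one-hot encoding over the tables. A minimal Python sketch, with hypothetical table names t0 to t5, is:

```python
def encode_state(tables, current):
    """Return a tuple with 1 at the position of the current table and 0 elsewhere."""
    return tuple(1 if table == current else 0 for table in tables)


# Hypothetical ordered list of all tables (nodes) in the graph.
tables = ["t0", "t1", "t2", "t3", "t4", "t5"]
state = encode_state(tables, "t3")
print(state)  # (0, 0, 0, 1, 0, 0)
```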
It should be noted that the action space value is obtained through the following loss function, with the path having the lowest composite score calculated as the optimal association path:
$$\mathrm{loss} = \left(r + y \max_{a'} Q_{\mathrm{target}}(s', a') - Q_{\mathrm{current}}(s, a)\right)^{2}$$
where r is the reward value in reinforcement learning, that is, the reward given by the environment after the agent takes a certain action; y is the greedy threshold of the greedy strategy; $Q_{\mathrm{current}}(s, a)$ is the estimated Q value, that is, the value obtained when the current state and action are used as the input of the neural network; and $Q_{\mathrm{target}}(s', a')$ is likewise the value obtained when the next state and action are input to the neural network, with the maximum taken over the actions a' available in the next state s'. s is the state of the agent during exploration, and a is the action taken by the agent in each state.
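The loss above can be evaluated numerically with a few lines of Python. The sketch assumes the Q values have already been produced by the two networks; the parameter y_coeff stands for the coefficient y in the formula (in the standard DQN formulation this coefficient is the discount factor, which is the interpretation assumed here), and all numbers are hypothetical.

```python
def dqn_loss(reward, y_coeff, next_q_values, current_q):
    """Squared difference between the target value and the current Q estimate."""
    target = reward + y_coeff * max(next_q_values)
    return (target - current_q) ** 2


# Hypothetical values: reward 0.1, coefficient 0.9, two actions in the next state.
loss = dqn_loss(reward=0.1, y_coeff=0.9, next_q_values=[0.2, 0.5], current_q=0.3)
print(round(loss, 4))
```

With these numbers the target is 0.1 + 0.9 × 0.5 = 0.55, the difference from the current estimate is 0.25, and the loss is 0.0625.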
FIG. 3 is a schematic diagram of the main flow of a data processing method according to a second embodiment of the present invention, which may include:
receiving a two-dimensional table construction request and acquiring the full set of scripts of the data warehouse, that is, triggering historical data acquisition and extracting the historical task scripts; obtaining from them the tables in each script and the association relationships between the tables, and thereby all reachable association paths from the start table to the target table; learning over all reachable association paths with the preset selection model to obtain the optimal association path; and, according to the optimal association path, acquiring the corresponding fields that have association relationships among the different tables in the path, and constructing and outputting a two-dimensional table.
Preferably, the selection model is trained before it is used to learn over all reachable association paths. The specific implementation process is as follows:
Two networks with identical structure but different parameters are initialized: Current Net and Target Net. The former is called the estimation network; it takes the current state as input and is the network that is actually trained in DQN (a value-based reinforcement learning algorithm that uses value function approximation). The latter is called the real network; it stores the parameters of the estimation network from an earlier stage, which serves to break correlations. The replay buffer, also called the memory bank, records for each state the action taken, the reward, and the resulting next state; training data is drawn at random from it.
As an embodiment, a replay buffer is initialized first and the environment is preprocessed; the current state is then input into Current Net, which returns the Q values of all possible actions in that state. An action is selected with a greedy policy: for example, a random action is selected unless a set probability threshold is reached, in which case the action with the highest Q value is selected. After selecting action A, the agent performs it in the current state, moves to a new state, and receives the reward R. The transition sample, consisting of the state, the action, the reward, and the next state, is stored in the replay buffer. A random batch of transition samples is then drawn from the replay buffer and the loss is calculated. Gradient descent is performed on the parameters of Current Net to minimize the loss. Every few steps, the Current Net weights are copied into the Target Net. These steps are repeated for a number of rounds to obtain the trained selection model.
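The training procedure can be sketched in Python under strong simplifying assumptions. To keep the example self-contained, two dictionaries of Q values stand in for the Current Net and Target Net neural networks, the function name and every hyperparameter are hypothetical, and the association graph is assumed to be directed and acyclic. The sketch shows the replay buffer, the greedy action selection with random exploration, the update of the current estimates only, and the periodic copy of the current values into the target values.

```python
import random
from collections import deque


def train_selection_model(rewards, target_table, episodes=200, gamma=0.9,
                          epsilon=0.2, lr=0.5, batch_size=8, sync_every=10,
                          max_steps=50, seed=0):
    """rewards maps each table to {next table: association coefficient}."""
    rng = random.Random(seed)
    # Two tables of Q values with the same structure replace the two networks.
    current_q = {s: {a: 0.0 for a in acts} for s, acts in rewards.items()}
    target_q = {s: dict(acts) for s, acts in current_q.items()}
    replay = deque(maxlen=1000)  # stores (state, action, reward, next_state)
    starts = [s for s, acts in rewards.items() if acts]
    step = 0
    for _ in range(episodes):
        state = rng.choice(starts)
        for _ in range(max_steps):
            if state == target_table or not rewards.get(state):
                break
            actions = list(rewards[state])
            # Explore with probability epsilon, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=current_q[state].get)
            # Choosing an action means moving to that table, so next_state == action.
            replay.append((state, action, rewards[state][action], action))
            state = action
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for s, a, r, s2 in batch:
                terminal = s2 == target_table or not target_q.get(s2)
                future = 0.0 if terminal else max(target_q[s2].values())
                # Descent step on the squared loss; the constant factor 2 is folded into lr.
                current_q[s][a] += lr * (r + gamma * future - current_q[s][a])
            step += 1
            if step % sync_every == 0:
                # Periodically copy the current estimates into the target estimates.
                target_q = {s: dict(acts) for s, acts in current_q.items()}
    return current_q


# Hypothetical directed association graph with coefficients used as rewards.
rewards = {
    "adm.adm_s03": {"adm.adm_s14": 0.1, "fdm.fdm_x01": 0.07},
    "adm.adm_s14": {"app.app_cmo": 0.2},
    "fdm.fdm_x01": {"app.app_cmo": 0.05},
}
q_values = train_selection_model(rewards, "app.app_cmo")
first_hop = max(q_values["adm.adm_s03"], key=q_values["adm.adm_s03"].get)
print(first_hop)
```

Replacing the dictionaries with neural networks would turn the in-place update into a gradient step on the network parameters, while the buffer, sampling, and synchronization logic would stay the same.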
FIG. 4 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the data processing apparatus 400 includes an acquisition module 401 and a processing module 402. The acquisition module 401 receives the two-dimensional table construction request, obtains the full set of scripts of the data warehouse, extracts the tables in each script and the association relationships between the tables, and thereby obtains all reachable association paths from the start table to the target table. The processing module 402 performs reinforcement learning on all reachable association paths according to a preset selection model to obtain an optimal association path and, according to the optimal association path, acquires the corresponding fields that have association relationships among the different tables in the path, constructs a two-dimensional table, and outputs it.
In some embodiments, after the acquisition module 401 obtains the full set of scripts of the data warehouse, the processing includes:
calling a preset cleaning component to clean the data of the full set of scripts of the data warehouse.
In some embodiments, the acquisition module 401 extracting the tables in each script and the association relationships between the tables includes:
extracting the tables in each script and the association relationships between the tables by regular-expression matching on keywords.
In some embodiments, before the processing module 402 performs reinforcement learning on all reachable association paths according to the preset selection model, the processing includes:
acquiring the number of user uses and each execution duration between every two tables, and calculating the average execution duration; then normalizing the number of user uses and summing the normalized number of user uses and the average execution duration to obtain an association coefficient between the tables, which is used as the reward value for reinforcement learning.
In some embodiments, the acquisition module 401 extracting the tables in each script and the association relationships between the tables to obtain all reachable association paths from the start table to the target table includes:
calculating association paths for the fields in all tables through a Cartesian product algorithm, according to the tables in the scripts and the association relationships between the tables, so as to obtain all reachable association paths from the start table to the target table.
It should be noted that the data processing method and the data processing apparatus according to the present invention have corresponding relation in the specific implementation contents, and therefore, the repeated contents are not described again.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the computer system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising an acquisition module and a processing module. In some cases, the names of these modules do not limit the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs. When the one or more programs are executed by the apparatus, which comprises a processor, a database, a first storage unit, a second storage unit, a first processing unit and a second processing unit, the processor: receives a two-dimensional table construction request, acquires the full-scale scripts of a data warehouse to extract the tables in each script and the association relations between the tables, and thereby obtains all reachable association paths from a starting table to a target table; performs reinforcement learning on all reachable association paths according to a preset selection model to obtain an optimal association path; and, according to the optimal association path, acquires the corresponding fields having association relations among the different tables in the path, constructs a two-dimensional table, and outputs the two-dimensional table.
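The first step described above (extracting tables and their association relations from scripts, then collecting every reachable association path) can be illustrated with a minimal Python sketch. It is not the implementation of the invention. It assumes that each script qualifies join fields with the full table name in the form `table.field = table.field`, and it enumerates paths with a plain depth-first search rather than the Cartesian product algorithm named in the claims. The names `JOIN_CONDITION`, `extract_associations`, `reachable_paths` and the sample scripts are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a join condition: "<table>.<field> = <table>.<field>".
# Real warehouse scripts (aliases, subqueries) would need a richer pattern.
JOIN_CONDITION = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def extract_associations(scripts):
    """Return {table: {neighbor: (field, neighbor_field)}} from raw script text."""
    graph = defaultdict(dict)
    for script in scripts:
        for left, lfield, right, rfield in JOIN_CONDITION.findall(script):
            if left == right:
                continue  # ignore conditions that stay inside one table
            graph[left][right] = (lfield, rfield)
            graph[right][left] = (rfield, lfield)
    return graph


def reachable_paths(graph, start, target):
    """Enumerate every cycle-free path from start to target (depth-first)."""
    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in sorted(graph.get(node, {})):
            if nxt not in path:  # never revisit a table on the same path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical scripts, for illustration only.
scripts = [
    "SELECT * FROM orders JOIN users ON orders.user_id = users.id",
    "SELECT * FROM users JOIN regions ON users.region_id = regions.id",
    "SELECT * FROM orders JOIN regions ON orders.region_id = regions.id",
]
graph = extract_associations(scripts)
paths = reachable_paths(graph, "orders", "regions")
```

With these sample scripts, `paths` holds the direct path from `orders` to `regions` and the indirect path through `users`; the field pairs stored in `graph` are what a later step would use to join the tables along the chosen path.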
According to the technical scheme of the embodiment of the invention, the problems of low efficiency and high cost of the existing data modeling can be solved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A data processing method, comprising:
receiving a two-dimensional table construction request, acquiring full-scale scripts of a data warehouse to extract a table in each script and an association relation between the tables, and further acquiring all reachable association paths from an initial table to a target table;
performing reinforcement learning on all reachable association paths according to a preset selection model to obtain an optimal association path;
and according to the optimal association path, acquiring the corresponding fields having association relations among the different tables in the path, and constructing a two-dimensional table so as to output the two-dimensional table.
2. The method of claim 1, wherein acquiring the full-scale scripts of the data warehouse comprises:
and calling a preset cleaning component to clean the data of the full-scale script of the data warehouse.
3. The method of claim 1, wherein extracting tables and associations between tables in each script comprises:
and extracting the tables in each script and the association relations between the tables by keyword, using regular-expression matching.
4. The method of claim 1, wherein before performing reinforcement learning on all reachable association paths according to a preset selection model, the method comprises:
acquiring, for every two tables, the number of times users have used them together and the duration of each execution, and calculating the average execution duration;
and normalizing the number of user uses, summing the normalized number of user uses and the average execution duration to obtain a correlation coefficient between the tables, and taking the correlation coefficient as the reward value for reinforcement learning.
5. The method of any one of claims 1 to 4, wherein extracting the table in each script and the association relationship between the tables to obtain all reachable association paths from the start table to the target table comprises:
and calculating associated paths for fields in all tables through a Cartesian product algorithm according to the tables in the script and the association relationship between the tables, so as to obtain all reachable associated paths from the starting table to the target table.
6. A data processing apparatus, comprising:
the acquisition module is used for receiving a two-dimensional table construction request, acquiring the full amount of scripts of the data warehouse to extract the tables in each script and the association relationship between the tables, and further acquiring all reachable association paths from the starting table to the target table;
the processing module is used for carrying out reinforcement learning on all reachable associated paths according to a preset selection model so as to obtain an optimal associated path; and according to the optimal association path, acquiring corresponding fields with association relations among different tables in the path, constructing a two-dimensional table, and outputting the two-dimensional table.
7. The apparatus of claim 6, wherein before performing reinforcement learning on all reachable association paths according to a preset selection model, the processing module is further used for:
acquiring, for every two tables, the number of times users have used them together and the duration of each execution, and calculating the average execution duration;
and normalizing the number of user uses, summing the normalized number of user uses and the average execution duration to obtain a correlation coefficient between the tables, and taking the correlation coefficient as the reward value for reinforcement learning.
8. The apparatus of claim 6 or 7, wherein the acquisition module extracting the tables in each script and the association relationships between the tables to obtain all reachable association paths from the start table to the target table comprises:
and calculating associated paths for fields in all tables through a Cartesian product algorithm according to the tables in the script and the association relationship between the tables, so as to obtain all reachable associated paths from the starting table to the target table.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
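The correlation coefficient recited in claims 4 and 7 can be illustrated with a short Python sketch. The claims do not say which normalization is used, so min-max normalization is assumed here. The function name `correlation_coefficients`, the layout of `stats`, and the sample figures are hypothetical.

```python
def correlation_coefficients(stats):
    """Map each table pair to: normalized use count + average execution duration.

    stats: {(table_a, table_b): (use_count, [execution_durations])}
    """
    counts = [count for count, _ in stats.values()]
    lowest, highest = min(counts), max(counts)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match

    coefficients = {}
    for pair, (count, durations) in stats.items():
        average_duration = sum(durations) / len(durations)
        normalized_count = (count - lowest) / span  # assumed min-max normalization
        coefficients[pair] = normalized_count + average_duration
    return coefficients


# Hypothetical usage statistics, for illustration only.
stats = {
    ("orders", "users"): (30, [2.0, 4.0]),
    ("users", "regions"): (10, [1.0, 1.0]),
    ("orders", "regions"): (20, [3.0]),
}
rewards = correlation_coefficients(stats)
```

Each value in `rewards` would then serve as the reward for moving between the two tables of that pair during reinforcement learning. The sketch follows the wording of the claims literally and sums the raw average duration; whether the duration is also scaled before summing is not stated in the claims.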
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011248188.3A CN113779002A (en) | 2020-11-10 | 2020-11-10 | Data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011248188.3A CN113779002A (en) | 2020-11-10 | 2020-11-10 | Data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113779002A true CN113779002A (en) | 2021-12-10 |
Family ID: 78835297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011248188.3A Pending CN113779002A (en) | 2020-11-10 | 2020-11-10 | Data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113779002A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002342381A (en) * | 2001-05-21 | 2002-11-29 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for searching shortest route, recording medium and program |
US20120095957A1 (en) * | 2010-10-18 | 2012-04-19 | Tata Consultancy Services Limited | Component Based Approach to Building Data Integration Tools |
CN104899209A (en) * | 2014-03-05 | 2015-09-09 | 阿里巴巴集团控股有限公司 | Optimization method and device for open type data processing service |
CN108762281A (en) * | 2018-06-08 | 2018-11-06 | 哈尔滨工程大学 | It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory |
US20180349256A1 (en) * | 2017-06-01 | 2018-12-06 | Royal Bank Of Canada | System and method for test generation |
CN109002289A (en) * | 2017-06-07 | 2018-12-14 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus constructing data model |
KR102091529B1 (en) * | 2019-09-03 | 2020-03-23 | (주)빅인사이트 | Method and apparatus for training AI model using user's time series behavior data |
JP2020092490A (en) * | 2018-12-03 | 2020-06-11 | 富士通株式会社 | Reinforcement learning program, reinforcement learning method, and reinforcement learning device |
CN111506613A (en) * | 2020-04-22 | 2020-08-07 | 支付宝(杭州)信息技术有限公司 | Method, system, device and equipment for querying incidence relation of data record |
CN111552792A (en) * | 2020-04-30 | 2020-08-18 | 中国建设银行股份有限公司 | Information query method and device, electronic equipment and storage medium |
CN111597243A (en) * | 2020-05-15 | 2020-08-28 | 中国工商银行股份有限公司 | Data warehouse-based abstract data loading method and system |
2020-11-10: CN application CN202011248188.3A (CN113779002A), status: active, pending
Non-Patent Citations (2)
Title |
---|
YU-HSIN HSU ET AL: "Reinforcement Learning-Based Collision Avoidance and Optimal Trajectory Planning in UAV Communication Networks", 《IEEE TRANSACTIONS ON MOBILE COMPUTING》, vol. 21, no. 1, 19 June 2020 (2020-06-19), XP011891509, DOI: 10.1109/TMC.2020.3003639 * |
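WU Hongjie; YANG Ru; FU Qiming; CHEN Jianping; LU Weizhong: "Research on an HP model optimization method based on reinforcement learning", Computer Engineering and Applications, no. 12, 15 March 2019 (2019-03-15) * |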
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110807515B (en) | Model generation method and device | |
US20190251963A1 (en) | Voice awakening method and device | |
CN108520470B (en) | Method and apparatus for generating user attribute information | |
CN110766142A (en) | Model generation method and device | |
CN111523640B (en) | Training method and device for neural network model | |
CN112487173B (en) | Man-machine conversation method, device and storage medium | |
CN110309275A (en) | A kind of method and apparatus that dialogue generates | |
US20220245465A1 (en) | Picture searching method and apparatus, electronic device and computer readable storage medium | |
CN110162518B (en) | Data grouping method, device, electronic equipment and storage medium | |
CN113656179A (en) | Scheduling method and device of cloud computing resources, electronic equipment and storage medium | |
CN113409898B (en) | Molecular structure acquisition method and device, electronic equipment and storage medium | |
CN114205690A (en) | Flow prediction method, flow prediction device, model training method, model training device, electronic equipment and storage medium | |
CN111368973A (en) | Method and apparatus for training a hyper-network | |
CN111160847A (en) | Method and device for processing flow information | |
CN114119123A (en) | Information pushing method and device | |
CN115481227A (en) | Man-machine interaction dialogue method, device and equipment | |
CN116684330A (en) | Traffic prediction method, device, equipment and storage medium based on artificial intelligence | |
CN111957053A (en) | Game player matching method and device, storage medium and electronic equipment | |
CN114997329A (en) | Method, apparatus, device, medium and product for generating a model | |
CN114896291A (en) | Training method and sequencing method of multi-agent model | |
CN111145063A (en) | Business system guiding method and device | |
CN112783508B (en) | File compiling method, device, equipment and storage medium | |
CN117971487A (en) | High-performance operator generation method, device, equipment and storage medium | |
CN113361574A (en) | Training method and device of data processing model, electronic equipment and storage medium | |
CN110348581B (en) | User feature optimizing method, device, medium and electronic equipment in user feature group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||