CN112347104B - Column storage layout optimization method based on deep reinforcement learning - Google Patents

Column storage layout optimization method based on deep reinforcement learning

Info

Publication number
CN112347104B
CN112347104B
Authority
CN
China
Prior art keywords
columns
query
column
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011228158.6A
Other languages
Chinese (zh)
Other versions
CN112347104A (en
Inventor
覃雄派
陈跃国
杜小勇
赵丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202011228158.6A priority Critical patent/CN112347104B/en
Publication of CN112347104A publication Critical patent/CN112347104A/en
Application granted granted Critical
Publication of CN112347104B publication Critical patent/CN112347104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a column storage layout optimization method based on deep reinforcement learning, which comprises the following steps: receiving a query load; analyzing the query load to generate query characteristics; acquiring feature data of the data columns according to the query characteristics; determining the output sequence of the columns based on a strategy for the output sequence of the columns and the feature data of the data columns; performing a quantitative evaluation of the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards from the system; and adjusting the strategy for the output sequence of the columns according to the quantitative evaluation result. With the present application, the model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns; incremental training is supported, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.

Description

Column storage layout optimization method based on deep reinforcement learning
Technical Field
The application relates to the field of computers, in particular to a column storage layout optimization method based on deep reinforcement learning, which mainly performs layout optimization on column storage of big data so as to improve data reading performance.
Background
OLAP (On-Line Analytical Processing) over relational data plays a vital role in many analysis and decision-support applications. In the big data age, many big data analysis systems, such as Hive and Spark SQL, use HDFS (Hadoop Distributed File System) as the underlying storage; a large amount of data is continuously accumulated and stored on HDFS, and the real-time requirements of data analysis grow ever higher. HDFS, as the de facto standard for low-cost distributed big data storage and processing, provides fault-tolerant, portable, scalable, high-throughput unified data storage for big data analysis systems. Big data analysis systems on HDFS are typically used to support batch and interactive query analysis over massive data.
In these systems, data tables are typically stored in a column storage format such as RCFile, ORC, Parquet, or CarbonData. Column storage provides flexible and efficient data encoding and compression and makes it possible to read only the necessary data columns, avoiding unnecessary I/O; nevertheless, we have found that query analysis performance over data on HDFS can be further improved by optimizing the storage layout. When a query accesses data columns within a horizontal slice of an HDFS data block, multiple disk skips are required, and an optimal column order can minimize the disk skip cost. The column ordering problem has been proven NP-Hard in the academic literature. How to design an efficient column ordering algorithm that finds a near-optimal column order for a given query load is therefore a challenge. Existing heuristic search is strongly random in its optimization, easily falls into suboptimal solutions, and must recompute the column ordering at every optimization, incurring a high computational cost.
Disclosure of Invention
The present application has been made in view of the above problems, and it is an object of the present application to provide a solution to overcome or at least partially solve the above problems. Accordingly, in one aspect of the present application, there is provided a column storage layout optimization method based on deep reinforcement learning, the method comprising:
receiving a query load;
analyzing the query load to generate query characteristics;
acquiring feature data of a data column according to the query feature;
determining the output sequence of the columns based on the strategy of the output sequence of the columns and the characteristic data of the data columns;
performing quantitative evaluation on the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards of the system;
and a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result.
Optionally, the output sequence is quantitatively evaluated according to the disk skip time.
Optionally, the strategy for the output sequence of the columns under deep reinforcement learning is implemented using an Actor-Critic algorithm, and the quantitative evaluation strategy is adjusted based on rewards from the system, which includes adjusting parameters in the Critic neural network according to the rewards given by the system.
Optionally, a Pointer Net neural network is used to make the output-sequence decision, including the mapping from one sequence to another.
Optionally, determining the output order of the columns based on the policy of the output order of the columns and the characteristic data of the data columns includes:
obtaining the weight associated with each position of the input sequence by using an attention mechanism;
and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
Optionally, the method further comprises: the method for uniformly coding the input query load specifically comprises the following steps:
initializing each query in the input query load to a set;
determining a corresponding column access characteristic for each query;
and carrying out binary coding on the elements in the set corresponding to the query according to the column access characteristics.
The technical solution provided by the application has at least the following technical effects or advantages: the application implements a column storage layout optimization method based on deep reinforcement learning and, in experimental comparison with existing heuristic column ordering algorithms, further reduces the disk skip cost. The model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns. Incremental training is supported: the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
The foregoing description is only an overview of the technical solutions of the present application, and may be implemented according to the content of the specification in order to make the technical means of the present application more clearly understood, and in order to make the technical solutions of the present application and the objects, features and advantages thereof more clearly understood, the following specific embodiments of the present application will be specifically described.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a column storage layout optimization method based on deep reinforcement learning according to the present application;
FIG. 2 shows a map of a skip cost model for a disk in a wide-table based column storage layout optimization scheme;
FIG. 3 illustrates an overall framework of the deep reinforcement learning based column storage layout optimization proposed by the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
In big data analysis systems, I/O is often the main performance bottleneck, and the design and optimization of the storage system is critical to improving big data analysis performance. In terms of data organization, column storage (e.g., ORC, Parquet) provides flexible and efficient data encoding and compression and makes it possible to read only the necessary data columns, thus avoiding unnecessary I/O. Under a column storage layout, how to adjust the physical data layout to suit changing query loads and system environments is a challenge. The application aims to design and implement, in a Hadoop environment, a column storage layout optimization method based on deep reinforcement learning, DRL-COA (Deep Reinforcement Learning based Column Ordering Algorithm), applied to adaptive column storage layout optimization; compared with existing heuristic column ordering algorithms, the method further reduces the read cost of the disk.
In the DRL-COA provided by the application, reinforcement learning is performed using an Actor-Critic model, and a Pointer Net network structure is applied in the Actor neural network. The Actor continuously outputs new actions (column orders) from the initial input, while the Critic neural network evaluates each action according to the 'benefit' obtained after it is taken, so that new actions are continuously selected. Since an action is a column order, the choice of the column at each position matters; here, the weight relating the element at a given output position to each position of the input sequence is obtained through the attention mechanism, and the selection is made accordingly.
In one aspect of the present application, there is provided a column storage layout optimization method based on deep reinforcement learning, which solves a sequential decision problem of columns by using a deep reinforcement learning technique, optimizes a storage layout by training a model, and in particular, as shown in fig. 1, the method includes:
receiving a query load;
analyzing the query load to generate query characteristics;
acquiring feature data of a data column according to the query feature;
determining the output sequence of the columns based on the strategy of the output sequence of the columns and the characteristic data of the data columns;
performing quantitative evaluation on the output sequence, wherein the quantitative evaluation strategy is adjusted based on rewards of the system;
and a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result.
As a preferred embodiment, the strategy of implementing the output sequence of the deep reinforcement learning column by adopting an Actor-Critic algorithm, and the quantitative evaluation strategy is adjusted based on the rewards of the system, wherein the quantitative evaluation strategy comprises the adjustment of parameters in the Critic neural network according to the rewards given by the system.
The specific process implemented by the Actor-Critic algorithm is described in detail below. The input to the DRL-COA network model in this scheme can be represented as a 1×n matrix [c_1, c_2, ..., c_n], where c_i indicates whether the corresponding data column appears in the query (1 for present, 0 for absent) and n denotes the number of queries; this input is a mathematical representation of the query load Q. The output of the Actor-Critic model is the order of the data columns, denoted O (Order). Meanwhile, the output of each iteration of the model is evaluated using the disk skip time. The flow is as follows:
(1) According to the current state, the Actor outputs a column ordering as its action;
(2) The Critic scores the action just taken, based on the state and the action;
(3) According to the Critic's score, the Actor adjusts its current policy (i.e., the parameters in the Actor neural network) and executes the next action;
(4) The Critic likewise adjusts its current scoring policy (i.e., the parameters in the Critic neural network) according to the reward given by the system;
(5) Initially, the Actor acts randomly and the Critic scores randomly. However, because of the reward feedback, the Critic's scoring becomes more and more accurate, and the Actor in turn performs better and better.
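The five steps above can be sketched as a single training iteration. The `actor`, `critic`, and `skip_time` objects and their method names below are illustrative assumptions, not interfaces from the patent:

```python
# Schematic sketch of one Actor-Critic iteration (names are hypothetical).
# `actor` maps the encoded load to a column order; `critic` scores a
# (state, action) pair; `skip_time` plays the role of the system reward.
def train_step(actor, critic, skip_time, state):
    order = actor.act(state)              # (1) actor outputs a column order
    score = critic.score(state, order)    # (2) critic scores the action taken
    reward = -skip_time(order)            # lower skip time -> higher reward
    actor.update(state, order, score)     # (3) actor adjusts its policy
    critic.update(state, order, reward)   # (4) critic adjusts its scoring
    return order, reward                  # (5) repeat: both improve over time
```

Repeated calls to `train_step` correspond to step (5): both networks start out random and improve as rewards accumulate.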
In this method, the output order is preferably quantitatively evaluated based on the disk skip time. The stochastic policy of the network model, p_θ(o|c), can be expressed as follows: when the input is c (columns) and the output is o (order), the model evaluation SC(o|c) is the corresponding disk skip time. The goal of model training is that the smaller the evaluated value SC(o|c), the greater the probability with which the output o is selected. During training, the output of each iteration of the model is evaluated by the disk skip time; this design ensures that the model parameters are continuously adjusted in the direction expected to reduce the disk skip time.
FIG. 2 shows the disk skip cost model in a wide-table-based column storage layout optimization scheme. In this scheme, a disk-based skip time cost model is designed for the characteristics of data access on traditional disks: skip operations of equal distance are executed multiple times within a series of HDFS files, and the average skip time at skip distance d is taken as the statistical skip cost corresponding to d. After the skip costs at different skip distances have been obtained, a piecewise skip cost function is constructed by linear fitting. FIG. 2 shows the skip cost functions obtained with this method on three different types of disks.
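A minimal sketch of how such a piecewise skip cost function might be built from measurements, assuming linear interpolation between a few measured (distance, average time) points; all numbers below are invented for illustration:

```python
import bisect

def make_skip_cost(distances, avg_times):
    """distances: sorted skip distances; avg_times: measured average
    skip time at each distance. Returns cost(d) by piecewise-linear
    interpolation, clamped at the two ends."""
    def cost(d):
        if d <= distances[0]:
            return avg_times[0]
        if d >= distances[-1]:
            return avg_times[-1]
        i = bisect.bisect_right(distances, d)   # segment containing d
        d0, d1 = distances[i - 1], distances[i]
        t0, t1 = avg_times[i - 1], avg_times[i]
        return t0 + (t1 - t0) * (d - d0) / (d1 - d0)
    return cost

# Hypothetical measurements: skip distance (bytes) -> average time (ms)
cost = make_skip_cost([1024, 65536, 1048576], [0.1, 0.4, 2.0])
print(cost(1024))    # 0.1 (a measured point)
print(cost(33280))   # 0.25 (midway between the first two points)
```

In practice the measured points would come from timing repeated equal-distance skips on the target disk, as the paragraph above describes.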
In this patent, the method adopts the Actor-Critic algorithm to implement deep reinforcement learning, which can both handle discrete values and perform single-step updates. By comparison, value-function-based algorithms can only handle discrete values, although they can update at every step of an episode, while policy-based algorithms can handle both discrete and continuous values but cannot update until the end of each episode. With the Actor-Critic deep learning algorithm adopted in the application, the method can handle discrete values and update at every single step.
FIG. 3 is a diagram of the overall framework of the deep reinforcement learning based column storage layout optimization proposed by the application. The figure shows the components of the DRL-COA model, from the collection and analysis of the query load to the feature input and training of the model. The agent (a deep learning agent) learns the feature data of the columns through continuous interaction with the environment, and the Critic neural network evaluates the output of each iteration of the Actor neural network according to a skip estimator component, so that the model parameters can be continuously adjusted in the direction expected to reduce disk skip time. Compared with the stochastic optimization of heuristic search, this is more directed and less likely to fall into suboptimal solutions; at the same time, as a deep reinforcement learning model, the column ordering does not need to be recalculated at each optimization, and the incremental training mode greatly reduces the computational cost.
In this patent, determining the output order of the columns based on the policy of the output order of the columns, the characteristic data of the data columns may include:
obtaining the weight associated with each position of the input sequence by using an attention mechanism;
and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
Preferably, a Pointer Net neural network is used to make the output-sequence decision, including the mapping from one sequence to another.
The scheme treats the column ordering problem by analogy with a related combinatorial optimization problem: decisions between sequences must be made to adjust the column order. The DRL-COA model adopts a Pointer Net neural network to solve the sequence decision problem in column ordering, i.e., the mapping from one sequence to another. Meanwhile, when computing the output sequence, the attention mechanism is used to obtain the weight between an element at a given output position and each position of the input sequence, and the input sequence is then combined with these weights to influence the output. In this way, the input element most strongly related to the current output can be computed and taken as the output element; each output element points to an input element like a pointer. This design ensures that each input element is pointed to by exactly one output element, so that no input element appears more than once.
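The pointer selection step described above can be sketched as follows, under the assumption of dot-product attention scores and a mask that forbids re-selecting columns already output (the patent does not give the exact network details, so this is an illustrative simplification):

```python
import math

def pointer_step(decoder_state, encoder_states, selected):
    """Return (index, weights): the chosen input position and its
    attention weights. States are plain float vectors; `selected`
    holds indices already output, which are masked out so each
    input element is pointed to exactly once."""
    scores = []
    for i, h in enumerate(encoder_states):
        if i in selected:
            scores.append(float("-inf"))  # forbid repeated columns
        else:
            scores.append(sum(a * b for a, b in zip(decoder_state, h)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights

# Three encoder positions (columns); position 2 is already selected.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, w = pointer_step([0.5, 1.0], enc, selected={2})
print(idx)  # position 1 scores highest among the unmasked positions
```

Calling `pointer_step` once per output position, adding each chosen index to `selected`, yields a permutation of the input columns — the column order the Actor emits.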
In the present application, the input query load samples are encoded before model training. This is because the columns accessed by a query may be only a subset of the columns, while the output is the set of all columns; the Pointer Net network structure requires that the content of the output sequence be identical to the content of the input sequence, with only the order changed, and this requirement can be met through input encoding. In FIG. 3, C1, C2, C3, C4, C5 are the data columns input to the encoder, and <g>, C4, C5, C1, C2 are the data columns output by the decoder.
Assuming that the number of queries in the load Q is N and the length of the accessed data column set is n, we initialize each query q to a set N' = {c_1 = 0, c_2 = 0, ..., c_n = 0}.
Thus, the input query load is uniformly encoded, which may specifically include:
initializing each query in the input query load to a set;
determining a corresponding column access characteristic for each query; specifically, for each query q in load Q (involving only m data columns), its column access characteristic is C_q = {c_q,1, c_q,2, ..., c_q,m}.
The elements in the set corresponding to the query are binary-coded according to the column access characteristic: specifically, the positions in N' indexed by the data columns {1, 2, ..., m} accessed by query q are set to 1, while the other positions remain 0 (indicating that query q does not access those columns). In this way, the load is uniformly encoded into {1, 0, ..., 1} patterns.
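As a concrete sketch of this encoding, each query becomes a length-n binary vector with 1 at the positions of the columns it accesses (function and variable names here are illustrative, not from the patent):

```python
def encode_load(queries, n_columns):
    """Encode a query load as binary vectors.
    queries: one set of 0-based accessed column indices per query.
    Returns a list of length-n_columns 0/1 vectors (one per query)."""
    load = []
    for accessed in queries:
        vec = [0] * n_columns        # initialize N' with all c_i = 0
        for col in accessed:         # mark the columns the query touches
            vec[col] = 1
        load.append(vec)
    return load

# Example: 5 columns; two queries accessing {0, 3} and {1, 2, 4}
print(encode_load([{0, 3}, {1, 2, 4}], 5))
# [[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]]
```

Stacking these vectors gives the matrix representation of the load Q that is fed to the DRL-COA model.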
According to this method, the input load samples are effectively encoded through an Actor-Critic deep reinforcement learning algorithm, a Pointer Net neural network, an attention mechanism, and a disk skip cost simulation; the column order is taken as the output, and the disk skip cost is used to evaluate the output of each model iteration, so that the model parameters can be continuously adjusted in the direction expected to reduce disk skip time. Under this implementation, the neural network automatically learns the optimal column ordering from the feature data of the columns, incremental training of the DRL-COA model is possible, the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
The technical solution provided by the application has at least the following technical effects or advantages: the application implements a column storage layout optimization method based on deep reinforcement learning and, in experimental comparison with existing heuristic column ordering algorithms, further reduces the disk skip cost. The model parameters can be continuously adjusted in the direction expected to reduce disk skip time, so that the neural network automatically learns the optimal column ordering from the feature data of the columns. Incremental training is supported: the latest query load is fed directly into the model, and the column ordering does not need to be recalculated at each optimization, which greatly reduces the computational cost.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.

Claims (2)

1. A column storage layout optimization method based on deep reinforcement learning is characterized by comprising the following steps:
receiving a query load;
analyzing the query load to generate query characteristics, which specifically comprises the following steps: the query load is uniformly coded, specifically: initializing each query in the query load to a set; determining a corresponding column access characteristic for each query; performing binary coding on elements in the set corresponding to the query according to the column access characteristics;
acquiring feature data of a data column according to the query feature;
adopting an Actor-Critic algorithm to realize a strategy of the output sequence of the deep reinforcement learning column;
determining an output order of the columns based on a strategy of the output order of the columns, characteristic data of the data columns, wherein a neural network of the Pointer Net is adopted to make a decision of the output order, including mapping from one sequence to another sequence;
carrying out quantitative evaluation on the output sequence, wherein a quantitative evaluation strategy is adjusted based on rewards of the system, and the quantitative evaluation strategy based on the rewards adjustment of the system comprises the steps of adjusting parameters in a critic neural network according to rewards given by the system;
a strategy for adjusting the output sequence of the columns according to the quantitative evaluation result;
wherein determining the output order of the columns based on the policy of the output order of the columns, the characteristic data of the data columns, comprises:
obtaining the weight associated with each position of the input sequence by using an attention mechanism; and combining the input sequence with the weight to calculate the element with the maximum relation between the current output and the input sequence, and taking the element of the input sequence as an output element.
2. The column storage layout optimization method of claim 1, further characterized by quantitatively evaluating the output order based on disk skip time.
CN202011228158.6A 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning Active CN112347104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011228158.6A CN112347104B (en) 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN112347104A CN112347104A (en) 2021-02-09
CN112347104B true CN112347104B (en) 2023-09-29

Family

ID=74429231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011228158.6A Active CN112347104B (en) 2020-11-06 2020-11-06 Column storage layout optimization method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112347104B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332229B (en) * 2023-09-27 2024-05-10 天津大学 Fault diagnosis-oriented inter-satellite interaction information optimization method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609493A (en) * 2012-01-20 2012-07-25 Donghua University Join-order query optimization method based on a column-storage model
CN103294831A (en) * 2013-06-27 2013-09-11 Renmin University of China Multidimensional-array-based grouping and aggregation calculation method in a column-store database
CN103324765A (en) * 2013-07-19 2013-09-25 Xidian University Multi-core synchronous data query optimization method based on column storage
CN106528737A (en) * 2016-10-27 2017-03-22 Zhongqi Dongli Technology Co., Ltd. Website navigation display method and system
CN108197275A (en) * 2018-01-08 2018-06-22 Renmin University of China Distributed file row-storage indexing method
CN108804473A (en) * 2017-05-04 2018-11-13 Huawei Technologies Co., Ltd. Data query method, apparatus, and database system
CN110084375A (en) * 2019-04-26 2019-08-02 Southeast University Hierarchical partitioning framework based on deep reinforcement learning
CN110114783A (en) * 2016-11-04 2019-08-09 DeepMind Technologies Ltd. Reinforcement learning with auxiliary tasks
CN110278149A (en) * 2019-06-20 2019-09-24 Nanjing University Multipath TCP packet scheduling method based on deep reinforcement learning
CN111612126A (en) * 2020-04-18 2020-09-01 Huawei Technologies Co., Ltd. Method and device for reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9325344B2 (en) * 2010-12-03 2016-04-26 International Business Machines Corporation Encoding data stored in a column-oriented manner
CN106250381B (en) * 2015-06-04 2020-11-17 Microsoft Technology Licensing, LLC System and method for determining column layout of tabular storage
KR20230151047A (en) * 2017-05-23 2023-10-31 Google LLC Attention-based sequence transduction neural networks
CN110032604B (en) * 2019-02-02 2021-12-07 Beijing OceanBase Technology Co., Ltd. Data storage device, translation device and database access method
US20200311585A1 (en) * 2019-03-31 2020-10-01 Palo Alto Networks Multi-model based account/product sequence recommender
CN111797860B (en) * 2019-04-09 2023-09-26 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Feature extraction method and device, storage medium and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jin Guodong et al.; A Survey of HDFS Storage and Optimization Techniques; Journal of Software; Vol. 31, No. 1; pp. 137-161 *
Haoqiong Bian et al.; Wide Table Layout Optimization based on Column Ordering and Duplication; ACM International Conference on Management of Data; pp. 299-314 *

Also Published As

Publication number Publication date
CN112347104A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN111247537B (en) Method and system for effectively storing sparse neural network and sparse convolutional neural network
US9235651B2 (en) Data retrieval apparatus, data storage method and data retrieval method
US8650144B2 (en) Apparatus and methods for lossless compression of numerical attributes in rule based systems
CN110851620B (en) Knowledge representation method based on text embedding and structure embedding combination
US11544542B2 (en) Computing device and method
EP3738080A1 (en) Learning compressible features
CN111382307A (en) Video recommendation method, system and storage medium based on deep neural network
CN112347104B (en) Column storage layout optimization method based on deep reinforcement learning
CN113312505A (en) Cross-modal retrieval method and system based on discrete online hash learning
CN113487028A (en) Knowledge distillation method, knowledge distillation device, knowledge distillation terminal equipment and knowledge distillation medium
CN115511071A (en) Model training method and device and readable storage medium
CN113934851A (en) Data enhancement method and device for text classification and electronic equipment
CN115577144A (en) Cross-modal retrieval method based on online multi-hash code joint learning
JP6795721B1 (en) Learning systems, learning methods, and programs
CN112836794B (en) Method, device, equipment and storage medium for determining image neural architecture
CN106802787A (en) MapReduce optimization methods based on GPU sequences
CN115905546B (en) Graph convolution network literature identification device and method based on resistive random access memory
CN116467281A (en) Database management system parameter tuning and model training method, system and equipment
KR102597184B1 (en) Knowledge distillation method and system specialized for lightweight pruning-based deep neural networks
CN112840358B (en) Cursor-based adaptive quantization for deep neural networks
CN116737607B (en) Sample data caching method, system, computer device and storage medium
KR102466482B1 (en) System and method for accelerating deep neural network training using adaptive batch selection
WO2024078376A1 (en) Model pruning method and related apparatus
CN115033669A (en) New question mining method and terminal for FAQ question-answering system
Wei et al. Research on Deep Neural Network Model Compression Based on Quantification Pruning and Huffmann Encoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant