CN103020151A - Large-data-volume batch processing system and large-data-volume batch processing method - Google Patents

Publication number
CN103020151A
Authority
CN
China
Prior art keywords
data
primary
cache
set
key
Prior art date
Application number
CN2012104800632A
Other languages
Chinese (zh)
Other versions
CN103020151B (en
Inventor
张�成
Original Assignee
用友软件股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 用友软件股份有限公司
Priority to CN201210480063.2A
Publication of CN103020151A
Application granted
Publication of CN103020151B

Abstract

The invention provides a large-data-volume batch processing system comprising a middleware unit, a first-level cache device, and a second-level cache device. The middleware unit sends a query request to the first-level cache device, receives a second-level paged primary key set from the second-level cache device, queries the database for the data to be processed according to the second-level paged primary key set, and, after performing computation on that data, sends a data persistence request to the database. The first-level cache device queries the database for the primary key set that matches the query request, generates a first-level paged primary key set from it, and returns the first-level paged primary key set to the second-level cache device. The second-level cache device generates the second-level paged primary key set from the first-level paged primary key set and returns it to the middleware unit. The invention further provides a large-data-volume batch processing method. With this technical solution, the system can process large volumes of data much faster, its processing time is shortened, and its overall performance is improved.

Description

Large-data-volume batch processing system and large-data-volume batch processing method

Technical Field

[0001] The present invention relates to the field of computer technology, and in particular to a large-data-volume batch processing system and a large-data-volume batch processing method.

Background

[0002] In today's large online transaction processing (OLTP) systems, performance is often measured by how fast certain key core algorithms run in large-data-volume scenarios, and that processing speed directly affects the performance of the whole system.

[0003] A large information system usually has its own fairly complex business processing logic and algorithms. With small data volumes the efficiency of this processing is often overlooked, because the system responds quickly in that scenario. With large data volumes, however, processing bottlenecks can appear, with serious consequences such as long periods without response or outright crashes. Two problems are common and central here. First, if the data volume is too large, reading it into memory in one pass may cause a memory overflow. Second, if the data is not read into memory in one pass but is instead read and processed record by record in a loop, the algorithm degrades from batch processing to looped single-record processing, which also greatly reduces system performance. The prior art addresses this with back-end paging techniques.

[0004] Existing paging techniques all implement paging on the database side. One approach pages directly with SQL statements, for example fetching records 1-50 the first time, records 51-100 the second time, and so on. Although this loads only a limited number of records into memory per read, the load on the database remains high, because every such SQL query scans the full result set, so processing speed is not improved. Another approach implements paging in code, for example by looping over a ResultSet in Java: the first pass traverses records 1-50 and takes them; the second pass traverses records 1-100 but takes only records 51-100. This approach still has the drawback of querying all records in advance each time. A third approach first queries the primary keys (PKs) of the result set that satisfies the conditions, stores them in a temporary table with sequence numbers, reads the PK set out batch by batch by sequence number, and then uses each PK set to query the data from the database. Although this solves the preceding problems, the data has to be read batch by batch from the database temporary table, so under high concurrency the load on the database is still very high, and there are many connections, queries, and network transfers between the middleware unit and the database. In a narrow-bandwidth environment efficiency remains a bottleneck, and the middleware unit's resources are not used effectively.
Finally, none of the three approaches above proposes a general way to further speed up data processing once the data has been loaded into memory; they only address the data-loading bottleneck of the overall algorithm. Large-data-volume batch algorithms, however, usually involve two phases: query loading, and processing with persistence. How paging can adapt automatically to multiple databases is also left open.
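A minimal sketch of the first prior-art approach described above, SQL paging by offset. The `orders` table, its columns, and the use of Python with an in-memory SQLite database are illustrative assumptions, not part of the source. Each call re-runs the ordered query on the database side, which is the cost the paragraph points out:

```python
import sqlite3

def fetch_page(conn, page, page_size=50):
    """Return one page of rows using LIMIT/OFFSET paging.

    `orders`, `id`, and `amount` are hypothetical names used for illustration.
    """
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, amount FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

# Build a small illustrative table with 200 records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 201)]
)

first = fetch_page(conn, 1)   # records 1-50
second = fetch_page(conn, 2)  # records 51-100
```

Only one page of rows is held in memory at a time, but the database still has to order and skip rows for every page request.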

[0005] Therefore, the following are technical problems that urgently need to be solved: how to make reasonable use of middleware unit resources and database resources while loading large volumes of data; how to make the underlying paging adapt to multiple kinds of database; and how to provide a complete method and system that prevents memory overflow in the middleware unit, relieves processing pressure on the database side, and reduces the amount of data transferred over the network between the middleware unit and the database.

Summary of the Invention

[0006] In view of the above problems, the present invention proposes a large-data-volume batch processing technique that can prevent memory overflow in the middleware unit and relieve processing pressure on the database side.

[0007] According to one aspect of the present invention, a large-data-volume batch processing system is provided, comprising a middleware unit, a first-level cache device, and a second-level cache device. The middleware unit is configured to send a query request to the first-level cache device, to receive a second-level paged primary key set from the second-level cache device, to query the database for the data to be processed according to the second-level paged primary key set, and, after performing computation on the data to be processed, to send a data persistence request to the database. The first-level cache device is configured to query the database for the primary key set that matches the query request, to generate a first-level paged primary key set from the primary key set, and to return the first-level paged primary key set to the second-level cache device. The second-level cache device is configured to generate the second-level paged primary key set from the first-level paged primary key set and to return the second-level paged primary key set to the middleware unit.

[0008] With the above technical solution, a two-level cache structure is added to the process by which the middleware reads data, which greatly optimizes data reading and solves the technical problem of middleware memory overflow.

[0009] In the above technical solution, preferably, the system may further comprise a first setting unit that sets a first-level cache threshold for the first-level cache device. The first-level cache device is further configured to return the first-level paged primary key set directly to the second-level cache device when the data volume of the primary key set is less than or equal to the first-level cache threshold, and, when the data volume of the primary key set is greater than the first-level cache threshold, to create a temporary table, insert the primary keys into it, page the temporary table, and return the retrieved primary keys to the second-level cache device.
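The threshold behaviour of the first-level cache can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the `orders` table, the `KEY_QUERY` text, the `tmp_pk` table name, and the reading of "data volume" as a key count are all hypothetical, and SQLite stands in for the database:

```python
import sqlite3

# Hypothetical key query; the source does not specify one.
KEY_QUERY = "SELECT id FROM orders WHERE amount > 0"

def level1_pages(conn, l1_threshold):
    """Yield first-level pages of primary keys, at most l1_threshold keys each."""
    total = conn.execute(f"SELECT COUNT(*) FROM ({KEY_QUERY})").fetchone()[0]
    if total <= l1_threshold:
        # Small key set: return it directly as a single page.
        yield [row[0] for row in conn.execute(f"{KEY_QUERY} ORDER BY id")]
        return
    # Large key set: stage the keys in a temporary table. `seq` is filled in
    # automatically by SQLite (1, 2, 3, ...) as the rows are inserted.
    conn.execute("CREATE TEMP TABLE tmp_pk (seq INTEGER PRIMARY KEY, pk INTEGER)")
    try:
        conn.execute(f"INSERT INTO tmp_pk (pk) {KEY_QUERY} ORDER BY id")
        conn.commit()
        # Page the temporary table by sequence number.
        for start in range(1, total + 1, l1_threshold):
            rows = conn.execute(
                "SELECT pk FROM tmp_pk WHERE seq BETWEEN ? AND ? ORDER BY seq",
                (start, start + l1_threshold - 1),
            ).fetchall()
            yield [row[0] for row in rows]
    finally:
        conn.execute("DROP TABLE tmp_pk")
```

Paging by sequence number lets each page be fetched with a range lookup on the temporary table instead of a fresh scan of the original result set.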

[0010] If only a single-level cache structure were used to solve the middleware memory overflow problem, the amount of primary key data per page would have to be controlled at a finer granularity. With a two-level cache structure, the first-level cache returns only primary keys, and each primary key is just a fixed-length string that occupies little memory, so the number of primary keys per page in the first-level cache structure can be increased substantially.

[0011] In the above technical solution, preferably, the system may further comprise a second setting unit that sets a second-level cache threshold for the second-level cache device. The second-level cache device is further configured to return the second-level paged primary key set directly to the middleware unit when the data volume of the first-level paged primary keys is less than or equal to the second-level cache threshold, and, when the data volume of the primary key set is greater than the second-level cache threshold, to hold the second-level paged primary key set temporarily in memory, take each page of primary key data out of memory, and query the data to be processed according to each page of primary key data.
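A minimal sketch of the second-level threshold, assuming the "data volume" is measured as a number of keys and that a first-level page arrives as an in-memory list. The function name `level2_pages` is hypothetical:

```python
def level2_pages(l1_page, l2_threshold):
    """Split one first-level page of keys into second-level pages in memory."""
    if len(l1_page) <= l2_threshold:
        # Small enough: hand the whole page to the middleware directly.
        yield list(l1_page)
        return
    # Otherwise keep the keys in memory and hand them out page by page,
    # without any further query to the database.
    for i in range(0, len(l1_page), l2_threshold):
        yield l1_page[i:i + l2_threshold]
```

For example, `list(level2_pages(list(range(10)), 4))` yields three pages of 4, 4, and 2 keys.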

[0012] The second-level cache threshold of the second-level cache device is set based on the amount of memory the middleware actually uses when processing data. Setting the storage threshold of each cache level appropriately maximizes the processing efficiency of the system.

[0013] In the above technical solution, preferably, the middleware unit comprises: a transaction creation subunit configured to create an independent transaction; and a locking subunit configured to add a middleware-unit-level primary key lock to the data to be processed, process the data to be processed, and release the middleware-unit-level lock after processing ends.

[0014] Each page of data is processed in an independent transaction. That is, the transaction is committed as soon as a page of data has been processed, instead of opening a single transaction at the outermost level of the whole algorithm. As a result, the data in the database is not all held under lock for a long time, which improves the overall concurrent processing capacity of the database and reduces pressure on the database side.

[0015] In any of the above technical solutions, preferably, the system may further comprise a self-identification device that enables the first-level cache device to adapt to multiple types of database.
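The source does not say how the self-identification device works. One plausible reading, sketched here purely as an assumption, is that it maps the database product name to the paging syntax that database accepts. The function name, the template table, and the set of supported products are hypothetical:

```python
# Paging syntax per database product (illustrative, not exhaustive).
_PAGING_TEMPLATES = {
    "mysql": "{query} LIMIT {limit} OFFSET {offset}",
    "postgresql": "{query} LIMIT {limit} OFFSET {offset}",
    # Oracle 12c and later; the query must already contain ORDER BY.
    "oracle": "{query} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY",
    # SQL Server 2012 and later; the query must already contain ORDER BY.
    "microsoft sql server": "{query} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY",
}

def paging_sql(product_name, query, offset, limit):
    """Wrap `query` in the paging clause for the given database product."""
    key = product_name.strip().lower()
    if key not in _PAGING_TEMPLATES:
        raise ValueError(f"unsupported database product: {product_name!r}")
    return _PAGING_TEMPLATES[key].format(
        query=query, limit=int(limit), offset=int(offset)
    )
```

With such a lookup, the rest of the first-level cache code can stay the same regardless of which database sits underneath.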

[0016] According to another aspect of the present invention, a large-data-volume batch processing method is also provided, comprising the following steps: step 402, the middleware unit sends a query request to the first-level cache device, and the database returns the primary key set that matches the query request to the first-level cache device; step 404, the first-level cache device generates a first-level paged primary key set from the primary key set and returns the first-level paged primary key set to the second-level cache device; step 406, the second-level cache device generates a second-level paged primary key set from the first-level paged primary key set and returns the second-level paged primary key set to the middleware unit; step 408, the middleware unit queries the database for the data to be processed according to the second-level paged primary key set and, after performing computation on the data to be processed, sends a data persistence request to the database.

[0017] With the above technical solution, a two-level cache structure is added to the process by which the middleware reads data, which greatly optimizes data reading and solves the technical problem of middleware memory overflow.

[0018] In the above technical solution, preferably, step 404 specifically comprises: setting a first-level cache threshold for the first-level cache device; when the data volume of the primary key set is less than or equal to the first-level cache threshold, returning the first-level paged primary key set directly to the second-level cache device; and when the data volume of the primary key set is greater than the first-level cache threshold, creating a temporary table, inserting the primary keys into it, paging the temporary table, and returning the retrieved primary keys to the second-level cache device.

[0019] If only a single-level cache structure were used to solve the middleware memory overflow problem, the amount of primary key data per page would have to be controlled at a finer granularity. With a two-level cache structure, the first-level cache returns only primary keys, and each primary key is just a fixed-length string that occupies little memory, so the number of primary keys per page in the first-level cache structure can be increased substantially.

[0020] In the above technical solution, preferably, step 406 specifically comprises: setting a second-level cache threshold for the second-level cache device; when the data volume of the first-level paged primary keys is less than or equal to the second-level cache threshold, returning the second-level paged primary key set directly to the middleware unit; and when the data volume of the primary key set is greater than the second-level cache threshold, holding the second-level paged primary key set temporarily in memory, taking each page of primary key data out of memory, and querying the data to be processed according to each page of primary key data.

[0021] The second-level cache threshold of the second-level cache device is set based on the amount of memory the middleware actually uses when processing data. Setting the storage threshold of each cache level appropriately maximizes the processing efficiency of the system.

[0022] In the above technical solution, preferably, step 408 specifically comprises: creating an independent transaction in the middleware unit, adding a middleware-unit-level primary key lock to the data to be processed, processing the data to be processed, and releasing the middleware-unit-level lock after processing ends.

[0023] Each page of data is processed in an independent transaction. That is, the transaction is committed as soon as a page of data has been processed, instead of opening a single transaction at the outermost level of the whole algorithm. As a result, the data in the database is not all held under lock for a long time, which improves the overall concurrent processing capacity of the database and reduces pressure on the database side.

[0024] In any of the above technical solutions, preferably, step 404 may further comprise using a self-identification device at the first-level cache device to adapt to multiple types of database.

[0025] Therefore, the large-data-volume batch processing method according to the present invention can greatly increase the speed at which the system processes large-data-volume business. It balances the use of middleware and database resources as far as possible and, while reducing the load on each, makes full use of the resources of each, so as to maximize the improvement in system performance.

Brief Description of the Drawings

[0026] Fig. 1 is a schematic diagram of large-data-volume batch processing in the related art;

[0027] Fig. 2 is a block diagram of a large-data-volume batch processing system according to an embodiment of the present invention;

[0028] Fig. 3 is a schematic diagram of large-data-volume batch processing according to an embodiment of the present invention;

[0029] Fig. 4 is a flowchart of a large-data-volume batch processing method according to an embodiment of the present invention;

[0030] Fig. 5 is a flowchart of a large-data-volume batch processing method according to an embodiment of the present invention.

Detailed Description

[0031] To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

[0032] Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the invention may also be implemented in ways other than those described here, and it is therefore not limited to the specific embodiments disclosed below.

[0033] Before the large-data-volume batch processing system according to the present invention is described, the existing large-data-volume processing procedure is briefly introduced.

[0034] As shown in Fig. 1, in a typical large-data-volume batch processing scenario, all processing logic and algorithms broadly consist of the following stages: the middleware sends the database a request to query and load data; the middleware obtains the data and processes it in memory; after processing ends, the middleware sends the database a request to persist the data; and the database completes the persistence operation. Such a procedure can easily cause the middleware to run out of memory. To solve this technical problem, the large-data-volume batch processing system according to the present invention is disclosed.

[0035] Fig. 2 is a block diagram of a large-data-volume batch processing system according to an embodiment of the present invention.

[0036] As shown in Fig. 2, the large-data-volume batch processing system 200 according to an embodiment of the present invention comprises a middleware unit 202, a first-level cache device 204, and a second-level cache device 206. The middleware unit 202 is configured to send a query request to the first-level cache device 204, to receive a second-level paged primary key set from the second-level cache device 206, to query the database for the data to be processed according to the second-level paged primary key set, and, after performing computation on the data to be processed, to send a data persistence request to the database. The first-level cache device 204 is configured to query the database for the primary key set that matches the query request, to generate a first-level paged primary key set from the primary key set, and to return the first-level paged primary key set to the second-level cache device 206. The second-level cache device 206 is configured to generate the second-level paged primary key set from the first-level paged primary key set and to return the second-level paged primary key set to the middleware unit 202.

[0037] With the above technical solution, a two-level cache structure is added to the process by which the middleware reads data, which greatly optimizes data reading and solves the technical problem of middleware memory overflow.

[0038] In the above technical solution, preferably, the system may further comprise a first setting unit 208 that sets a first-level cache threshold for the first-level cache device 204. The first-level cache device 204 is further configured to return the first-level paged primary key set directly to the second-level cache device 206 when the data volume of the primary key set is less than or equal to the first-level cache threshold, and, when the data volume of the primary key set is greater than the first-level cache threshold, to create a temporary table, insert the primary keys into it, page the temporary table, and return the retrieved primary keys to the second-level cache device 206.

[0039] If only a single-level cache structure were used to solve the middleware memory overflow problem, the amount of primary key data per page would have to be controlled at a finer granularity. With a two-level cache structure, the first-level cache returns only primary keys, and each primary key is just a fixed-length string that occupies little memory, so the number of primary keys per page in the first-level cache structure can be increased substantially.

[0040] Preferably, the large-data-volume batch processing system 200 may further comprise a second setting unit 210 that sets a second-level cache threshold for the second-level cache device 206. The second-level cache device 206 is further configured to return the second-level paged primary key set directly to the middleware unit 202 when the data volume of the first-level paged primary keys is less than or equal to the second-level cache threshold, and, when the data volume of the primary key set is greater than the second-level cache threshold, to hold the second-level paged primary key set temporarily in memory, take each page of primary key data out of memory, and query the data to be processed according to each page of primary key data.

[0041] The second-level cache threshold of the second-level cache device 206 is set based on the amount of memory the middleware actually uses when processing data. Setting the storage threshold of each cache level appropriately maximizes the processing efficiency of the system.

[0042] In the above technical solution, preferably, the middleware unit 202 comprises: a transaction creation subunit 2022 configured to create an independent transaction; and a locking subunit 2024 configured to add a middleware-unit-202-level primary key lock to the data to be processed, process the data to be processed, and release the middleware-unit-202-level lock after processing ends.

[0043] Each page of data is processed in an independent transaction. That is, the transaction is committed as soon as a page of data has been processed, instead of opening a single transaction at the outermost level of the whole algorithm. As a result, the data in the database is not all held under lock for a long time, which improves the overall concurrent processing capacity of the database and reduces pressure on the database side.

[0044] Preferably, the large-data-volume batch processing system 200 may further comprise a self-identification device 212 that enables the first-level cache device 204 to adapt to multiple types of database.

[0045] In summary, the whole large-data-volume batch processing system can be divided into the following modules, which pass data to one another and work together: the first-level cache device resolves the middleware memory bottleneck; the database self-identification device adapts automatically to multiple types of database; the second-level cache device reduces database load while making reasonable use of middleware resources; and the independent transaction processing device further improves the concurrent processing capacity of the database and the middleware, thereby improving overall system efficiency.

[0046] The processing principle of the large-data-volume batch processing system according to the present invention is described in detail below with reference to Fig. 3. Fig. 3 is a schematic diagram of large-data-volume batch processing according to an embodiment of the present invention.

[0047] As shown in Fig. 3, a two-level cache structure is added to the process by which the middleware (corresponding to the middleware unit in Fig. 2) reads data, to optimize data reading; independent transactions are used when the middleware initiates persistence, to relieve pressure on the database side. As the figure shows, the whole process is as follows:

[0048] 1. The middleware sends a query request to the first-level cache device.

[0049] 2. The first-level cache device queries the database for the primary keys that satisfy the query conditions, and the database returns the full primary key set to the first-level cache structure.

[0050] 3. The second-level cache device requests a first-level paged primary key set from the first-level cache device.

[0051] 4. The second-level cache device receives the first-level paged primary key set returned by the first-level cache device.

[0052] 5. The middleware requests a second-level paged primary key set from the second-level cache device.

[0053] 6. The second-level cache device returns the second-level paged primary key set to the middleware.

[0054] 7. The middleware queries the database for the data to be processed according to the second-level paged primary key set.

[0055] 8. The data is computed on in an independent transaction.

[0056] 9. The second-level page of data is persisted.

[0057] Steps 2 to 4 form the first-level paging loop. Steps 5 to 9 form the second-level paging loop, which uses independent transactions internally to relieve processing pressure on the database side. The first-level cache device is implemented internally with database temporary tables and contains the self-identification device 212, which adapts automatically to the underlying database and so works with all kinds of database. The second-level cache structure builds a memory-level cache at the middleware level to temporarily hold the primary key information of the data.
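The nine steps above can be sketched as two nested paging loops. This is a simplified illustration under stated assumptions: the `orders` table and the "add 1" computation are hypothetical, SQLite stands in for the database, and the first-level cache is reduced to an in-memory list of keys where the source uses a database temporary table:

```python
import sqlite3

def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_batch(conn, l1_size, l2_size):
    """Process every row of `orders`; return the number of second-level pages."""
    # Steps 1-2: query only the primary keys that match the request.
    keys = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
    pages_done = 0
    for l1_page in chunks(keys, l1_size):          # steps 3-4: first-level loop
        for l2_page in chunks(l1_page, l2_size):   # steps 5-6: in memory only
            marks = ",".join("?" * len(l2_page))
            with conn:  # steps 7-9: one independent transaction per page
                rows = conn.execute(
                    f"SELECT id, amount FROM orders WHERE id IN ({marks})", l2_page
                ).fetchall()
                conn.executemany(
                    "UPDATE orders SET amount = ? WHERE id = ?",
                    [(amount + 1, pk) for pk, amount in rows],  # placeholder work
                )
            pages_done += 1
    return pages_done
```

Only one second-level page of full rows is in memory at any time, while the larger first-level pages hold keys alone.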

[0058] If only a first-level cache device is used to prevent middleware memory overflow, the amount of primary key data per page must be controlled at a finer granularity. For example, to process a total of 10 million records without overflowing memory, each page must hold at most 5,000 records, with 5,000 corresponding primary keys, so 2,000 pages are needed, that is, 2,000 paged queries.

[0059] With the two-level cache structure, the second-level cache likewise holds at most 5,000 records per page, with 5,000 corresponding primary keys. However, because the first-level cache device returns only primary keys, and each primary key is just a fixed-length string that occupies little memory, each of its pages can hold, for example, 40,000 primary keys. Because second-level paging is done entirely in memory with no remote queries, the query overhead introduced by the first-level cache is only 10 million / 40,000 = 250, that is, 250 paged queries in total. Compared with the 2,000 paged queries of a single-level cache device, the two-level cache structure puts less pressure on the database side and also reduces network traffic between the middleware and the database, further improving the whole system's capacity for large-data-volume batch processing.
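The page counts quoted above follow from a ceiling division of the total record count by the page size. A small check of that arithmetic, using the figures from the text (the function name is hypothetical):

```python
def paged_queries(total_records, keys_per_page):
    """Number of paged key queries needed to cover total_records."""
    return -(-total_records // keys_per_page)  # integer ceiling division

single_level = paged_queries(10_000_000, 5_000)   # one cache level, 5,000 per page
two_level = paged_queries(10_000_000, 40_000)     # first-level pages of 40,000 keys

print(single_level, two_level, single_level // two_level)
```

Under these figures the two-level structure issues one eighth as many paged key queries as the single-level one.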

[0060] Next, the large-data-volume batch processing method according to the present invention is described in detail with reference to Figs. 4 and 5.

[0061] 图4示出了根据本发明的实施例的大数据量批处理方法的流程图。 [0061] FIG. 4 shows a flowchart of a large amount of data of the embodiment of the present invention is a batch method.

[0062] 如图4所示,根据本发明的实施例的大数据量批处理方法,包括以下步骤:步骤402,中间件单元向一级缓存装置发送查询请求,数据库返回符合查询请求的主键集合至一级缓存装置;步骤404,一级缓存装置根据主键集合生成一级分页主键集合并将一级分页主键集合返回至二级缓存装置;步骤406,二级缓存装置根据一级分页主键集合生成二级分页主键集合并将二级分页主键集合返回至中间件单元;步骤408,中间件单元根据二级分页主键集合向数据库查询待处理数据并再对待处理数据进行计算处理后,向数据库发送持久化数据请求。 [0062] As shown in FIG. 4, according to the large amount of data of the batch method of the present embodiment of the invention, comprising the following steps: Step 402, the middleware unit sends a query to a buffer means, a database query request back to the main line with a set of keys to a buffer means; step 404, a cache apparatus generates a key set based on the master key set and the one primary tab tab primary keyset returns to the secondary cache means; step 406, the secondary cache set generation apparatus according to a primary key tab two primary tab set key set and returned to the middleware unit two primary tab key; after step 408, the middleware unit set to the database query data to be processed according to two primary tab key and data to be processed for re-calculation processing, sent to the persistent database data request.

[0063] 通过上述技术方案,在中间件读取数据的过程中加入两级缓存结构,大大优化数据读取,解决了中间件内存溢出的技术问题。 [0063] Through the above technical solution, was added during the intermediate two read data buffer structure, greatly improving the data read, solves the technical problem intermediate memory overflow.

[0064] 上述技术方案中,优选的,所述步骤404具体包括:设置所述一级缓存装置的一级缓存阈值;在所述主键集合的数据量小于等于所述一级缓存阈值时,直接将所述一级分页主键集合返回至所述二级缓存装置;在所述主键集合的数据量大于所述一级缓存阈值时,建立并插入临时表,对所述临时表进行分页并将获取的主键返回至所述二级缓存装置。 [0064] The above-described aspect, preferably, the step 404 specifically comprises: setting a threshold level cache buffer means; amount of data in the primary key set is less than equal to the threshold level cache directly the primary key of a tab set back to the secondary cache means; key set in said main data is greater than said threshold level cache, create and insert the temporary table, the acquired temporary table and tab the primary key is returned to the secondary cache means.

[0065] 如果只有一级缓存结构来解决中间件内存溢出的问题,则必须对每页主键数据量做更细粒度的控制,当采用了两级缓存结构之后,由于一级缓存返回的只是主键,每一个主键只是一个固定长度的字符串,占用内存较少,所以可大大提高一级缓存结构每页的主键数据总量。 [0065] If only one cache structure to solve the problem of middleware memory overflow, you must do more fine-grained control over the amount of data per primary key, when using two levels of cache structure, the cache returns only the primary key each primary key only a fixed length string, takes up less memory, it is possible to greatly increase the total amount of the primary key of a cache page structure.

[0066] 上述技术方案中,优选的,所述步骤406具体包括:设置所述二级缓存装置的二级缓存阈值;在所述一级分页主键的数据量小于等于所述二级缓存阈值时,直接将所述二级分页主键集合返回至所述中间件单元;在所述主键集合的数据量大于所述二级缓存阈值时,将所述二级分页主键集合暂存于内存,从所述内存中取出每一页主键数据,根据所述每一页主键数据查询所述待处理数据。 [0066] The above-described aspect, preferably, the step 406 specifically comprises: setting a threshold value of the two secondary cache buffer means; the amount of data in a page is less than or equal to the primary key of the secondary cache threshold directly to said secondary page back to the primary keyset middleware unit; key set in said main data is greater than a threshold value when the secondary cache, the two primary keyset tab temporarily stored in the memory, from the remove each page of said memory primary key data, querying the data to be processed in accordance with the data of each page of the primary key.

[0067] 基于中间件实际处理数据的占用量设置二级缓存装置的二级缓存阈值,合理设置各级缓存结构的存储阈值能够最大限度的提升系统的处理效率。 [0067] The actual processing based on the occupied amount of intermediate data buffer means provided two secondary cache threshold value, a reasonable set storage buffer structure threshold levels to maximize the efficiency of the lift system.

[0068] 上述技术方案中,优选的,所述步骤408具体包括:在所述中间件单元建立独立事务,对所述待处理数据添加中间件单元级别主键锁,对所述待处理数据进行处理,在处理结束后,对所述中间件单元级别锁进行解锁。 [0068] The above-described aspect, preferably, the step 408 specifically comprises: establishing individual transactions in the middleware unit, add unit middleware level of the primary key of the lock of data to be processed, the processing data to be processed , after the processing, the middleware unit level lock to unlock.

[0069] 每一页数据采用独立事务处理,也就是说每页数据处理完毕后事务立即提交,而不是仅在整个算法最外层起一个事务,不会对数据库中所有数据进行长时间加锁锁定,从而提升数据库整体并发处理能力,降低了数据库端的压力。 [0069] each page of data using an independent transaction, that transaction immediately after submitting each page of data processed, and not only in the outermost layer of the algorithm from a transaction, not all the data in the database locked for a long time lock, so as to enhance the overall database concurrency processing power, reducing the pressure on the database side.

[0070] 上述任一技术方案中,优选的,所述步骤404还可以包括,在所述一级缓存装置处,采用自识别装置自适应多类型数据库。 [0070] In any of the above aspect, preferably, the step 404 may further comprise, in said buffer means at one, self adaptive multi-type recognition means database. [0071] 如图5所示,大数据量批处理过程可大致分为3个过程:1) 一级缓存结构处理查询请求;2) 二级缓存结构处理查询请求;3)使用独立事务提交最终数据 [0071] As shown in FIG. 5, a batch process large amount of data can be roughly divided into three processes: 1) a buffer structure query request process; 2) handling queries secondary cache structure; 3) separate the final transaction commits data

[0072] 1) The first-level cache structure handles the query request.

[0073] 1. After receiving the query request, the first-level cache structure first executes the SQL statement to obtain a result set.

[0074] 2. It traverses the result set. If the total data volume of the result set does not exceed the first-level cache threshold, the result set is returned directly.

[0075] 3. If the total data volume of the result set exceeds the first-level cache threshold, the temporary-table cache device processes the SQL statement.

[0076] 4. Based on the type of the underlying data source, the temporary-table cache device automatically generates the insert-into-temporary-table SQL statement for the database type in use.

[0077] The temporary table has the following fields: a sequence number (auto-increment), the primary key, and the fields of the SQL passed into the temporary-table cache. The sequence number is used for subsequent paging. Because each database implements auto-increment fields differently, this step is handled differently according to the database type, and the final result is an SQL statement that inserts into the temporary table, similar to: insert into temp (select rownum from ...).

[0078] 5. The first-level cache structure pages the data to be processed out of the internal temporary table. Paging relies on the sequence-number field of the temporary table: because the sequence-number field is auto-incrementing (for example, 1, 2, 3, 4, ...), the paging SQL is similar to: select pk from temp where no >= 1 and no <= 50 ......

[0079] 6. Finally, the paged-out primary key set is passed to the second-level cache structure (that is, the second-level cache device).
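The six steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes SQLite (through the standard `sqlite3` module) as the underlying database, and the names `first_level_pages`, `temp_keys`, `seq_no`, and `orders` are hypothetical. An `INTEGER PRIMARY KEY` column stands in for the auto-increment sequence number; other databases would need their own syntax, which is what the self-identification device is described as handling.

```python
import sqlite3
from typing import Iterator, List


def first_level_pages(conn: sqlite3.Connection, key_query: str,
                      threshold: int) -> Iterator[List[str]]:
    """Yield primary keys in pages of at most `threshold` keys."""
    # Steps 1-2: run the query and read just enough rows to compare with the threshold.
    cur = conn.execute(key_query)
    head = [row[0] for row in cur.fetchmany(threshold + 1)]
    cur.close()
    if len(head) <= threshold:
        yield head  # at or under the threshold: return the keys directly
        return

    # Steps 3-4: over the threshold, stage the keys in a temporary table.
    # seq_no is assigned 1, 2, 3, ... by SQLite as rows are inserted.
    conn.execute("DROP TABLE IF EXISTS temp.temp_keys")
    conn.execute("CREATE TEMP TABLE temp_keys (seq_no INTEGER PRIMARY KEY, pk TEXT)")
    # key_query is trusted, internally built SQL in this sketch.
    conn.execute(f"INSERT INTO temp_keys (pk) {key_query}")

    # Steps 5-6: page by sequence-number range and hand each page onward.
    low = 1
    while True:
        rows = conn.execute(
            "SELECT pk FROM temp_keys WHERE seq_no >= ? AND seq_no <= ? ORDER BY seq_no",
            (low, low + threshold - 1)).fetchall()
        if not rows:
            break
        yield [row[0] for row in rows]
        low += threshold
    conn.execute("DROP TABLE temp_keys")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pk TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(f"K{i:04d}", i) for i in range(1, 121)])

pages = list(first_level_pages(conn, "SELECT pk FROM orders ORDER BY pk", threshold=50))
print([len(page) for page in pages])
```

The sketch uses `seq_no` rather than `no` as the column name because `NO` is a keyword in SQLite.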

[0080] 2) The second-level cache structure handles the query request.

[0081] 1. The second-level cache structure receives the primary key data returned by the first-level cache structure.

[0082] 2. If the total data volume of the result set does not exceed the second-level cache threshold, the primary key result set is returned directly.

[0083] 3. If the total data volume of the result set exceeds the second-level cache threshold, the primary key data is temporarily stored in memory.

[0084] 4. Second-level paging takes each page of primary key data out of the memory-level cache.

[0085] 5. The data to be processed is queried from the database according to each page of primary key data.
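The second-level steps above might look like the following Python sketch. It is an illustration under stated assumptions rather than the patented code: the function names `second_level_pages` and `fetch_rows`, the `items` table, and the use of SQLite are all hypothetical choices made for the example.

```python
import sqlite3
from typing import Iterator, List, Sequence


def second_level_pages(keys: Sequence[str], threshold: int) -> Iterator[List[str]]:
    """Split one first-level page of primary keys into second-level pages."""
    if len(keys) <= threshold:
        # At or under the threshold: hand the keys back as a single page.
        yield list(keys)
        return
    cached = list(keys)  # memory-level cache held by the middleware
    for start in range(0, len(cached), threshold):
        yield cached[start:start + threshold]  # in-memory paging, no remote query


def fetch_rows(conn: sqlite3.Connection, page: Sequence[str]) -> list:
    """Query the rows to be processed for one second-level page of keys."""
    marks = ",".join("?" * len(page))  # one bound parameter per key
    sql = f"SELECT pk, amount FROM items WHERE pk IN ({marks}) ORDER BY pk"
    return conn.execute(sql, list(page)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(f"K{i:02d}", i * 10) for i in range(1, 13)])

first_level_page = [f"K{i:02d}" for i in range(1, 13)]  # 12 keys from level one
pages = list(second_level_pages(first_level_page, threshold=5))
rows_per_page = [fetch_rows(conn, page) for page in pages]
print([len(page) for page in pages])
```

Databases cap the number of bound parameters per statement, so with real page sizes the `IN (...)` query in `fetch_rows` may need to be split or replaced by a join; the sketch ignores that.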

[0086] 3) Commit data using independent transactions.

[0087] 1. Create an independent transaction at the middleware layer.

[0088] 2. Add a middleware-level primary key lock on the data to be processed.

[0089] 3. Perform the calculation on the data, then persist the data.

[0090] 4. Release the lock.

[0091] 4) Threshold settings for each cache structure.

[0092] 1. The default page-data threshold of the first-level cache structure is 20,000.

[0093] 2. The default page-data threshold of the second-level cache structure is 5,000.

[0094] 3. The thresholds of both the first-level and the second-level cache structures can be set dynamically according to the hardware conditions of the middleware.

[0095] 4. The first-level cache threshold is set mainly with regard to the memory footprint of the primary key strings.

[0096] 5. The second-level cache threshold is set mainly with regard to the memory occupied by the data actually processed in the middleware. Setting the thresholds of the cache structures at each level sensibly maximizes the processing efficiency of the system.
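One way to read the threshold guidance above is as a memory-budget calculation, sketched below in Python. The source gives only the defaults (20,000 and 5,000) and the qualitative considerations; the function `threshold_for`, the byte sizes, and the memory budgets are illustrative assumptions, not values from the source.

```python
from typing import Optional

DEFAULT_FIRST_LEVEL = 20_000   # default first-level threshold stated in the text
DEFAULT_SECOND_LEVEL = 5_000   # default second-level threshold stated in the text


def threshold_for(budget_bytes: Optional[int], bytes_per_item: int, default: int) -> int:
    """Items per page that fit in the memory budget, or the default if no budget is set."""
    if budget_bytes is None:
        return default
    return max(1, budget_bytes // bytes_per_item)


# Hypothetical inputs: a fixed-length key of 64 bytes and a processed record of
# 2,048 bytes; real values depend on the schema and the middleware hardware.
first_level = threshold_for(4 * 1024 * 1024, 64, DEFAULT_FIRST_LEVEL)
second_level = threshold_for(4 * 1024 * 1024, 2048, DEFAULT_SECOND_LEVEL)
print(first_level, second_level)
```

With the same budget, the first-level threshold comes out much larger than the second-level one simply because a key is far smaller than a full record, which matches the reasoning in items 4 and 5.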

[0097] Therefore, the large-data-volume batch processing method according to the present invention can greatly increase the speed at which the system processes large-data-volume business. It balances the use of middleware and database resources as far as possible, reducing the load on each while still making full use of both, so as to maximize the improvement in system performance. In summary, the method allows an information system to adapt to more numerous and more demanding network environments and large-data-volume environments, and lets customers run the system in scenarios with larger volumes of business data.

[0098] The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A large-data-volume batch processing system, characterized by comprising: a middleware unit, a first-level cache device, and a second-level cache device, wherein the middleware unit is configured to send a query request to the first-level cache device, to receive a second-level paged primary key set from the second-level cache device, to query a database for data to be processed according to the second-level paged primary key set, and, after performing calculation processing on the data to be processed, to send a persist-data request to the database; the first-level cache device is configured to query the database for the primary key set that matches the query request, to generate a first-level paged primary key set from the primary key set, and to return the first-level paged primary key set to the second-level cache device; and the second-level cache device is configured to generate the second-level paged primary key set from the first-level paged primary key set and to return the second-level paged primary key set to the middleware unit.
2. The large-data-volume batch processing system according to claim 1, characterized by further comprising: a first setting unit that sets a first-level cache threshold of the first-level cache device; wherein the first-level cache device is further configured to return the first-level paged primary key set directly to the second-level cache device when the data volume of the primary key set is less than or equal to the first-level cache threshold, and, when the data volume of the primary key set is greater than the first-level cache threshold, to create a temporary table and insert into it, page the temporary table, and return the retrieved primary keys to the second-level cache device.
3. The large-data-volume batch processing system according to claim 1, characterized by further comprising: a second setting unit that sets a second-level cache threshold of the second-level cache device; wherein the second-level cache device is further configured to return the second-level paged primary key set directly to the middleware unit when the data volume of the first-level paged primary keys is less than or equal to the second-level cache threshold, and, when the data volume of the primary key set is greater than the second-level cache threshold, to temporarily store the second-level paged primary key set in memory, take each page of primary key data out of the memory, and query the data to be processed according to each page of primary key data.
4. The large-data-volume batch processing system according to claim 3, characterized in that the middleware unit comprises: a transaction-establishing subunit configured to establish an independent transaction; and a locking subunit configured to add a middleware-unit-level primary key lock on the data to be processed, to process the data to be processed, and, after processing ends, to release the middleware-unit-level lock.
5. The large-data-volume batch processing system according to any one of claims 1 to 4, characterized by further comprising: a self-identification device that enables the first-level cache device to adapt to multiple types of databases.
6. A large-data-volume batch processing method, characterized by comprising the following steps: step 402, a middleware unit sends a query request to a first-level cache device, and a database returns the primary key set that matches the query request to the first-level cache device; step 404, the first-level cache device generates a first-level paged primary key set from the primary key set and returns the first-level paged primary key set to a second-level cache device; step 406, the second-level cache device generates a second-level paged primary key set from the first-level paged primary key set and returns the second-level paged primary key set to the middleware unit; step 408, the middleware unit queries the database for data to be processed according to the second-level paged primary key set and, after performing calculation processing on the data to be processed, sends a persist-data request to the database.
7. The large-data-volume batch processing method according to claim 6, characterized in that step 404 specifically comprises: setting a first-level cache threshold of the first-level cache device; when the data volume of the primary key set is less than or equal to the first-level cache threshold, returning the first-level paged primary key set directly to the second-level cache device; and when the data volume of the primary key set is greater than the first-level cache threshold, creating a temporary table and inserting into it, paging the temporary table, and returning the retrieved primary keys to the second-level cache device.
8. The large-data-volume batch processing method according to claim 6, characterized in that step 406 specifically comprises: setting a second-level cache threshold of the second-level cache device; when the data volume of the first-level paged primary keys is less than or equal to the second-level cache threshold, returning the second-level paged primary key set directly to the middleware unit; and when the data volume of the primary key set is greater than the second-level cache threshold, temporarily storing the second-level paged primary key set in memory, taking each page of primary key data out of the memory, and querying the data to be processed according to each page of primary key data.
9. The large-data-volume batch processing method according to claim 6, characterized in that step 408 specifically comprises: establishing an independent transaction in the middleware unit, adding a middleware-unit-level primary key lock on the data to be processed, processing the data to be processed, and, after processing ends, releasing the middleware-unit-level lock.
10. The large-data-volume batch processing method according to any one of claims 6 to 9, characterized in that step 404 further comprises using a self-identification device at the first-level cache device to adapt to multiple types of databases.
CN201210480063.2A 2012-11-22 2012-11-22 Batch processing large amounts of data systems and large data batch method CN103020151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210480063.2A CN103020151B (en) 2012-11-22 2012-11-22 Batch processing large amounts of data systems and large data batch method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210480063.2A CN103020151B (en) 2012-11-22 2012-11-22 Batch processing large amounts of data systems and large data batch method

Publications (2)

Publication Number Publication Date
CN103020151A true CN103020151A (en) 2013-04-03
CN103020151B CN103020151B (en) 2015-12-02

Family

ID=47968755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210480063.2A CN103020151B (en) 2012-11-22 2012-11-22 Batch processing large amounts of data systems and large data batch method

Country Status (1)

Country Link
CN (1) CN103020151B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218179A (en) * 2013-04-23 2013-07-24 深圳市京华科讯科技有限公司 Second-level system acceleration method based on virtualization
CN103888378A (en) * 2014-04-09 2014-06-25 北京京东尚科信息技术有限公司 Data exchange system and method based on cache mechanism
CN103886022A (en) * 2014-02-24 2014-06-25 上海上讯信息技术股份有限公司 Paging-query querying device and method based on primary key fields
CN104424319A (en) * 2013-09-10 2015-03-18 镇江金钛软件有限公司 Method for temporarily storing general data
CN104866434A (en) * 2015-06-01 2015-08-26 北京圆通慧达管理软件开发有限公司 Multi-application-oriented data storage system and data storage and calling method
CN106407020A (en) * 2016-11-23 2017-02-15 青岛海信移动通信技术股份有限公司 Database processing method of mobile terminal and mobile terminal thereof
CN106407019A (en) * 2016-11-23 2017-02-15 青岛海信移动通信技术股份有限公司 Database processing method of mobile terminal and mobile terminal thereof
CN104111962B (en) * 2013-04-22 2018-09-18 Sap欧洲公司 Enhanced transaction has batch operation cache

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120181A1 (en) * 2003-12-02 2005-06-02 Oracle International Corporation Invalidating cached data using secondary keys
US20060259479A1 (en) * 2005-05-12 2006-11-16 Microsoft Corporation System and method for automatic generation of suggested inline search terms
CN101216840A (en) * 2008-01-21 2008-07-09 金蝶软件(中国)有限公司 Data enquiry method and data enquiry system
CN101860449A (en) * 2009-04-09 2010-10-13 华为技术有限公司 Data query method, device and system
CN201993755U (en) * 2011-01-30 2011-09-28 上海振华重工(集团)股份有限公司 Data filtration, compression and storage system of real-time database

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111962B (en) * 2013-04-22 2018-09-18 Sap欧洲公司 Enhanced transaction has batch operation cache
CN103218179A (en) * 2013-04-23 2013-07-24 深圳市京华科讯科技有限公司 Second-level system acceleration method based on virtualization
CN104424319A (en) * 2013-09-10 2015-03-18 镇江金钛软件有限公司 Method for temporarily storing general data
CN103886022A (en) * 2014-02-24 2014-06-25 上海上讯信息技术股份有限公司 Paging-query querying device and method based on primary key fields
CN103886022B (en) * 2014-02-24 2019-01-18 上海上讯信息技术股份有限公司 A kind of query facility and its method carrying out paging query based on major key field
CN103888378A (en) * 2014-04-09 2014-06-25 北京京东尚科信息技术有限公司 Data exchange system and method based on cache mechanism
CN103888378B (en) * 2014-04-09 2017-08-25 北京京东尚科信息技术有限公司 A system and method for exchanging data caching mechanism based on
CN104866434B (en) * 2015-06-01 2017-10-03 明算科技(北京)股份有限公司 Multi-application for the data storage system and data storage, method call
CN104866434A (en) * 2015-06-01 2015-08-26 北京圆通慧达管理软件开发有限公司 Multi-application-oriented data storage system and data storage and calling method
CN106407020A (en) * 2016-11-23 2017-02-15 青岛海信移动通信技术股份有限公司 Database processing method of mobile terminal and mobile terminal thereof
CN106407019A (en) * 2016-11-23 2017-02-15 青岛海信移动通信技术股份有限公司 Database processing method of mobile terminal and mobile terminal thereof

Also Published As

Publication number Publication date
CN103020151B (en) 2015-12-02

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
COR Change of bibliographic data
C14 Grant of patent or utility model