TW201626254A - Big data real-time storage and real-time access in NoSQL - Google Patents

Info

Publication number
TW201626254A
Authority
TW (Taiwan)
Prior art keywords
data
batch
instant
reading
data storage
Application number
TW104100670A
Other languages
Chinese (zh)
Other versions
TWI522827B (en)
Inventors
Jun-Wei Xie
Xiao-Feng Ye
Original assignee
Chunghwa Telecom Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-01-09
Filing date
2015-01-09
Publication date
2016-07-16
Application filed by Chunghwa Telecom Co Ltd
Priority to TW104100670A
Application granted
Publication of TWI522827B
Publication of TW201626254A

Abstract

A method for real-time storage and real-time reading of big data in a NoSQL (non-relational) database is disclosed. It exploits a property of current NoSQL databases: a table stores many rows, each consisting of one primary key and any number of columns. For each batch storage operation, the method writes only one record to a data storage record table; that record contains the primary key of the first row in the batch, the primary key of the last row, and the total number of rows in the batch. This table is used when reading: a reader only needs to read and parse the storage records in time order to run a batched read, and because the NoSQL database automatically keeps rows sorted by primary key, data access time is shortened. In addition, the method provides a distributed, multi-client data access mechanism suited to real-time computing performance requirements, and a real-time reading program module that, depending on how many data receiving interfaces are in use, lets multiple client programs read at the same time, with each client reading a non-overlapping data segment, which improves the timeliness of data access.

Description

Method for real-time storage and reading of huge amounts of data for non-relational databases

The invention belongs to the technical field of IPC G06F 17/30 (information retrieval and database structures), and specifically relates to a method for real-time storage and reading of huge amounts of data in a non-relational database. The invention is a control method for storing and reading large volumes of data, and it can provide a distributed, multi-client data reading mechanism according to real-time computing performance requirements.

Traditional relational databases, when faced with heavy data writes and large data volumes, suffer from insufficient write performance and are difficult or expensive to scale. In-memory database technology can meet write performance requirements, but it cannot store large amounts of data, and its data is lost and cannot be recovered once the system shuts down or crashes.

Non-relational databases, also called NoSQL databases, differ from traditional relational databases in that they scale horizontally: adding new server nodes continuously expands the capacity of the database system. This meets long-term mass-storage needs as well as the performance requirements of heavy data writes and the analysis of large volumes of historical data. The key-value data model adopted by NoSQL technology reduces each data item to a single key mapped to a single value. Because data items have no relationships with one another, the data can be partitioned or rearranged arbitrarily, or replicated across different servers.

However, NoSQL technology is designed around the key as the target of write and query operations. It cannot directly support applications in which many users write and read large batches of data at the same time, such as writing large volumes of sensor data and reading that sensor data back for analysis, where each analysis module must be able to read and parse its own non-overlapping data blocks. In view of these shortcomings of conventional approaches, the inventors set out to improve on them and, after years of research, developed this method for real-time storage and reading of huge amounts of data for non-relational databases.

The object of the present invention is to provide a method for real-time storage and reading of huge amounts of data for non-relational databases that, through software, takes full advantage of NoSQL databases to meet the application needs of bulk data writing and bulk data reading and analysis. Thanks to the characteristics of NoSQL, the invention can be scaled dynamically at low cost, store historical data over the long term, and support subsequent analysis applications. It also implements a control method that lets multiple client programs read the data at the same time, each reading and analyzing a non-overlapping data segment.

To achieve the above object, the method uses a property of the non-relational database itself: each row has one primary key and any number of column fields. Based on this property, the invention stores each batch storage operation as a single data storage record in a table. The record contains the primary key of the first row in the batch, the primary key of the last row, and the total number of rows stored in the batch. From another point of view, this storage record table can be regarded as a batch index table. It is used for real-time reading of huge data volumes: a reader only needs to read and parse the storage records in time order, combine them into batch query conditions, and then perform a batch read, which improves the efficiency of querying and reading huge amounts of data.
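The batch-index idea can be illustrated with a minimal Python sketch. It assumes a toy in-memory table that keeps rows sorted by row key (a stand-in for a NoSQL table; HBase, for example, orders rows by their row-key bytes) and fixed-width string keys, so that lexicographic order matches chronological order. `SortedTable`, `StorageRecord`, and `read_batch` are hypothetical names, not part of the patent.

```python
import bisect
from dataclasses import dataclass


@dataclass
class StorageRecord:
    """One record per batch: first and last row keys plus the row count."""
    first_key: str
    last_key: str
    count: int


class SortedTable:
    """Toy stand-in for a NoSQL table whose rows are kept ordered by row key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> column map

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def scan(self, start_key, end_key):
        """Return (key, columns) pairs with start_key <= key <= end_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_right(self._keys, end_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def read_batch(table, record):
    """Fetch a whole batch with one range scan driven by its storage record."""
    rows = table.scan(record.first_key, record.last_key)
    if len(rows) != record.count:
        raise ValueError(f"expected {record.count} rows, found {len(rows)}")
    return rows


if __name__ == "__main__":
    table = SortedTable()
    keys = [f"20131101{n:04d}" for n in range(1, 6)]   # fixed-width keys
    for n, key in enumerate(keys, start=1):
        table.put(key, {"reading": n})
    table.put("201311020001", {"reading": 99})        # a row from another batch
    record = StorageRecord(keys[0], keys[-1], len(keys))
    print(len(read_batch(table, record)))  # 5
```

Because the table is ordered by key, one storage record is enough to locate the whole batch; in this sketch the row count is used only as a completeness check.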

110‧‧‧Real-time storage program module

120‧‧‧Data receiving interface

130‧‧‧Non-relational database

140‧‧‧Real-time reading program module

150‧‧‧Data reading interface

160‧‧‧Data storage record control module

210~240‧‧‧Steps of the storage method (Figure 2)

310~340‧‧‧Steps of the reading method (Figure 3)

Figure 1 is a schematic diagram of the computer system architecture of the method for real-time storage and reading of huge amounts of data for non-relational databases according to the invention; Figure 2 is a flowchart of the real-time storage method; and Figure 3 is a flowchart of the real-time reading method.

Referring to the system architecture in Figure 1, the invention mainly comprises three modules: the real-time storage program module 110, the data storage record control module 160, and the real-time reading program module 140. They provide a data receiving interface 120 and a data reading interface 150: data providers store real-time data in the non-relational database 130 through the data receiving interface 120, and data recipients read real-time data from the non-relational database 130 through the data reading interface 150.

The real-time storage program module 110 provides the data receiving interface 120, which receives large batches of structured data objects (Java objects). Once the data has been received, the module converts the objects one by one into unstructured data, uses the time at which each item was generated as its primary key (row key), and finally writes the rows into the corresponding table in the non-relational database 130, completing real-time storage of huge amounts of data.

The data storage record control module 160 records each batch storage operation as a single data storage record in a data storage record table. Each record contains the primary key of the first row in the batch, the primary key of the last row, and the total number of rows in the batch. This table is used when reading: a reader only needs to read and parse the storage records in time order, then combine them into a batch query condition and perform a batch read.

The real-time reading program module 140 provides the data reading interface 150, which reads large batches of unstructured data stored in the non-relational database 130. Using the query condition obtained from the data storage record control module 160, it queries the corresponding table, then converts each retrieved row back into a structured data object (Java object), completing real-time reading of huge amounts of data from the non-relational database 130.

From another point of view, the data storage record table can be regarded as a batch index table, so the data storage record control module is the core of the invention. By managing the primary keys of each batch and relying on the table's own index on the row key, a single query reads the entire batch completely and precisely from the non-relational database 130. This achieves real-time access to huge amounts of data while providing data backup and ensuring that no data is lost.

Figure 2 is a flowchart of a method for real-time storage of huge amounts of data in a non-relational database according to one embodiment of the invention. The detailed steps for writing a batch of data into the non-relational database are described below. In this embodiment, the non-relational database has a dedicated table for storing the records of each batch storage operation.

Referring to Figure 2, step 210 converts structured data objects into unstructured data. Beforehand, the data provider uses a structured data object class, designed according to the structured data fields and supplied with the data receiving interface 120 of the real-time storage program module 110, to encapsulate the data, and passes the data to step 210. Step 210 then extracts the field values from each object one by one and, using the data storage functions provided by the non-relational database, converts the structured data into row-oriented unstructured data.
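A minimal sketch of step 210, assuming a hypothetical `SensorReading` dataclass in place of the patent's Java objects and treating a row as a map from column name to bytes (NoSQL stores such as HBase keep cell values as raw bytes). The function names are illustrative only.

```python
from dataclasses import asdict, dataclass


@dataclass
class SensorReading:
    """Hypothetical structured data object (the patent uses Java objects)."""
    sensor_id: str
    timestamp: str   # e.g. "20131101120000"
    value: float


def to_row(obj):
    """Flatten one structured object into a row: column name -> bytes."""
    return {name: str(val).encode("utf-8") for name, val in asdict(obj).items()}


def to_rows(batch):
    """Apply step 210 to a whole batch, one object at a time."""
    return [to_row(obj) for obj in batch]


if __name__ == "__main__":
    print(to_row(SensorReading("s1", "20131101120000", 21.5)))
```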

Next, step 220 generates the primary key required for each unstructured row. In step 220, the real-time storage program module 110 keeps the primary key elements in a configuration file; these elements include the data generation time and the primary key field of the original structured data. Combined with the unstructured data converted in step 210, this forms an unstructured row with a primary key, which then waits to be stored in row format in the non-relational database 130 by the data storage record control module 160.
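Step 220 can be sketched as follows, assuming a hypothetical INI-style configuration section named `rowkey` that lists the key elements in order together with a separator character. The section, option, and field names are assumptions, and Python's standard `configparser` stands in for whatever configuration format the module actually uses.

```python
import configparser

# Hypothetical configuration: generation time first, then the original primary key.
CONFIG_TEXT = """
[rowkey]
elements = generated_at, device_id
separator = |
"""


def load_key_elements(text):
    """Read the ordered list of key elements and the separator from the config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["rowkey"]
    elements = [e.strip() for e in section["elements"].split(",")]
    return elements, section["separator"]


def make_row_key(record, elements, separator):
    """Concatenate the configured fields, in order, into one row key."""
    return separator.join(str(record[name]) for name in elements)


if __name__ == "__main__":
    elements, sep = load_key_elements(CONFIG_TEXT)
    print(make_row_key({"generated_at": "20131101120000", "device_id": "D7"}, elements, sep))
```

Putting the generation time first makes rows sort chronologically, which is what lets a batch of consecutively generated rows occupy one contiguous key range.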

In step 230, the data storage record control module 160 examines each row of the batch. It first checks whether the row is the first in the batch; if so, it records that row's primary key as the first primary key. It then checks whether all data in the batch has been converted from structured data objects into unstructured data. If data remains, step 220 is repeated; if not, the current row is the last one, its primary key is recorded as the last primary key, and the method proceeds to step 240.
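The first/last bookkeeping of step 230 amounts to a single pass over the batch. This sketch assumes row keys are generated in ascending order within a batch, as with time-based keys; `track_batch` is a hypothetical helper.

```python
def track_batch(row_keys):
    """Walk a batch in write order, remembering the first and last row keys."""
    first_key = None
    last_key = None
    count = 0
    for key in row_keys:
        if first_key is None:   # first row of the batch
            first_key = key
        last_key = key          # overwritten until the final row
        count += 1
    if count == 0:
        raise ValueError("empty batch")
    return first_key, last_key, count


if __name__ == "__main__":
    print(track_batch(["201311010001", "201311010002", "201311010003"]))
```

If keys were not generated in ascending order, the first and last keys in write order would not bound the batch, and the minimum and maximum keys would have to be tracked instead.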

Finally, in step 240, the data storage record control module 160 stores each batch storage operation performed by the real-time storage program module 110 as a single data storage record in the data storage record table of the non-relational database. The record table is itself a non-relational table. Each record contains the first primary key of the batch, the last primary key, the total number of rows written in the batch, and a status value; the record's own primary key is the time at which the record was generated.
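A sketch of step 240 under stated assumptions: the record table is a plain dict, the record's row key is its creation time formatted as a fixed-width string down to microseconds, and the column names `first`, `last`, `count`, and `status` are hypothetical.

```python
from datetime import datetime


def make_storage_record(first_key, last_key, count, created_at):
    """Build the single per-batch record; its own row key is its creation time."""
    row_key = created_at.strftime("%Y%m%d%H%M%S%f")   # fixed width, sorts chronologically
    columns = {"first": first_key, "last": last_key, "count": count, "status": 0}
    return row_key, columns


def save_storage_record(table, first_key, last_key, count, created_at=None):
    """Write one record for the batch into the (dict-based) record table."""
    row_key, columns = make_storage_record(
        first_key, last_key, count, created_at or datetime.now())
    table[row_key] = columns
    return row_key


if __name__ == "__main__":
    records = {}
    key = save_storage_record(records, "201311010001", "201311011000", 1000)
    print(key, records[key])
```

Two records created within the same microsecond would collide on this key; the patent does not describe a tie-breaker, so one would have to be added in practice.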

For example, suppose a batch of 1,000 rows is to be stored and the primary key elements are set to the date (year, month, day) followed by a serial number. The method then generates a first primary key of 201311010001 and a last primary key of 201311011000, the status value defaults to "0", and step 240 adds a data storage record whose content is (record generation time, 201311010001, 201311011000, 1000, 0).
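The example's key scheme can be checked with a short sketch. It assumes the serial number is zero-padded to four digits, which is what makes serials 0001 to 1000 yield the keys 201311010001 to 201311011000; `serial_keys` is a hypothetical helper.

```python
def serial_keys(date_yyyymmdd, n, width=4):
    """Hypothetical scheme from the example: date + zero-padded serial number."""
    if n >= 10 ** width:
        raise ValueError("serial number would not fit in the fixed width")
    return [f"{date_yyyymmdd}{i:0{width}d}" for i in range(1, n + 1)]


keys = serial_keys("20131101", 1000)
# The storage record written in step 240: (record time, first key, last key, count, status)
record = ("<record time>", keys[0], keys[-1], len(keys), 0)
print(record)  # ('<record time>', '201311010001', '201311011000', 1000, 0)
```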

After real-time data has been written into the non-relational database in bulk, it must also be read back in real time for downstream applications, such as real-time computation. Figure 3 is a flowchart of a method for real-time reading of huge amounts of data from a non-relational database according to one embodiment of the invention. First, in step 310, the data storage record control module 160 reads the data storage record table in time order. The reading rule selects, oldest first, at most two records that fall within the last three minutes and have not yet been read. In detail, the module first queries the time of the newest record in the data storage record table, then compares each record's time against it to filter out the valid records within three minutes. For each valid record, it increments the status field using a built-in function of the NoSQL database (for example, incrementColumnValue() in HBase), which locks the field and then increments it. It then checks whether the resulting status value is "1" (meaning the record had not been read before, since the status starts at "0"). If it is "1", the record is passed to step 320; otherwise the record is ignored.
This way of filtering valid records acts as a control method that lets multiple client programs read at the same time, each reading non-overlapping data segments, so that data processing can be distributed.
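The claim-by-increment rule of step 310 can be simulated in Python. This is a sketch, not HBase code: a lock-protected counter in a hypothetical `RecordTable` class stands in for an atomic server-side increment such as HBase's incrementColumnValue(), record row keys are `datetime` values, and the window (three minutes) and limit (two records) follow the text.

```python
import threading
from datetime import datetime, timedelta


class RecordTable:
    """Toy data storage record table; increment() stands in for an atomic
    server-side counter such as HBase's incrementColumnValue()."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = {}   # created_at (datetime) -> {"status": int, ...}

    def increment(self, row_key, column, amount=1):
        with self._lock:   # read-modify-write happens atomically
            self.rows[row_key][column] += amount
            return self.rows[row_key][column]


def claim_records(table, window=timedelta(minutes=3), limit=2):
    """Claim up to `limit` unread records, oldest first, among those
    within `window` of the newest record."""
    if not table.rows:
        return []
    newest = max(table.rows)
    recent = sorted(t for t in table.rows if newest - t <= window)
    claimed = []
    for row_key in recent:
        if len(claimed) == limit:
            break
        # Status starts at 0, so exactly one caller ever sees the value 1.
        if table.increment(row_key, "status") == 1:
            claimed.append(row_key)
    return claimed


if __name__ == "__main__":
    table = RecordTable()
    base = datetime(2013, 11, 1, 12, 0)
    for minutes in (0, 8, 9, 10, 11):
        table.rows[base + timedelta(minutes=minutes)] = {"status": 0}
    print(claim_records(table))   # records at 12:08 and 12:09
    print(claim_records(table))   # records at 12:10 and 12:11
```

Claiming stops after two records, so a reader never marks more records than it will process, and because only the caller whose increment returns 1 keeps a record, concurrent readers never process the same batch twice.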

In step 320, the data storage record control module 160 parses the data storage record into a batch query condition and produces a query condition object, which is passed to the real-time reading program in step 330.

In step 330, once the query condition object has been obtained, it is combined with the data query function provided by the non-relational database to run a batch query.

Continuing the example above, suppose the data storage record retrieved in step 310 is (record generation time, 201311010001, 201311011000, 1000, 1). Parsing it in step 320 yields the batch's first primary key 201311010001 and last primary key 201311011000, and step 330 builds the non-relational database query condition (STARTROW=201311010001, STOPROW=201311011000) and then runs the batch query.
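One detail matters when such a condition is handed to HBase: a Scan's stop row is exclusive by default (HBase 2.x also offers `withStopRow(stopRow, inclusive)`), so STOPROW=201311011000 passed unchanged to a default Scan would skip the batch's last row. The sketch below, with a hypothetical `QueryCondition` class and a hypothetical 5-tuple record layout, builds an exclusive stop row that still covers the last key by appending a zero byte, the smallest key that sorts after it in byte order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCondition:
    """Hypothetical query condition object produced in step 320."""
    start_row: bytes   # inclusive
    stop_row: bytes    # exclusive, as in a default HBase Scan


def parse_storage_record(record):
    """record = (created_at, first_key, last_key, count, status)."""
    _created_at, first_key, last_key, _count, _status = record
    start = first_key.encode("ascii")
    # last_key + b"\x00" is the smallest key that sorts after last_key,
    # so an exclusive stop row built this way still includes the last row.
    stop = last_key.encode("ascii") + b"\x00"
    return QueryCondition(start, stop)


def in_range(cond, row_key):
    """Byte-wise comparison, matching how row keys are ordered."""
    return cond.start_row <= row_key < cond.stop_row


if __name__ == "__main__":
    cond = parse_storage_record(("t", "201311010001", "201311011000", 1000, 1))
    print(cond)
```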

Finally, step 340 converts the batch of unstructured data into structured data objects, using the structured data object (Java object) class described above, which is provided to data users for data encapsulation. Each row is retrieved through the read functions provided by the non-relational database, and its row-oriented unstructured field data is converted into a structured data object.
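Step 340 is the inverse of step 210. A minimal sketch, assuming a hypothetical `SensorReading` dataclass as a stand-in for the patent's Java object and rows stored as column name to bytes; each cell is decoded and converted with the field's declared type, which only works for simple types such as `str`, `int`, and `float`.

```python
from dataclasses import dataclass, fields


@dataclass
class SensorReading:
    """Hypothetical structured data object; the patent uses Java objects."""
    sensor_id: str
    timestamp: str
    value: float


def from_row(cls, row):
    """Rebuild one object from a row (column name -> bytes), converting each
    cell with the field's declared type."""
    kwargs = {}
    for f in fields(cls):
        raw = row[f.name].decode("utf-8")
        kwargs[f.name] = f.type(raw)
    return cls(**kwargs)


def from_rows(cls, rows):
    """Apply step 340 to a whole scanned batch, one row at a time."""
    return [from_row(cls, row) for row in rows]


if __name__ == "__main__":
    row = {"sensor_id": b"s1", "timestamp": b"20131101120000", "value": b"21.5"}
    print(from_row(SensorReading, row))
```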

Compared with other conventional techniques, the method for real-time storage and reading of huge amounts of data for non-relational databases provided by the invention has the following advantages:

1. Easy-to-use interface: by converting between structured data objects and unstructured data, users can store and read huge amounts of data in a non-relational database in real time using structured data objects, without needing non-relational database skills.

2. Distributed multi-client data access: according to real-time computing performance requirements, the real-time reading program module can, depending on how many data receiving interfaces are currently in use, let multiple client programs read simultaneously, each reading non-overlapping data segments, which improves the timeliness of data reading.

3. All data is preserved in full, which supports historical data analysis, while data receivers are still guaranteed to obtain the newest unprocessed data. This overcomes the drawback of real-time access mechanisms built on in-memory databases, which lose their data when the system shuts down or crashes.

The detailed description above is a specific description of one feasible embodiment of the invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or change that does not depart from the technical spirit of the invention shall fall within the patent scope of this case.

In summary, this case is innovative in its technical idea and offers the above effects that conventional methods lack. It fully meets the statutory requirements of novelty and inventive step for an invention patent, and this application is filed according to law; the Office is respectfully requested to approve this invention patent application.

Claims (9)

1. A method for real-time storage and reading of huge amounts of data for a non-relational database, comprising the steps of: a. converting structured data objects into unstructured data (210); b. generating data primary keys (220); c. generating a batch data storage record and related index (230); d. storing each batch storage operation as a single data storage record in a data storage record table, while storing the batch data in the non-relational database (240); e. reading the data storage record table (310); f. parsing a batch query condition and generating a query condition (320); g. combining a data query function provided by the non-relational database with the query condition object to perform a batch query (330); h. converting the batch of unstructured data into structured data objects (340).

2. The method of claim 1, wherein converting structured data objects into unstructured data (210) is performed by a real-time storage program module (110), which provides a data receiving interface (120) that receives structured original data objects supplied by a data provider and converts the original data objects into unstructured data.

3. The method of claim 1, wherein generating the data primary keys (220) is performed by the real-time storage program module (110): the primary key elements are first stored in a configuration file; when a structured data object is received, a primary key generation method obtains the primary key elements and generates the primary key used for that structured data object; after conversion into unstructured data, a data storage record control module (160) stores the data in row format in the non-relational database (130).

4. The method of claim 3, wherein the primary key elements include the data generation time and the primary key field of the original structured data.

5. The method of claim 1, wherein generating the batch data storage record and related index (230) is performed by the data storage record control module (160), which examines each row of the batch: it first determines whether the row is the first row and, if so, records its primary key as the first primary key; it then determines whether all data in the batch has been converted from structured data objects into unstructured data; if data remains, converting structured data objects into unstructured data (210) is repeated; if no data remains, the current row is the last row, and its primary key is recorded as the last primary key.

6. The method of claim 1, wherein storing each batch storage operation as a single data storage record in the data storage record table (240) is performed by the data storage record control module (160), which stores each batch storage operation produced by the real-time storage program module (110) as a single data storage record in a data storage record table of the non-relational database; the data storage record contains the first primary key, the last primary key, the total number of rows written in the batch, and a status value, and its primary key element is the time at which the record was generated.

7. The method of claim 1, wherein reading the data storage record table (310) is performed by a real-time reading program module (140) and the data storage record control module (160): the data storage record control module (160) reads the data storage record table in current time order, and the reading rule selects, oldest first, at most two data storage records that fall within a configurable time window and have not yet been read.

8. The method of claim 1, wherein parsing the batch query condition and generating a query condition (320) is performed by the data storage record control module (160), which parses the data storage record into a batch query condition and generates a query condition object for the real-time reading program module (140) to use in the batch query.

9. The method for real-time reading of huge amounts of data for a non-relational database of claim 1, wherein converting the batch of unstructured data into structured data objects (340) is performed by the real-time reading program module (140), which includes a data reading interface (150) that provides data requesters with data encapsulation; after each row is retrieved through the read functions provided by the non-relational database, its row-oriented unstructured field data is converted into a structured data object.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW104100670A TWI522827B (en) 2015-01-09 2015-01-09 Real-time storage and real-time reading of huge amounts of data for non-related databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW104100670A TWI522827B (en) 2015-01-09 2015-01-09 Real-time storage and real-time reading of huge amounts of data for non-related databases

Publications (2)

Publication Number Publication Date
TWI522827B TWI522827B (en) 2016-02-21
TW201626254A TW201626254A (en) 2016-07-16

Family

ID=55810441

Family Applications (1)

Application Number Title Priority Date Filing Date
TW104100670A TWI522827B (en) 2015-01-09 2015-01-09 Real-time storage and real-time reading of huge amounts of data for non-related databases

Country Status (1)

Country Link
TW (1) TWI522827B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI706260B (en) * 2018-05-29 2020-10-01 香港商阿里巴巴集團服務有限公司 Index establishment method and device based on mobile terminal NoSQL database

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153643B (en) * 2016-03-02 2021-02-19 阿里巴巴集团控股有限公司 Data table connection method and device

Also Published As

Publication number Publication date
TWI522827B (en) 2016-02-21

Similar Documents

Publication Publication Date Title
CN108536761B (en) Report data query method and server
US20180137134A1 (en) Data snapshot acquisition method and system
Kraska Finding the needle in the big data systems haystack
EP2778972B1 (en) Shared cache used to provide zero copy memory mapped database
US11294973B2 (en) Codeless information service for abstract retrieval of disparate data
CN110795499B (en) Cluster data synchronization method, device, equipment and storage medium based on big data
CN106611044B (en) SQL optimization method and equipment
CA2902821A1 (en) System for metadata management
US10002142B2 (en) Method and apparatus for generating schema of non-relational database
CN102426609A (en) Index generation method and index generation device based on MapReduce programming architecture
EP2763055B1 (en) A telecommunication method and mobile telecommunication device for providing data to a mobile application
CN106844682A (en) Method for interchanging data, apparatus and system
US9600559B2 (en) Data processing for database aggregation operation
US10311045B2 (en) Aggregation/evaluation of heterogenic time series data
WO2023029275A1 (en) Data association analysis method and apparatus, and computer device and storage medium
CN111046036A (en) Data synchronization method, device, system and storage medium
Mehmood et al. Performance analysis of not only SQL semi-stream join using MongoDB for real-time data warehousing
CN105630934A (en) Data statistic method and system
Gohil et al. Efficient ways to improve the performance of HDFS for small files
CN105930354B (en) Storage model conversion method and device
Suriarachchi et al. Big provenance stream processing for data intensive computations
TWI436222B (en) Real - time multi - dimensional analysis system and method on cloud
TWI522827B (en) Real-time storage and real-time reading of huge amounts of data for non-related databases
CN112364033A (en) Data retrieval system
CN111008198A (en) Service data acquisition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees