CN112527836B - Big data query method based on T-BOX platform - Google Patents
Big data query method based on T-BOX platform
- Publication number
- CN112527836B (application CN202011424744.8A)
- Authority
- CN
- China
- Prior art keywords
- task
- tasks
- data
- database
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A big data query method based on a T-BOX platform, relating to the technical field of new energy vehicle data query. It addresses the problems that terminals in existing T-BOX platforms access the web server frequently and that the volume of queried data is large. Each task request received by the web server is turned into a task and stored in an Oracle database; uncompleted tasks in the Oracle database are retrieved at regular intervals, and the retrieved result set is sent to a task distribution system, which sorts the tasks to obtain a task sequence. The task sequence is sent to a stack-based message channel; tasks are taken from the channel by multiple threads, each task is divided into several subtasks according to the length of its query time range, and the subtasks are executed by multiple threads that access an HBase database to obtain the target data. The target data is encoded into CSV files, which are compressed into a compressed package. The package is stored in a MongoDB database, its ID is stored in the Oracle database, and the task is marked as completed. The method is suitable for data query on a T-BOX platform.
Description
Technical Field
The invention relates to the technical field of new energy automobile data query.
Background
A terminal in a T-BOX (Telematics BOX, the in-vehicle telematics unit) platform uploads one record per second, so one vehicle uploads 86,400 records per day. If ten thousand vehicles are connected to the platform, nearly 900 million records (86,400 × 10,000 = 864 million) are stored in HBase (a distributed, column-oriented open-source database) every day. Because the amount of data in the HBase database is so large, a web server that queries HBase directly may wait a long time for results, which greatly reduces the performance of the deployed project.
Disclosure of Invention
The invention aims to solve the problems that a terminal in the existing T-BOX platform frequently accesses a web server and the query data volume is large. The invention discloses a big data query method based on a T-BOX platform.
The invention relates to a big data query method based on a T-BOX platform, which specifically comprises the following steps:
step one, each task request received by the web server of the T-BOX platform is turned into a task and stored in an Oracle database, and the creation time, urgency and priority of each task are recorded;
step two, uncompleted tasks in the Oracle database are retrieved at regular intervals, and the retrieved result set is sent to a task distribution system;
step three, the task distribution system sorts the tasks in the result set according to creation time, urgency and priority to obtain a task sequence;
step four, the task sequence is sent to a stack-based message channel, tasks are taken from the message channel by multiple threads, each task is divided into a plurality of subtasks according to the length of its query time range, the subtasks are executed by multiple threads to access the HBase database, and each subtask obtains a group of target data;
step five, the acquired target data is encoded to generate a CSV file, and further data is appended to the CSV file until all target data acquired by all subtasks of the task has been encoded, after which the CSV files are compressed into a compressed package;
step six, the compressed package is stored in a MongoDB database, the ID of the compressed package is stored in the Oracle database, and the task is marked as completed;
step seven, the compressed package is sent to the client corresponding to the ID, and the client queries the completed tasks in the database at regular intervals, completing the query of one task.
Further, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved tasks are also marked as retrieved.
Further, in step four, the task sequence is sent to the stack-based message channel as follows: the tasks are pushed into the stack-based message channel in order from the tail of the sequence to the head.
Further, step four also includes cleaning and filtering the target data acquired by each subtask, specifically: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
The error-prone parts of the method, such as multithreading and cache queues, are all implemented with the Spring Integration framework. Spring Integration provides mature, stable modules that implement enterprise integration patterns: an event-driven messaging framework passes messages between systems and routes data through pipelines to wherever it is needed next. Relying on Spring Integration helps ensure the stability of the program. When the CSV file generated from the retrieval result is stored in the MongoDB database, no intermediate objects need to be created: the cleaned and converted data can be streamed directly into MongoDB, the file ID is generated automatically, and memory is released quickly. When another platform also needs to view the data uploaded by the devices in real time, the current system can be offered to that platform simply by changing the ZMQ (ZeroMQ) port on which data is received and the Oracle database configuration.
Drawings
FIG. 1 is a schematic block diagram of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
Embodiment one: this embodiment is described below with reference to FIG. 1. The big data query method based on a T-BOX platform in this embodiment specifically comprises:
step one, each task request received by the web server of the T-BOX platform is turned into a task and stored in an Oracle database, and the creation time, urgency and priority of each task are recorded;
step two, uncompleted tasks in the Oracle database are retrieved at regular intervals, and the retrieved result set is sent to a task distribution system;
step three, the task distribution system sorts the tasks in the result set according to creation time, urgency and priority to obtain a task sequence;
step four, the task sequence is sent to a stack-based message channel, tasks are taken from the message channel by multiple threads, each task is divided into a plurality of subtasks according to the length of its query time range, the subtasks are executed by multiple threads to access the HBase database, and each subtask obtains a group of target data;
step five, the acquired target data is encoded to generate a CSV file, and further data is appended to the CSV file until all target data acquired by all subtasks of the task has been encoded, after which the CSV files are compressed into a compressed package;
step six, the compressed package is stored in a MongoDB database, the ID of the compressed package is stored in the Oracle database, and the task is marked as completed;
step seven, the compressed package is sent to the client corresponding to the ID, and the client queries the completed tasks in the database at regular intervals, completing the query of one task.
In this embodiment, the MongoDB database is a database based on distributed file storage. The Oracle database is Oracle Database, also called Oracle RDBMS or simply Oracle, a relational database management system from Oracle Corporation. A comma-separated values (CSV) file, sometimes called character-separated values because the separator need not be a comma, stores tabular data (numbers and text) in plain text.
Further, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved tasks are also marked as retrieved.
Further, in step four, the task sequence is sent to the stack-based message channel as follows: the tasks are pushed into the stack-based message channel in order from the tail of the sequence to the head.
Further, step four also includes cleaning and filtering the target data acquired by each subtask, specifically: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
In this embodiment, the task management system turns each web server request into a task and stores it in the Oracle database. Each task carries its urgency, the level of the user who created it, and its creation time. In step three, the task's urgency, the user's role level and the task's creation time are used as the priority rules for ordering the retrieved tasks.
A scheduled job in the task management system queries the uncompleted tasks in the Oracle database at the interval set in the configuration file. (Keeping the interval in the configuration file makes it easy to change at any time without recompiling the program; it is currently set to run once every 30 seconds.) To prevent the scheduled job from picking up the same record (web request) twice, which would query HBase twice and produce duplicate data, the task state is changed temporarily on each 30-second poll. For example, a new web request has a flag of 0 in the database; the poll changes 0 to 1, selects only the rows whose flag is 1, and updates those rows to 2 once the query has completed. When the next 30-second poll runs, the rows that previously had a flag of 0 have already been changed, so they are not queried again. The query result is then submitted to the task distribution system.
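The flag-based protection against repeated polling can be sketched as follows. This is a minimal illustration, not the patented implementation: it uses Python's built-in `sqlite3` in place of the Oracle database, and the table name `task`, the column name `flag` and the helper names are assumptions introduced for the example. It also assumes that the value 2 is written when the query for a task has finished.

```python
import sqlite3

# Flag values as described in the text: 0 = new, 1 = picked up, 2 = query finished.
NEW, CLAIMED, DONE = 0, 1, 2


def claim_new_tasks(conn):
    """Flip every new task from 0 to 1 and return the ids that were picked up."""
    with conn:  # commit the flag changes as one transaction
        rows = conn.execute(
            "SELECT id FROM task WHERE flag = ? ORDER BY id", (NEW,)
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE task SET flag = ? WHERE id = ? AND flag = ?",
            [(CLAIMED, task_id, NEW) for task_id in ids],
        )
    return ids


def mark_done(conn, task_id):
    """Set the flag to 2 once the query for this task has completed."""
    with conn:
        conn.execute("UPDATE task SET flag = ? WHERE id = ?", (DONE, task_id))


# Hypothetical task table standing in for the Oracle task table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO task (id) VALUES (?)", [(1,), (2,)])
conn.commit()

first_poll = claim_new_tasks(conn)   # picks up both new tasks
second_poll = claim_new_tasks(conn)  # finds nothing left with flag 0
mark_done(conn, 1)
```

Because the second poll only looks for rows whose flag is still 0, a task that is already being processed is never submitted to HBase a second time, even if its query takes longer than the polling interval.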
The task distribution system calculates the execution order of the tasks in the query result according to the rules entered when each task was created in step one. (Task levels are A, B and C, and user levels are high, medium and low; tasks are ordered A > B > C and high > medium > low, and the smaller the difference between a task's creation time and the system time, the higher its rank.) Tasks that come later in the execution order are put into the message channel first: because a stack is last in, first out, this ensures that the tasks that come earlier in the execution order are executed first. Because the amount of data in the HBase database is large, a query may not finish within 30 seconds, and if the sorted requests were submitted directly to the big data retrieval program, requests would pile up. The message channel therefore acts as a kind of data transfer station that holds the web requests temporarily until worker threads take them. The advantage of multithreading is that a thread pool can be configured: the threads, up to the configured number, quickly take requests from the stack, and once a thread has finished its HBase query it is released and can go on to handle other web requests. The stack is then operated on concurrently by multiple threads, and the data (the sorted web requests) is submitted to the retrieval program (step four) for retrieval in the HBase database.
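The ordering rule and the tail-to-head push into a stack can be illustrated with a short sketch. The patent implements the channel with Spring Integration; the example below swaps in Python's standard `queue.LifoQueue` purely to show the principle. The `Task` fields, the rank tables and the function names are assumptions made for the example, and the time rule is read literally as "the smaller the difference from the system time, the higher the rank".

```python
from dataclasses import dataclass
from datetime import datetime
from queue import LifoQueue

# Lower number = higher rank (A > B > C, high > medium > low).
TASK_LEVEL_RANK = {"A": 0, "B": 1, "C": 2}
USER_LEVEL_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Task:
    task_id: int
    task_level: str
    user_level: str
    created_at: datetime


def execution_order(tasks, now):
    """Sort by task level, then user level, then smallest age relative to `now`."""
    return sorted(
        tasks,
        key=lambda t: (
            TASK_LEVEL_RANK[t.task_level],
            USER_LEVEL_RANK[t.user_level],
            now - t.created_at,
        ),
    )


def fill_channel(ordered):
    """Push the sequence from tail to head so the head is popped first."""
    channel = LifoQueue()
    for task in reversed(ordered):
        channel.put(task)
    return channel


now = datetime(2020, 12, 8, 9, 1, 0)
tasks = [
    Task(1, "B", "high", datetime(2020, 12, 8, 9, 0, 0)),
    Task(2, "A", "low", datetime(2020, 12, 8, 9, 0, 5)),
    Task(3, "A", "high", datetime(2020, 12, 8, 9, 0, 10)),
    Task(4, "A", "high", datetime(2020, 12, 8, 9, 0, 20)),
]

channel = fill_channel(execution_order(tasks, now))
popped = []
while not channel.empty():
    popped.append(channel.get().task_id)
```

In this sketch the two level-A, high-user tasks come off the stack first, then the level-A, low-user task, then the level-B task. `LifoQueue` is thread-safe, so in a multithreaded setup several worker threads could call `channel.get()` concurrently.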
The retrieved data is first cleaned to filter out erroneous records (10 consecutive records that have the same timestamp and a value of 255 or 65535), then converted (to save HBase storage space and reduce server load, character column names are converted to numeric ones when the data is stored, so after the data is read each numeric column name must be converted back to the corresponding character name), then sorted (in ascending or descending order of a specified column), after which the CSV file is generated.
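The cleaning rule can be sketched as below. This is one possible reading, stated as an assumption: a run is dropped when at least n consecutive records share the same timestamp and each carries the value 255 or 65535. The record layout `(timestamp, value)` and the function name are hypothetical.

```python
from itertools import groupby

INVALID_VALUES = {255, 65535}


def drop_invalid_runs(records, n=10):
    """Remove runs of >= n consecutive records with one timestamp and an invalid value.

    `records` is a list of (timestamp, value) pairs in retrieval order.
    """
    cleaned = []
    # Consecutive records are grouped by (timestamp, "value is invalid").
    for (_, invalid), run in groupby(
        records, key=lambda r: (r[0], r[1] in INVALID_VALUES)
    ):
        run = list(run)
        if invalid and len(run) >= n:
            continue  # treat the whole run as erroneous data
        cleaned.extend(run)
    return cleaned


records = (
    [(100, 42)]
    + [(101, 255)] * 10     # 10 identical-time invalid records: dropped
    + [(102, 65535)] * 3    # only 3 in a row: kept under this reading
    + [(103, 7)]
)
cleaned = drop_invalid_runs(records, n=10)
```

Short runs of 255 or 65535 are kept here because the text only names runs of n consecutive records as errors; a stricter filter would be a different design choice.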
The big data store is organized by month, with one table created per month. For a retrieval, the months covered by the task's start and end times are therefore calculated first, and the retrieval is then run against the corresponding tables. Querying a whole month of data from a table at once can cause a memory overflow, so each monthly table is queried day by day: the month is divided into days, the data returned for each day is written directly to a CSV file, and the memory is released. Within the same task, each subsequent query opens the CSV file generated previously, appends the newly retrieved data to it, and then releases the memory. This continues until the data for every day between the task's start and end times has been retrieved and appended to the CSV file. Each subtask may query several kinds of data (for example, raw data or computed offset data). After a CSV file has been generated for each kind of data, the CSV files are compressed into a zip package, the package is stored in the MongoDB database, the file ID is stored in the Oracle database, and the task state is updated.
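The day-by-day retrieval, the appending to a CSV file and the final zip step can be sketched as follows. This is an illustration under stated assumptions: the monthly table naming (`tbox_YYYYMM`), the file names and all function names are hypothetical, and `fetch_day` is a placeholder for the real HBase scan, which is not shown.

```python
import csv
import tempfile
import zipfile
from datetime import date, timedelta
from pathlib import Path


def daily_subtasks(start, end):
    """Yield (monthly_table, day) for every day in the closed range [start, end]."""
    day = start
    while day <= end:
        yield f"tbox_{day:%Y%m}", day  # hypothetical one-table-per-month naming
        day += timedelta(days=1)


def fetch_day(table, day):
    """Placeholder for the HBase query of one day; returns a list of rows."""
    return [(day.isoformat(), table, 1)]


def run_task(start, end, out_dir):
    """Query one day at a time, append to the CSV file, then zip the result."""
    csv_path = out_dir / "raw.csv"
    for table, day in daily_subtasks(start, end):
        rows = fetch_day(table, day)
        # Append mode: each day's rows are added to the file written so far,
        # so only one day of data is held in memory at a time.
        with csv_path.open("a", newline="") as f:
            csv.writer(f).writerows(rows)
    zip_path = out_dir / "result.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=csv_path.name)
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    zip_path = run_task(date(2020, 11, 29), date(2020, 12, 2), Path(tmp))
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        lines = zf.read("raw.csv").decode().splitlines()
```

The example range crosses a month boundary, so the first two days are read from the November table and the last two from the December table. Storing the zip file in MongoDB and recording its ID in Oracle are left out of the sketch.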
The task management system queries the completed tasks in the task table of the Oracle database at the interval configured in the configuration file (this interval is separate from the task execution interval mentioned in step two and is currently set to ten seconds). The client refreshes the web page at regular intervals according to this rule, which governs how often the completed tasks in the database are queried.
The invention displays big data by creating tasks, in order to solve the problems of large query volumes and frequent access. The web server does not access HBase directly. Instead, the task system packs each query request into a message and puts it into a message channel; Spring Integration operates on the stack concurrently with multiple threads and distributes the tasks to different data query programs; the query results are cleaned, converted and sorted, and a CSV file is generated by appending; the file is saved to the MongoDB database and the file ID to the Oracle database; and the web server is notified that the file for the task has been generated.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.
Claims (4)
1. A big data query method based on a T-BOX platform, characterized by comprising the following steps:
step one, each task request received by the web server of the T-BOX platform is turned into a task and stored in an Oracle database, and the creation time, urgency and priority of each task are recorded;
step two, uncompleted tasks in the Oracle database are retrieved at regular intervals, and the retrieved result set is sent to a task distribution system;
step three, the task distribution system sorts the tasks in the result set according to creation time, urgency and priority to obtain a task sequence;
step four, the task sequence is sent to a stack-based message channel, tasks are taken from the message channel by multiple threads, each task is divided into a plurality of subtasks according to the length of its query time range, the subtasks are executed by multiple threads to access the HBase database, and each subtask obtains a group of target data;
step five, the acquired target data is encoded to generate a CSV file, and further data is appended to the CSV file until all target data acquired by all subtasks of the task has been encoded, after which the CSV files are compressed into a compressed package;
step six, the compressed package is stored in a MongoDB database, the ID of the compressed package is stored in the Oracle database, and the task is marked as completed;
step seven, the compressed package is sent to the client corresponding to the ID, and the client queries the completed tasks in the database at regular intervals, completing the query of one task.
2. The big data query method based on a T-BOX platform as claimed in claim 1, wherein in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved tasks are also marked as retrieved.
3. The big data query method based on a T-BOX platform as claimed in claim 1, wherein in step four the task sequence is sent to the stack-based message channel as follows: the tasks are pushed into the stack-based message channel in order from the tail of the sequence to the head.
4. The big data query method based on a T-BOX platform as claimed in claim 1, wherein step four further comprises cleaning and filtering the target data obtained by each subtask, specifically: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011424744.8A CN112527836B (en) | 2020-12-08 | 2020-12-08 | Big data query method based on T-BOX platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011424744.8A CN112527836B (en) | 2020-12-08 | 2020-12-08 | Big data query method based on T-BOX platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112527836A CN112527836A (en) | 2021-03-19 |
CN112527836B true CN112527836B (en) | 2022-12-30 |
Family
ID=74998304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011424744.8A Active CN112527836B (en) | 2020-12-08 | 2020-12-08 | Big data query method based on T-BOX platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527836B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113568986B (en) * | 2021-07-06 | 2024-05-10 | 东风汽车集团股份有限公司 | Remote networking terminal production data matching method and system |
CN114301846A (en) * | 2021-12-28 | 2022-04-08 | 中国电信股份有限公司 | Communication method, communication apparatus, storage medium, and processor |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102855277A (en) * | 2012-07-23 | 2013-01-02 | 中国联合网络通信集团有限公司 | Data center system and data processing method |
CN107025074A (en) * | 2017-04-25 | 2017-08-08 | 航天科技控股集团股份有限公司 | A kind of picture storage method based on recorder platform |
CN108536842A (en) * | 2018-04-13 | 2018-09-14 | 航天科技控股集团股份有限公司 | A kind of file automatic archiving method and system based on intelligent management platform |
CN109062697A (en) * | 2018-08-07 | 2018-12-21 | 北京超图软件股份有限公司 | It is a kind of that the method and apparatus of spatial analysis service are provided |
CN109739818A (en) * | 2018-12-28 | 2019-05-10 | 浪潮软件股份有限公司 | A kind of portable high-throughput big data acquisition method and system |
CN110046287A (en) * | 2019-03-19 | 2019-07-23 | 厦门市美亚柏科信息股份有限公司 | A kind of the data query method, apparatus and storage medium unrelated with type of database |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160241676A1 (en) * | 2015-02-18 | 2016-08-18 | Dashcube LLC | Method and apparatus for storing, accessing and displaying past application states |
US11403318B2 (en) * | 2015-10-01 | 2022-08-02 | Futurewei Technologies, Inc. | Apparatus and method for managing storage of a primary database and a replica database |
US11416528B2 (en) * | 2016-09-26 | 2022-08-16 | Splunk Inc. | Query acceleration data store |
Non-Patent Citations (5)
Title |
---|
MongoDB--Spring Data MongoDB详细的操作手册(增删改查);ccww;《https://zhuanlan.zhihu.com/p/85675213》;20191008;1-4 * |
MongoDb-大数据查询优化;喝醉的咕咕鸟;《https://blog.csdn.net/weixin_43549578/article/details/106104099》;20200513;1-2 * |
Performance Analysis of RDBMS and Hadoop Components with Their File Formats for the Development of Recommender Systems;Anchal Gupta等;《2018 3rd International Conference for Convergence in Technology (I2CT)》;20181111;1-6 * |
中小企业会计档案云存储平台研究;李伟;《中国优秀硕士学位论文全文数据库经济与管理科学辑》;20170215(第2期);J152-3810 * |
云计算下基于优化XGBoost的网约车供需预测研究;李泽宇;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(第1期);I140-572 * |
Also Published As
Publication number | Publication date |
---|---|
CN112527836A (en) | 2021-03-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||