CN112527836B - Big data query method based on T-BOX platform - Google Patents

Big data query method based on T-BOX platform

Publication number
CN112527836B
Authority
CN
China
Prior art keywords
task
tasks
data
database
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011424744.8A
Other languages
Chinese (zh)
Other versions
CN112527836A (en)
Inventor
姜海峰
刘明月
姜军
陈玉锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Hi Tech Holding Group Co Ltd
Original Assignee
Aerospace Hi Tech Holding Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Hi Tech Holding Group Co Ltd
Priority to CN202011424744.8A
Publication of CN112527836A
Application granted
Publication of CN112527836B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A big data query method based on a T-BOX platform, relating to the technical field of new energy vehicle data query. It addresses the problems that terminals in existing T-BOX platforms access the web server frequently and that the volume of queried data is large. A task request received by the web server is turned into a task and stored in an Oracle database; uncompleted tasks in the Oracle database are retrieved periodically, the retrieval result data set is sent to a task distribution system, and the task distribution system sorts the tasks to obtain a task sequence. The task sequence is sent to a stack-based message channel; tasks are extracted from the message channel by multiple threads, each task is divided into a plurality of subtasks according to the length of its query time range, and the subtasks are executed in multiple threads to access an HBase database and obtain target data. The obtained target data is encoded to generate CSV files, which are compressed into a compressed package. The compressed package is stored in a MongoDB database, the ID of the compressed package is stored in the Oracle database, and the task is marked as completed. The method is suitable for data query on the T-BOX platform.

Description

Big data query method based on T-BOX platform
Technical Field
The invention relates to the technical field of new energy automobile data query.
Background
Each terminal in a T-BOX (Telematics BOX, the in-vehicle telematics unit) platform uploads one record per second, so one vehicle uploads 86,400 records per day. If 10,000 vehicles are connected to the platform, nearly 900 million records are stored in HBase (a distributed, column-oriented open-source database) every day. Because the amount of data in the HBase database is so large, a web server that queries HBase directly may face long waiting times, which greatly reduces the performance of the deployed project.
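The daily volume quoted above follows from simple arithmetic. The short sketch below reproduces it; the variable names are illustrative only and do not come from the patent.

```python
# One record per second per vehicle, as stated in the text.
SECONDS_PER_DAY = 24 * 60 * 60
vehicles = 10_000  # fleet size used in the text

records_per_vehicle_per_day = SECONDS_PER_DAY
records_per_day = records_per_vehicle_per_day * vehicles

print(records_per_vehicle_per_day)  # 86400
print(records_per_day)              # 864000000, i.e. nearly 900 million
```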
Disclosure of Invention
The invention aims to solve the problems that a terminal in the existing T-BOX platform frequently accesses a web server and the query data volume is large. The invention discloses a big data query method based on a T-BOX platform.
The invention relates to a big data query method based on a T-BOX platform, which specifically comprises the following steps:
step one, turning a task request received by a web server of the T-BOX platform into a task, storing the task in an Oracle database, and marking the creation time, the urgency and the priority of each task;
step two, retrieving the uncompleted tasks in the Oracle database at regular intervals, and sending the retrieval result data set to a task distribution system;
step three, the task distribution system sorting the tasks in the result data set according to creation time, urgency and priority to obtain a task sequence;
step four, sending the task sequence to a stack-based message channel, extracting tasks from the message channel with multiple threads, dividing each task into a plurality of subtasks according to the length of the task's query time range, and executing the subtasks in multiple threads to access the HBase database, each subtask obtaining a group of target data;
step five, encoding the obtained target data to generate a CSV file and appending to the CSV file until all target data obtained by all subtasks of one task has been encoded, and compressing the plurality of CSV files into a compressed package;
step six, storing the compressed package in a MongoDB database, storing the ID of the compressed package in the Oracle database, and marking the task as completed;
step seven, sending the compressed package to the client corresponding to the ID, the client periodically querying the completed tasks in the database, thereby completing the query of one task.
Furthermore, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are also marked as retrieved.
Further, the task sequence is sent to the stack-based message channel in step four as follows: the tasks are put into the stack-based message channel in order from the tail of the sequence to the head.
Further, step four also includes cleaning and filtering the target data obtained by each subtask, the specific method being: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
The error-prone parts of the method, such as multithreading and cache queues, are all implemented with the Spring Integration framework. Spring Integration is a mature and stable implementation of integration patterns: an event-driven messaging framework that passes messages between systems and routes data through pipelines to the next place it is needed. Relying on Spring Integration ensures the stability of the program. When the CSV file generated from the retrieval result is stored in the MongoDB database, no objects need to be created: the cleaned and converted data can be streamed directly into the MongoDB database, the file ID is generated automatically, and memory is released quickly. When other platforms also need to view the data uploaded by the devices in real time, the current system can be made available to them simply by modifying the port of the ZMQ (ZeroMQ) interface that receives the data and the configuration of the Oracle database.
Drawings
FIG. 1 is a schematic block diagram of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
Embodiment one: this embodiment is described below with reference to FIG. 1. The big data query method based on a T-BOX platform in this embodiment specifically comprises:
step one, turning a task request received by a web server of the T-BOX platform into a task, storing the task in an Oracle database, and marking the creation time, the urgency and the priority of each task;
step two, retrieving the uncompleted tasks in the Oracle database at regular intervals, and sending the retrieval result data set to a task distribution system;
step three, the task distribution system sorting the tasks in the result data set according to creation time, urgency and priority to obtain a task sequence;
step four, sending the task sequence to a stack-based message channel, extracting tasks from the message channel with multiple threads, dividing each task into a plurality of subtasks according to the length of the task's query time range, and executing the subtasks in multiple threads to access the HBase database, each subtask obtaining a group of target data;
step five, encoding the obtained target data to generate a CSV file and appending to the CSV file until all target data obtained by all subtasks of one task has been encoded, and compressing the plurality of CSV files into a compressed package;
step six, storing the compressed package in a MongoDB database, storing the ID of the compressed package in the Oracle database, and marking the task as completed;
step seven, sending the compressed package to the client corresponding to the ID, the client periodically querying the completed tasks in the database, thereby completing the query of one task.
In this embodiment, the MongoDB database is a database based on distributed file storage. The Oracle database is Oracle Database, also called Oracle RDBMS or simply Oracle, a relational database management system from Oracle Corporation. A comma-separated values (CSV) file, sometimes also called a character-separated values file because the separator need not be a comma, stores tabular data (numbers and text) in plain text.
Furthermore, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are also marked as retrieved.
Further, the task sequence is sent to the stack-based message channel in step four as follows: the tasks are put into the stack-based message channel in order from the tail of the sequence to the head.
Further, step four also includes cleaning and filtering the target data obtained by each subtask, the specific method being: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
In this embodiment, requests from the web server are turned into tasks by the task management system and stored in the Oracle database. Each task carries its urgency, the level of the user who created it, and its creation time. In step three, the urgency, the user role level and the task creation time serve as the priority rules for ordering the retrieved tasks.
A timed job in the task management system periodically queries the unfinished tasks in the Oracle database at the interval set in the configuration file. (Keeping the interval in the configuration file makes it easy to change at any time without recompiling the program; it is currently set to once every 30 seconds.) To prevent the timed job from retrieving the same record (web request) twice, which would query HBase twice and produce duplicate data, the task state is changed while each 30-second query runs. For example, a new web request has flag 0 in the database; at query time 0 is changed to 1, only records with flag 1 are retrieved, and after the query completes the records with flag 1 are updated to 2. When the next 30-second job runs, the records that previously had flag 0 no longer have that value, so they are not retrieved again. The query result is then submitted to the task distribution system.
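The flag handling described above can be sketched as follows. This is only one reading of the text, under stated assumptions: an in-memory SQLite database stands in for the Oracle database, and the table name `task`, the column `flag` and the function `poll_once` are hypothetical names chosen for illustration.

```python
import sqlite3

def poll_once(conn):
    """One timed poll: claim new tasks (0 -> 1), read them, mark them retrieved (1 -> 2)."""
    with conn:  # commit all three statements together, or roll back on error
        conn.execute("UPDATE task SET flag = 1 WHERE flag = 0")
        ids = [row[0] for row in
               conn.execute("SELECT id FROM task WHERE flag = 1 ORDER BY id")]
        conn.execute("UPDATE task SET flag = 2 WHERE flag = 1")
    return ids

# SQLite is an assumption used only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO task (id) VALUES (?)", [(1,), (2,)])
conn.commit()

first = poll_once(conn)   # both new tasks are picked up
second = poll_once(conn)  # nothing is picked up twice
print(first, second)      # [1, 2] []
```

Because the flags move from 0 to 2 within a single poll, a later poll only ever sees requests created after the previous one, which is the duplicate-avoidance property the paragraph describes.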
The task distribution system computes the execution order of the tasks in the query result according to the rules entered when each task was created in step one. (Task levels are A, B and C, and user levels are high, medium and low; tasks are ordered A > B > C and high > medium > low, and the smaller the difference from the system time, the higher the rank.) Tasks that come later in the execution order are put into the message channel first: because a stack is last-in, first-out, this ensures that the tasks that come earlier in the execution order are executed first. Because the amount of data in the HBase database is large, a query may not finish within 30 s, and if the sorted requests were submitted directly to the big data retrieval program, requests would pile up. The message channel therefore acts like a data transfer station that temporarily holds the web requests until worker threads pick them up. The advantage of multithreading is that a thread pool can be configured: up to the configured number of threads, each thread quickly takes a request from the stack, and once its HBase query has finished the thread is released and can go on to serve other web requests. The stack is thus operated on concurrently by multiple threads, and the data (the sorted web requests) is submitted to the retrieval program (step four) for retrieval in the HBase database.
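The ordering and the tail-to-head push can be illustrated with a small sketch. The `Task` class, the rank tables and the function names are hypothetical, and breaking ties by earliest creation time is an assumption, since the text's wording on the time rule is ambiguous. A plain Python list stands in for the stack-based message channel.

```python
from dataclasses import dataclass
from datetime import datetime

URGENCY_RANK = {"A": 0, "B": 1, "C": 2}            # A > B > C
USER_RANK = {"high": 0, "medium": 1, "low": 2}     # high > medium > low

@dataclass(frozen=True)
class Task:
    task_id: int
    urgency: str
    user_level: str
    created: datetime

def order_tasks(tasks):
    """Sort by urgency, then user level, then creation time (assumed: earliest first)."""
    return sorted(tasks, key=lambda t: (URGENCY_RANK[t.urgency],
                                        USER_RANK[t.user_level],
                                        t.created))

def push_tail_to_head(ordered):
    """Push the sequence onto a LIFO stack from its tail to its head."""
    stack = []
    for task in reversed(ordered):
        stack.append(task)
    return stack

t0, t1, t2 = (datetime(2020, 12, 8, 9, 0, s) for s in (0, 10, 20))
tasks = [Task(1, "B", "high", t0), Task(2, "A", "low", t1),
         Task(3, "A", "high", t2), Task(4, "A", "high", t0)]

ordered = order_tasks(tasks)
stack = push_tail_to_head(ordered)
popped = [stack.pop().task_id for _ in range(len(stack))]
print(popped)  # [4, 3, 2, 1]
```

Popping returns the tasks in exactly the sorted order, which is why the sequence is pushed from tail to head.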
The retrieved data is first cleaned to filter out erroneous data (10 consecutive records with the same timestamp and a value of 255 or 65535), then converted (to save HBase storage space and reduce server load, character-type column names are converted to numeric codes when the data is stored, so after the data is read each numeric column name must be converted back to the corresponding character-type name), then sorted (in ascending or descending order of a designated column), after which the CSV file is generated.
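A minimal sketch of the cleaning rule follows. It assumes records are `(timestamp, value)` pairs and reads the rule as "drop any run of at least n consecutive records that share a timestamp and all carry 255 or 65535"; the text does not say whether longer runs or exact-length runs are meant, so the `>= n` comparison and the function name `drop_error_runs` are assumptions.

```python
from itertools import groupby

INVALID_VALUES = {255, 65535}  # error values named in the text

def drop_error_runs(records, n=10):
    """Remove runs of >= n consecutive records with one timestamp and an invalid value."""
    cleaned = []
    # Consecutive records fall into the same group when they share a timestamp
    # and agree on whether their value is one of the invalid values.
    for (_, invalid), run in groupby(
            records, key=lambda r: (r[0], r[1] in INVALID_VALUES)):
        run = list(run)
        if invalid and len(run) >= n:
            continue  # treat the whole run as erroneous data
        cleaned.extend(run)
    return cleaned

records = [(1, 20), (2, 255), (2, 255), (2, 65535), (3, 255), (4, 30)]
print(drop_error_runs(records, n=3))  # [(1, 20), (3, 255), (4, 30)]
```

With `n=3` the three records at timestamp 2 are removed, while the isolated 255 at timestamp 3 is kept because it does not form a long enough run.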
The big data is stored by month, with one table created per month. When retrieving data, the method therefore first works out which months fall within the task's start and end times and then searches the corresponding tables. Querying a whole month of data from a table at once could cause a memory overflow, so each monthly table is queried day by day: every month is divided into days, the data retrieved for a day is written directly to a CSV file, and the memory is released. Within the same task, each subsequent query opens the previously generated CSV file, appends the newly retrieved data to it, and then releases the memory. This continues until the data for every day between the task's start and end times has been retrieved and appended to the CSV file. Each subtask can query several kinds of data (for example, raw data or offset data after computation). After a CSV file has been generated for each kind of data, the CSV files are compressed into a zip package, the compressed file is stored in the MongoDB database, the file ID is stored in the Oracle database, and the task state is updated.
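The month-and-day splitting described above might look like the sketch below. The table naming scheme (`tbox_` followed by year and month) and the function name `daily_subtasks` are hypothetical; the patent only states that there is one table per month and that each month is queried day by day.

```python
from datetime import date, timedelta

def daily_subtasks(start, end, table_prefix="tbox_"):
    """Return one (monthly_table, day) pair for every day in [start, end]."""
    subtasks = []
    day = start
    while day <= end:
        table = f"{table_prefix}{day:%Y%m}"  # hypothetical name of the monthly table
        subtasks.append((table, day))
        day += timedelta(days=1)
    return subtasks

# A range that crosses a month boundary touches two monthly tables.
plan = daily_subtasks(date(2020, 11, 29), date(2020, 12, 2))
for table, day in plan:
    print(table, day)
```

Each pair can then be handed to a worker thread as one subtask, so no single query has to hold more than one day of data in memory.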
The task management system queries the completed tasks in the task table of the Oracle database according to the time rule configured in the configuration file (this is separate from the task execution interval mentioned in step two and is currently set to every ten seconds). The client refreshes the web page periodically according to this time rule, which governs how often the completed tasks in the database are queried.
To address the large query volume and frequent access, the invention presents big data by creating tasks. The web server does not access HBase directly. Instead, the task system packages each query request into a message and puts it into a message channel; Spring Integration operates on the stack data concurrently with multiple threads and distributes the tasks to different data query programs. After the query results are cleaned, converted and sorted, a CSV file is generated by appending, the file is saved to the MongoDB database, the file ID is saved to the Oracle database, and the web server is notified that the file for the task has been generated.
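The append-then-compress part of this flow can be sketched with the Python standard library. The file name, column names and helper functions are hypothetical, and storing the resulting bytes in MongoDB and recording the file ID in Oracle are left out, since the patent does not describe those calls.

```python
import csv
import io
import os
import tempfile
import zipfile

def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)

def zip_files(paths):
    """Compress the given files into one in-memory zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()

workdir = tempfile.mkdtemp()
raw_csv = os.path.join(workdir, "raw.csv")
# Two daily query results for the same task are appended to the same file.
append_rows(raw_csv, ["time", "speed"], [["2020-12-01 00:00:00", 42]])
append_rows(raw_csv, ["time", "speed"], [["2020-12-02 00:00:00", 40]])
archive_bytes = zip_files([raw_csv])
print(len(archive_bytes) > 0)  # True
```

Writing each day's rows straight to disk keeps memory usage bounded by one day of data, which matches the motivation given for appending.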
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (4)

1. A big data query method based on a T-BOX platform, characterized by specifically comprising the following steps:
step one, turning a task request received by a web server of the T-BOX platform into a task, storing the task in an Oracle database, and marking the creation time, the urgency and the priority of each task;
step two, retrieving the uncompleted tasks in the Oracle database at regular intervals, and sending the retrieval result data set to a task distribution system;
step three, the task distribution system sorting the tasks in the result data set according to creation time, urgency and priority to obtain a task sequence;
step four, sending the task sequence to a stack-based message channel, extracting tasks from the message channel with multiple threads, dividing each task into a plurality of subtasks according to the length of the task's query time range, and executing the subtasks in multiple threads to access the HBase database, each subtask obtaining a group of target data;
step five, encoding the obtained target data to generate a CSV file and appending to the CSV file until all target data obtained by all subtasks of one task has been encoded, and compressing the plurality of CSV files into a compressed package;
step six, storing the compressed package in a MongoDB database, storing the ID of the compressed package in the Oracle database, and marking the task as completed;
step seven, sending the compressed package to the client corresponding to the ID, the client periodically querying the completed tasks in the database, thereby completing the query of one task.
2. The big data query method based on the T-BOX platform as claimed in claim 1, wherein in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are also marked as retrieved.
3. The big data query method based on the T-BOX platform as claimed in claim 1, wherein the task sequence is sent to the stack-based message channel in step four as follows: the tasks are put into the stack-based message channel in order from the tail of the sequence to the head.
4. The big data query method based on the T-BOX platform as claimed in claim 1, wherein step four further comprises cleaning and filtering the target data obtained by each subtask, the specific method being: deleting n consecutive records that have the same timestamp and a data value of 255 or 65535.
CN202011424744.8A 2020-12-08 2020-12-08 Big data query method based on T-BOX platform Active CN112527836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011424744.8A CN112527836B (en) 2020-12-08 2020-12-08 Big data query method based on T-BOX platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011424744.8A CN112527836B (en) 2020-12-08 2020-12-08 Big data query method based on T-BOX platform

Publications (2)

Publication Number Publication Date
CN112527836A CN112527836A (en) 2021-03-19
CN112527836B true CN112527836B (en) 2022-12-30

Family

ID=74998304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011424744.8A Active CN112527836B (en) 2020-12-08 2020-12-08 Big data query method based on T-BOX platform

Country Status (1)

Country Link
CN (1) CN112527836B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568986B (en) * 2021-07-06 2024-05-10 东风汽车集团股份有限公司 Remote networking terminal production data matching method and system
CN114301846A (en) * 2021-12-28 2022-04-08 中国电信股份有限公司 Communication method, communication apparatus, storage medium, and processor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855277A (en) * 2012-07-23 2013-01-02 中国联合网络通信集团有限公司 Data center system and data processing method
CN107025074A (en) * 2017-04-25 2017-08-08 航天科技控股集团股份有限公司 A kind of picture storage method based on recorder platform
CN108536842A (en) * 2018-04-13 2018-09-14 航天科技控股集团股份有限公司 A kind of file automatic archiving method and system based on intelligent management platform
CN109062697A (en) * 2018-08-07 2018-12-21 北京超图软件股份有限公司 It is a kind of that the method and apparatus of spatial analysis service are provided
CN109739818A (en) * 2018-12-28 2019-05-10 浪潮软件股份有限公司 A kind of portable high-throughput big data acquisition method and system
CN110046287A (en) * 2019-03-19 2019-07-23 厦门市美亚柏科信息股份有限公司 A kind of the data query method, apparatus and storage medium unrelated with type of database

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160241676A1 (en) * 2015-02-18 2016-08-18 Dashcube LLC Method and apparatus for storing, accessing and displaying past application states
US11403318B2 (en) * 2015-10-01 2022-08-02 Futurewei Technologies, Inc. Apparatus and method for managing storage of a primary database and a replica database
US11416528B2 (en) * 2016-09-26 2022-08-16 Splunk Inc. Query acceleration data store

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MongoDB--Spring Data MongoDB详细的操作手册(增删改查);ccww;《https://zhuanlan.zhihu.com/p/85675213》;20191008;1-4 *
MongoDb-大数据查询优化;喝醉的咕咕鸟;《https://blog.csdn.net/weixin_43549578/article/details/106104099》;20200513;1-2 *
Performance Analysis of RDBMS and Hadoop Components with Their File Formats for the Development of Recommender Systems;Anchal Gupta等;《2018 3rd International Conference for Convergence in Technology (I2CT)》;20181111;1-6 *
中小企业会计档案云存储平台研究;李伟;《中国优秀硕士学位论文全文数据库经济与管理科学辑》;20170215(第2期);J152-3810 *
云计算下基于优化XGBoost的网约车供需预测研究;李泽宇;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(第1期);I140-572 *

Also Published As

Publication number Publication date
CN112527836A (en) 2021-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant