CN112527836B

CN112527836B - Big data query method based on T-BOX platform

Info

Publication number: CN112527836B
Application number: CN202011424744.8A
Authority: CN
Inventors: 姜海峰; 刘明月; 姜军; 陈玉锋
Original assignee: Aerospace Hi Tech Holding Group Co Ltd
Current assignee: Aerospace Hi Tech Holding Group Co Ltd
Priority date: 2020-12-08
Filing date: 2020-12-08
Publication date: 2022-12-30
Anticipated expiration: 2040-12-08
Also published as: CN112527836A

Abstract

A big data query method based on a T-BOX platform, relating to the technical field of new energy vehicle data query. It addresses the problems that terminals in existing T-BOX platforms access the web server frequently and that the volume of data queried is large. Each task request received by the web server is turned into a task and stored in an Oracle database; uncompleted tasks in the Oracle database are retrieved at regular intervals, the retrieved result set is sent to a task distribution system, and the task distribution system sorts the tasks to obtain a task sequence. The task sequence is sent to a stack-based message channel; tasks are taken from the message channel by multiple threads, each task is divided into several subtasks according to the length of its query time range, and the subtasks are executed in multiple threads to access an HBase database and obtain target data. The target data is encoded into CSV files, which are compressed into a compressed package. The compressed package is stored in a MongoDB database, its ID is stored in the Oracle database, and the task is marked as completed. The method is suitable for data query on a T-BOX platform.

Description

Big data query method based on T-BOX platform

Technical Field

The invention relates to the technical field of new energy automobile data query.

Background

A terminal in a T-BOX (Telematics BOX, the vehicle-mounted telematics unit) platform uploads one record every second, so one vehicle uploads 86,400 records per day. If ten thousand vehicles are connected to the platform, nearly 900 million records are stored in HBase (a distributed, column-oriented open source database) every day. Because the amount of data in the HBase database is so large, a web server that queries HBase directly may wait a long time for results, which greatly reduces the performance of the deployed system.
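The daily volume quoted above follows from simple arithmetic. The short sketch below reproduces it; the function name and its parameters are illustrative and do not come from the patent.

```python
# Back-of-the-envelope data volume, using the figures given in the text:
# one upload per second per vehicle, ten thousand vehicles.
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(vehicles: int, uploads_per_second: int = 1) -> int:
    """Number of records written per day by `vehicles` terminals."""
    return vehicles * uploads_per_second * SECONDS_PER_DAY


per_vehicle = records_per_day(1)
fleet = records_per_day(10_000)
print(per_vehicle, fleet)
```

With ten thousand vehicles the result is 864 million records per day, which the text rounds to nearly 900 million.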

Disclosure of Invention

The invention aims to solve the problems that terminals in existing T-BOX platforms access the web server frequently and that the volume of data queried is large. To this end, the invention discloses a big data query method based on a T-BOX platform.

The invention relates to a big data query method based on a T-BOX platform, which specifically comprises the following steps:

step one, each task request received by the web server of the T-BOX platform is turned into a task and stored in an Oracle database, and the creation time, urgency and priority of each task are recorded;

step two, uncompleted tasks in the Oracle database are retrieved at regular intervals, and the retrieved result set is sent to a task distribution system;

step three, the task distribution system sorts the tasks in the result set by creation time, urgency and priority to obtain a task sequence;

step four, the task sequence is sent to a stack-based message channel, tasks are taken from the message channel by multiple threads, each task is divided into several subtasks according to the length of its query time range, the subtasks are executed in multiple threads to access the HBase database, and each subroutine obtains a group of target data;

step five, the acquired target data is encoded to generate a CSV file, data is appended to the CSV file until all target data acquired by all subtasks of one task has been encoded, and the resulting CSV files are compressed into a compressed package;

step six, the compressed package is stored in a MongoDB database, the ID of the compressed package is stored in the Oracle database, and the task is marked as completed;

step seven, the compressed package is sent to the client corresponding to the ID, the client querying the completed tasks in the database at regular intervals, which completes the query of one task.

Further, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are additionally given a retrieval-completed mark.

Further, the task sequence is sent to the stack-based message channel in step four as follows: the tasks are put into the stack-based message channel in order from the tail of the sequence to the head.
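Pushing the sequence from tail to head means the task at the head ends up on top of the stack and is taken first. A minimal sketch of this ordering, using a plain Python list as a stand-in for the stack-based message channel (the task names are hypothetical):

```python
def push_tail_to_head(task_sequence):
    """Push a sorted task sequence onto a stack, last task first."""
    stack = []
    for task in reversed(task_sequence):
        stack.append(task)
    return stack


sequence = ["task-1", "task-2", "task-3"]   # task-1 should run first
stack = push_tail_to_head(sequence)

# Popping from the top of the stack restores the original execution order.
popped = [stack.pop() for _ in range(len(stack))]
print(popped)
```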

Further, step four also includes a step of cleaning and filtering the target data acquired by each subroutine, the specific method of cleaning and filtering being: deleting n consecutive records that have the same time and a data value of 255 or 65535.
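One possible reading of this cleaning rule is sketched below. It assumes each record is a (time, value) pair and that a run of n or more consecutive records sharing the same time, each with a value of 255 or 65535, is dropped as a whole; the record layout and the "n or more" interpretation are assumptions, not details stated in the patent.

```python
from itertools import groupby

INVALID_VALUES = {255, 65535}


def clean(records, n=10):
    """Drop runs of at least n consecutive invalid records with the same time.

    `records` is a list of (time, value) pairs (assumed layout).
    """
    cleaned = []
    # Consecutive records with the same time and the same validity form a run.
    for (_time, invalid), run in groupby(
        records, key=lambda r: (r[0], r[1] in INVALID_VALUES)
    ):
        run = list(run)
        if invalid and len(run) >= n:
            continue  # erroneous run: filter it out
        cleaned.extend(run)
    return cleaned


records = [(1, 20)] + [(2, 255)] * 3 + [(3, 65535)] + [(4, 21)]
result = clean(records, n=3)
print(result)
```

In this example the three records at time 2 are removed, while the single 65535 record at time 3 is kept because the run is shorter than n.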

The error-prone parts of the method, such as multithreading and cache queues, are all implemented with the Spring Integration framework, using the mature and stable modules built into it. Spring Integration implements integration patterns: it is an event-driven messaging framework for passing messages between systems, guiding data through pipelines to the next place it is needed, which ensures the stability of the program. When the CSV file generated from the retrieval result is stored in the MongoDB database, no objects need to be created: the cleaned and converted data can be stored directly in the MongoDB database as a stream, the file ID is generated automatically, and memory is released quickly. When other platforms also need to view the data uploaded by the devices in real time, the current system can be provided to them simply by modifying the port of the ZMQ (ZeroMQ) interface that receives the data and the configuration of the Oracle database.

Drawings

FIG. 1 is a schematic block diagram of the method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The first embodiment is as follows: the present embodiment is described below with reference to fig. 1, and the method for querying big data based on a T-BOX platform in the present embodiment specifically includes:

step one, making a task request received by a web server of a T-BOX platform into a task, storing the task request in an Oracle database, and marking the creation time, the urgency and the priority of each task;

step five, encoding the acquired target data to generate a CSV file, appending data to the CSV file until all target data acquired by all subtasks of one task has been encoded, and compressing the resulting CSV files into a compressed package;

In this embodiment, the MongoDB database is a database based on distributed file storage. The Oracle database is Oracle Database, also called Oracle RDBMS or simply Oracle, a relational database management system from Oracle Corporation. A comma-separated values file (CSV, sometimes also called character-separated values because the separator need not be a comma) stores tabular data (numbers and text) in plain text form.
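The CSV generation and compression described in this embodiment can be sketched as follows: rows are appended to a CSV file across several writes, and the finished files are then compressed into one package. This is a minimal sketch using only the Python standard library; the file name, column names and sample rows are hypothetical, and storing the package in the MongoDB database is not shown.

```python
import csv
import io
import os
import tempfile
import zipfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def zip_files(paths):
    """Compress the given files into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as workdir:
    raw_csv = os.path.join(workdir, "raw.csv")
    # Two subtasks of the same task append to the same file.
    append_rows(raw_csv, ["time", "speed"], [["2020-12-01 00:00:00", 42]])
    append_rows(raw_csv, ["time", "speed"], [["2020-12-02 00:00:00", 40]])
    package = zip_files([raw_csv])

# Read the package back to check its contents.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    names = archive.namelist()
    lines = archive.read("raw.csv").decode("utf-8").splitlines()
print(names, lines)
```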

Further, in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are additionally given a retrieval-completed mark.

In this embodiment, requests from the web server are turned into tasks by the task management system and stored in the Oracle database. Each task carries its urgency, the level of the user who created it, and its creation time. When retrieval is performed in step 3, the urgency, user role level and creation time of a task serve as the priority rules for task retrieval.

According to the invention, a timed job in the task management system queries the uncompleted tasks in the Oracle database at regular intervals, following the task execution time set in the configuration file (the execution time is written in the configuration file so that it can be modified at any time without recompiling the program; it is currently set to run once every 30 seconds). To prevent the timed job from querying the same data (web requests) in the database repeatedly (if the same data were queried twice, HBase would be queried twice, producing duplicate data), the task state is changed temporarily during each 30-second query. For example, a new web request has a flag of 0 in the database; when the query starts, 0 is changed to 1, the query selects only rows whose flag is 1, and after the query completes, rows with flag 1 are updated to 2. When the next 30-second timed job runs, the rows that had flag 0 have already been changed, so they are not queried again. The query result is then submitted to the task distribution system.
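The 0 → 1 → 2 flag transitions can be illustrated with a small polling function. The sketch below uses an in-memory SQLite database purely as a stand-in for the Oracle database; the table name `task`, the column names and the state names are hypothetical.

```python
import sqlite3

NEW, QUERYING, RETRIEVED = 0, 1, 2   # flag values described in the text


def poll_once(conn):
    """One run of the timed job: claim new tasks, read them, mark them retrieved."""
    with conn:  # commit all three statements together
        conn.execute("UPDATE task SET flag = ? WHERE flag = ?", (QUERYING, NEW))
        rows = conn.execute(
            "SELECT id FROM task WHERE flag = ? ORDER BY id", (QUERYING,)
        ).fetchall()
        conn.execute("UPDATE task SET flag = ? WHERE flag = ?", (RETRIEVED, QUERYING))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, flag INTEGER NOT NULL)")
conn.executemany("INSERT INTO task (id, flag) VALUES (?, ?)", [(1, NEW), (2, NEW)])

first = poll_once(conn)    # picks up both new tasks
second = poll_once(conn)   # nothing new, so nothing is queried again
conn.execute("INSERT INTO task (id, flag) VALUES (?, ?)", (3, NEW))
third = poll_once(conn)    # only the newly created task
print(first, second, third)
```

Because a task leaves state 0 as soon as it is picked up, a later run of the timed job cannot submit the same web request to HBase a second time.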

The task distribution system computes the execution order of the tasks in the query result according to the rules entered when each task was created in step 1 (task levels are A, B and C; user levels are high, medium and low; tasks are ordered by A > B > C and high > medium > low, and the smaller the difference from the system time, the higher the level). Tasks later in the execution order are put into the message channel first; because a stack is last in, first out, this ensures that tasks earlier in the execution order are executed first. Because the amount of data in the HBase database is large, a query may not finish within 30 s, and if the sorted requests were submitted directly to the big data retrieval program, requests would pile up. A program acting as a data transfer station is therefore needed to hold the web requests temporarily until they are taken by multiple threads. The advantage of multithreading is that, with a configured thread pool, as many tasks as there are threads can be fetched quickly from the stack; after a thread finishes its HBase query it is released and can continue with other web requests. The stack data is then processed concurrently by multiple threads, and the data (the sorted web requests) is submitted to the retrieval program (step 4) for retrieval in the HBase database.
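The ordering rule and the multithreaded consumption of the stack can be sketched as below. The patent implements this with Spring Integration; the sketch swaps in Python's standard `queue.LifoQueue` and `ThreadPoolExecutor` for illustration only. The `Task` fields, the lexicographic combination of the three rules (task level, then user level, then time difference) and the handler are assumptions.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime

TASK_LEVEL_RANK = {"A": 0, "B": 1, "C": 2}            # A > B > C
USER_LEVEL_RANK = {"high": 0, "medium": 1, "low": 2}  # high > medium > low


@dataclass(frozen=True)
class Task:
    task_id: int
    task_level: str
    user_level: str
    created: datetime


def execution_order(tasks, now):
    """Sort tasks so that the task to execute first comes first."""
    def sort_key(task):
        # Assumed reading: a smaller difference from system time ranks higher.
        diff = abs((now - task.created).total_seconds())
        return (TASK_LEVEL_RANK[task.task_level],
                USER_LEVEL_RANK[task.user_level], diff)
    return sorted(tasks, key=sort_key)


def run_from_stack(ordered_tasks, handler, workers=2):
    """Push tasks tail to head onto a stack and drain it with a thread pool."""
    stack = queue.LifoQueue()
    for task in reversed(ordered_tasks):
        stack.put(task)

    def worker():
        results = []
        while True:
            try:
                task = stack.get_nowait()
            except queue.Empty:
                return results
            results.append(handler(task))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        return [result for future in futures for result in future.result()]


now = datetime(2020, 12, 8, 12, 0, 0)
tasks = [
    Task(1, "B", "high", datetime(2020, 12, 8, 11, 0, 0)),
    Task(2, "A", "low", datetime(2020, 12, 8, 11, 30, 0)),
    Task(3, "A", "high", datetime(2020, 12, 8, 10, 0, 0)),
    Task(4, "A", "high", datetime(2020, 12, 8, 11, 50, 0)),
]
ordered = execution_order(tasks, now)
ordered_ids = [task.task_id for task in ordered]
processed_ids = run_from_stack(ordered, handler=lambda task: task.task_id)
print(ordered_ids, sorted(processed_ids))
```

The stack hands out tasks in execution order, but with several worker threads the order in which handlers finish is not deterministic, so only the set of processed tasks is compared here.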

The retrieved data is first cleaned to filter out erroneous data (10 consecutive records with the same time and a data value of 255 or 65535), then converted (to save HBase storage space and reduce server load, character-type column names are converted to numeric codes when data is stored, so after data is read each numeric column code must be converted back to the corresponding character-type name), then sorted (in ascending or descending order of a designated column), and finally written to a CSV file.
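The conversion and sorting stages can be sketched as follows: numeric column codes are mapped back to character-type names, the rows are sorted by a designated column, and the result is written as CSV text. The code-to-name mapping and the sample rows are hypothetical; the patent does not list the actual columns.

```python
import csv
import io

# Hypothetical mapping from stored numeric column codes to column names.
COLUMN_NAMES = {1: "time", 2: "speed", 3: "soc"}


def convert(row):
    """Replace numeric column codes with their character-type names."""
    return {COLUMN_NAMES[code]: value for code, value in row.items()}


def arrange(rows, column, descending=False):
    """Sort rows by the designated column, ascending or descending."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def to_csv(rows):
    """Write rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(COLUMN_NAMES.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


raw_rows = [
    {1: "2020-12-08 00:00:02", 2: 31, 3: 80},
    {1: "2020-12-08 00:00:01", 2: 30, 3: 81},
]
csv_text = to_csv(arrange([convert(row) for row in raw_rows], "time"))
print(csv_text)
```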

The big data is stored by month, with one table created per month. When data is retrieved, the months covered by the time range are therefore computed from the start time and end time of the task, and retrieval is performed in the corresponding tables. If a whole month of data were queried from a table at once, memory could overflow, so the data in each monthly table is queried by day: each month is divided into days, the data queried for a day is written directly to a CSV file, and the memory is released. Within the same task, each subsequent query opens the CSV file generated previously, appends the newly found data to it, and then releases the memory. This continues until the data for every day between the start and end time of the task has been retrieved and appended to the CSV file. Each subtask can query several kinds of data (for example, raw data or offset data obtained by computation), and one CSV file is generated per kind of data. The CSV files are then compressed into a zip package, the package is stored in the MongoDB database, the file ID is stored in the Oracle database, and the task state is updated.
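Splitting a task's time range into monthly tables and daily queries can be sketched as below. The table naming scheme (`tbox_YYYYMM`) is a hypothetical convention chosen for illustration; the patent only states that one table is created per month and that each month is queried day by day.

```python
from datetime import date, timedelta


def daily_subtasks(start: date, end: date, table_prefix: str = "tbox_"):
    """Return (table_name, day) for every day in the range [start, end]."""
    subtasks = []
    day = start
    while day <= end:
        # Each day is looked up in the table of the month it belongs to.
        subtasks.append((f"{table_prefix}{day:%Y%m}", day))
        day += timedelta(days=1)
    return subtasks


subtasks = daily_subtasks(date(2020, 11, 29), date(2020, 12, 2))
tables = sorted({table for table, _ in subtasks})
print(tables)
for table, day in subtasks:
    print(table, day)
```

A range that crosses a month boundary, as in this example, produces queries against two monthly tables, each limited to a single day so that only one day of data is held in memory at a time.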

The task management system queries the completed tasks in the task table of the Oracle database according to the time rule configured in the configuration file (this differs from the task execution time mentioned in step 2 and is currently set to once every ten seconds). The client refreshes the web page at regular intervals according to this time rule, that is, the rule for querying completed tasks in the database.

The invention provides a way of presenting big data by creating tasks, in order to solve the problems of large query data volume and frequent access. The web server does not access HBase directly. Instead, the task system packs the query request into a message and puts it into a message channel; Spring Integration operates on the stack data concurrently with multiple threads and distributes the tasks to different data query programs; after the query results are cleaned, converted and sorted, a CSV file is generated by appending to the file; the file is saved to the MongoDB database, the file ID is saved to the Oracle database, and the web server is notified that the file for the task has been generated.

Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims

1. A big data query method based on a T-BOX platform is characterized by specifically comprising the following steps:

and step seven, the compressed packet is sent to the client corresponding to the ID, and the client queries the completed tasks in the database at regular time to complete the query of one task.

2. The big data query method based on the T-BOX platform as claimed in claim 1, wherein in step two, after the uncompleted tasks in the Oracle database have been retrieved, the retrieved uncompleted tasks are further given a retrieval-completed mark.

3. The big data query method based on the T-BOX platform as claimed in claim 1, wherein the task sequence is sent to the stack-based message channel in step four as follows: the tasks are put into the stack-based message channel in order from the tail of the sequence to the head.

4. The big data query method based on the T-BOX platform as claimed in claim 1, wherein step four further comprises a step of cleaning and filtering the target data obtained by each subroutine, the specific method of cleaning and filtering being: deleting n consecutive records that have the same time and a data value of 255 or 65535.