Summary of the invention
The object of the present invention is to provide an efficient storage method for mass data: an efficient collection scheduling method for loading mass data into storage, tailored to the characteristics of IT operation and maintenance tools, which improves the efficiency and accuracy of data loading.
The object of the present invention is achieved by the following scheme: an efficient storage method for mass data, comprising the following steps:
1) configuring the configuration rules of the storage method;
2) starting the data storage controller and initializing it;
3) the data storage controller detects whether the storage information cache holds information that needs to be stored and, according to the configuration rules, encapsulates that information into storage tasks and schedules them;
4) the storage terminal manager allocates and schedules the storage tasks according to the configuration rules;
5) the storage terminal manager takes a task out of the storage task cache and hands it to an idle storage terminal for execution;
6) the storage terminal performs the storage operation;
7) the storage terminal manager notifies the data storage controller to scan the pending-information dump file, and the dumped pending information is rescheduled;
8) while the controller is shutting down, the pending information in the storage information cache and in the storage task cache, together with the pending information rejected by the storage terminal manager, is dumped.
More specifically, the efficient storage method for mass data of the present invention comprises the following main steps:
1) configuring the configuration rules of the storage method, which includes setting: the optimal number of information items per batch, the maximum waiting time for reaching the optimal number, the maximum number of tasks that can be held, the number of retries for a failed storage task, the dump policy for pending information, the pending-information dump file, and the error-information dump file;
2) starting the data storage controller: during start-up, the data storage controller first initializes the storage information cache and then scans the pending-information dump file for information that has not yet been stored; if there is any, that information is reloaded into the storage information cache to await scheduling;
3) after the data storage controller has started, it receives information to be stored. The controller detects whether the storage information cache holds information that needs to be stored and judges whether the number of items has reached the preset optimal number of information items per batch. If the optimal number has been reached, one batch of that size is split off, encapsulated into a storage task, and submitted to the storage terminal manager. If the optimal number has not been reached, the controller waits according to the rules: if, within the maximum waiting time, the number of items in the cache reaches the optimal number per batch, a storage task of the optimal size is created; otherwise, when the wait ends, all the data in the cache is encapsulated into a single storage task. In either case the task is submitted to the storage terminal manager;
4) when the storage terminal manager receives a storage task, it judges, according to the configuration rules, whether the number of tasks in the storage task cache has reached the preset maximum number of tasks that can be held. If not, the task is placed in the storage task cache. If the maximum has been reached, then, according to the dump policy for pending information set in the configuration rules, the task is dumped directly to the pending-information dump file, and/or a proportion of the tasks in the storage task cache is selected at random and dumped to the pending-information dump file;
5) the storage terminal manager distributes storage tasks: it first judges whether the storage task cache holds any unassigned task; if so, it judges whether all the storage terminals under the manager are busy. If there is an idle storage terminal, a task is taken out of the storage task cache and handed to that idle terminal for execution; if there is no idle storage terminal, the manager waits until a storage terminal is released and then assigns the storage task;
6) when an idle storage terminal receives a storage task, it performs the storage operation immediately. If an exception occurs during execution because the information to be stored is itself erroneous, the storage terminal filters out the erroneous information, dumps it to the error-information dump file, and continues with the information that has not yet been processed. If the exception is caused by the network, the database management system, disk I/O or the like, the storage terminal retries the storage operation the number of times preset in the configuration rules; if the data still cannot be stored normally, all operations already executed are rolled back and the information in the storage task is dumped to the pending-information dump file;
7) when the storage terminals managed by the storage terminal manager are under low load, the storage terminal manager notifies the data storage controller to scan the pending-information dump file for dumped pending information; if there is any, that information is reloaded into the storage information cache and rescheduled;
8) during shutdown, the data storage controller stops submitting tasks to the storage terminal manager, and the pending information in the storage information cache, together with any tasks rejected by the storage terminal manager because of concurrency, is dumped to the pending-information dump file; the storage terminal manager stops receiving new tasks, stops assigning tasks to the storage terminals, and dumps all unassigned tasks in the task cache to the pending-information dump file; each storage terminal stops accepting tasks but continues to execute its unfinished task and, once that task is complete, enters the inactive state.
Compared with the prior art, this method adds scheduling to the process of storing mass data. It can significantly improve the extensibility of the system, so that the system adapts to different monitored systems and to changes in their scale. The storage rules can be configured flexibly to suit the specific business characteristics of different monitored systems, so that storage management is easily adjusted to the actual conditions of the monitored system. The accuracy and completeness of data loading are improved, as is the reliability of the data loading work, and the dump mechanism avoids the delays that overload would otherwise cause in the system. The utilization efficiency of the system is also improved, so that it can better work concurrently with monitoring systems, workflow systems, diagnostic systems and the like.
An efficient collection scheduling device that employs the efficient storage method for mass data of the present invention comprises: a data storage controller, a storage information cache, a storage terminal manager, a storage task cache, storage terminals, a pending-information dump file, configuration rules, and an error-information dump file.
Data storage controller: responsible for receiving and distributing information to be stored. During start-up, it scans the pending-information dump file and loads the dumped pending information. After start-up, depending on the number of items in the storage information cache and according to the configuration rules, it converts the pending information in the storage information cache into storage tasks in the optimal manner and submits them to the storage terminal manager. During system shutdown, the data storage controller dumps the contents of the storage information cache, which may also include storage tasks whose submission was rejected, to the pending-information dump file.
Storage terminal manager: after the system has started, the storage terminal manager receives the storage tasks submitted by the data storage controller and then checks whether any of the storage terminals under it is idle. If none is idle, it checks, according to the configuration rules, whether the number of tasks in the task cache has reached the maximum number that can be held; when that maximum has been reached, it dumps storage tasks according to the dump policy in the configuration rules. When the storage terminals under this manager are under low load, the manager notifies the data storage controller to scan the pending-information dump file for dumped information that has not yet been stored. During system shutdown, the storage terminal manager stops receiving new tasks and stops assigning tasks to the storage terminals, and saves the unassigned storage tasks in the task cache to the pending-information dump file.
Storage terminal: the storage terminal receives the tasks assigned by the manager and performs the storage operation. If, during execution, it finds erroneous information, it filters that information out, dumps it to the error-information dump file, and then continues with the information not yet processed. If a storage exception occurs, such as an I/O exception or a database service exception, it rolls back all database operations and dumps the storage task it was executing to the pending-information dump file.
Configuration rules: store the configuration information, comprising the optimal number of information items per batch, the maximum waiting time for reaching the optimal number, the maximum number of tasks that can be held, the number of retries for a failed storage task, the dump policy for pending information, the pending-information dump file, and the error-information dump file. They provide the criteria for system operation and the basis on which each component reacts in specific situations.
A collection scheduling device of this kind adds scheduling to the storage management of mass data. The system is highly extensible and customizable and can adapt to different monitored systems and to changes in their scale. The storage rules are flexibly configurable, can be adapted to the specific business characteristics of different monitored systems, and allow storage management to be adjusted easily to the actual conditions of the monitored system. The accuracy and completeness of data loading and the reliability of the data loading work are high, and the dump mechanism avoids the delays that overload would otherwise cause in the system. The device also works well concurrently with monitoring systems, workflow systems, diagnostic systems and the like.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed herein, may be combined in any manner, except for mutually exclusive features and/or steps.
As shown in Figure 1, take as an example the storage, in the IT operation and maintenance field, of massive performance-indicator data collected by different acquisition terminals. The configuration rules are set first: the optimal number of information items per batch is 1000; the maximum waiting time for reaching the optimal number is 3 seconds; the maximum number of tasks that can be held is 500; when the number of tasks in the task cache managed by the storage terminal manager exceeds that maximum, 30% of the tasks in the task cache are selected at random and dumped to the pending-information dump file; the number of retries for a failed storage task is 3; the pending-information dump file is message.dump; and the error-information dump file is error_message.dump.
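The configuration described above can be captured in a small value object. The following is only an illustrative sketch: the class name StorageConfig and all field names are hypothetical and do not come from the specification; only the values (1000, 3 seconds, 500, 30%, 3 retries, message.dump, error_message.dump) are taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical container for the configuration rules of the embodiment."""
    batch_size: int = 1000                 # optimal number of information items per batch
    max_wait_seconds: float = 3.0          # maximum wait for a batch to fill up
    max_pending_tasks: int = 500           # maximum number of tasks the task cache may hold
    overflow_dump_ratio: float = 0.30      # share of cached tasks dumped at random on overflow
    max_retries: int = 3                   # retries for a failed storage task
    pending_dump_file: str = "message.dump"
    error_dump_file: str = "error_message.dump"

    def __post_init__(self):
        # Reject values that would make the scheduling rules meaningless.
        if self.batch_size <= 0 or self.max_pending_tasks <= 0:
            raise ValueError("batch_size and max_pending_tasks must be positive")
        if self.max_wait_seconds < 0 or self.max_retries < 0:
            raise ValueError("max_wait_seconds and max_retries must not be negative")
        if not 0 < self.overflow_dump_ratio <= 1:
            raise ValueError("overflow_dump_ratio must be in (0, 1]")
```

Making the object immutable is a design choice of this sketch, so that every component reads the same rules for the lifetime of the system.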
During system start-up, the data storage controller first scans the message.dump file to check whether it contains dumped pending information; if so, that information is loaded into the controller's storage information cache.
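The start-up scan might look like the sketch below. The specification does not define the layout of message.dump, so this example assumes one JSON document per line; the function name reload_dumped is likewise hypothetical.

```python
import json
import os


def reload_dumped(path: str) -> list:
    """Read every dumped record from `path`, empty the file, and return the records.

    Assumes a JSON-lines dump file; the real file format is not specified.
    """
    if not os.path.exists(path):
        return []  # nothing was dumped before the last shutdown
    with open(path, "r+", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
        # Truncate so the same records are not reloaded a second time.
        f.seek(0)
        f.truncate()
    return records
```

The returned records would then be placed in the controller's information cache for scheduling.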
After the system has started, the data storage controller checks whether the cache holds pending information. If not, it continues to wait. If so, it judges, according to the configuration rules, whether the number of cached items has reached 1000. If it has, the controller extracts 1000 items, encapsulates them into one task, and submits the task to the storage terminal manager. If 1000 items have not been reached, the controller waits for up to 3 seconds. If the number of items in the cache reaches 1000 before the 3 seconds have elapsed, the controller stops waiting, immediately extracts those 1000 items, encapsulates them into one storage task, and submits it to the storage terminal manager. Conversely, if 1000 items have still not been reached when the 3-second wait ends, the controller extracts all the pending information in the cache, encapsulates it into one task, and submits it to the storage terminal manager.
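The batching rule above (a full batch of 1000 items, or whatever has arrived once 3 seconds have passed) can be sketched with a standard thread-safe queue standing in for the information cache. The function name next_batch and its parameters are hypothetical; the sketch starts the 3-second window when the first pending item is seen.

```python
import queue
import time


def next_batch(cache: queue.Queue, batch_size: int = 1000, max_wait: float = 3.0) -> list:
    """Return the next batch: `batch_size` items, or fewer if `max_wait` runs out."""
    batch = [cache.get()]                      # block while the cache is empty
    deadline = time.monotonic() + max_wait     # the waiting window starts now
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                              # window elapsed: ship what we have
        try:
            batch.append(cache.get(timeout=remaining))
        except queue.Empty:
            break                              # nothing more arrived in time
    return batch
```

When the cache already holds 1000 items, the loop drains them without waiting, which corresponds to the immediate-submit branch of the rule.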
When the storage terminal manager receives a task, it first judges whether the storage terminals under it are running at full load. If they are not, the task is placed directly in the task cache to await assignment. If all the terminals are busy, the manager judges, according to the configuration rules, whether the number of tasks in the task cache has reached 500. If it has, 30% of the 500 tasks are selected at random and dumped to the pending-information dump file, and the newly submitted task is then placed in the task cache to await assignment.
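The overflow rule can be sketched as follows. The names submit_task, task_cache and dump_path are hypothetical, tasks are modelled as JSON-serializable lists, and the dump file is assumed to hold one JSON document per line; none of this is prescribed by the specification.

```python
import json
import random
from collections import deque
from typing import Optional


def submit_task(task_cache: deque, task: list, dump_path: str,
                max_tasks: int = 500, dump_ratio: float = 0.30,
                rng: Optional[random.Random] = None) -> int:
    """Add `task` to the cache; on overflow, first dump a random share of cached tasks.

    Returns the number of tasks that were dumped to `dump_path`.
    """
    rng = rng or random.Random()
    dumped = 0
    if len(task_cache) >= max_tasks:
        count = round(len(task_cache) * dump_ratio)
        victims = set(rng.sample(range(len(task_cache)), count))
        # Write the selected tasks out before touching the cache.
        with open(dump_path, "a", encoding="utf-8") as f:
            for i in sorted(victims):
                f.write(json.dumps(task_cache[i]) + "\n")
        kept = [t for i, t in enumerate(task_cache) if i not in victims]
        task_cache.clear()
        task_cache.extend(kept)                # surviving tasks keep their order
        dumped = count
    task_cache.append(task)                    # the new task always enters the cache
    return dumped
```

With the embodiment's values, a full cache of 500 tasks loses 150 at random and then accepts the new task, leaving 351 queued.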
When a storage terminal under the storage terminal manager is idle, the manager assigns the earliest-submitted task in the task cache to that idle terminal for execution; if no terminal is idle, it waits for resources to be released.
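One simple way to obtain this first-in, first-out assignment is to let each terminal run as a worker thread that blocks on a shared queue, so that an idle terminal always receives the earliest-submitted task. This is a sketch of that idea, not the manager described in the specification: the function start_terminals, the handler callback and the None shutdown sentinel are all assumptions of the example.

```python
import queue
import threading


def start_terminals(task_cache: queue.Queue, handler, count: int = 2) -> list:
    """Start `count` terminal threads that execute tasks from `task_cache` in FIFO order."""

    def terminal():
        while True:
            task = task_cache.get()        # an idle terminal waits here for work
            try:
                if task is None:           # hypothetical sentinel: stop this terminal
                    return
                handler(task)              # perform the storage operation
            finally:
                task_cache.task_done()

    threads = [threading.Thread(target=terminal, daemon=True) for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

In this arrangement the waiting for a released terminal is implicit: tasks simply stay in the queue until some terminal finishes its current task and calls get() again.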
When a storage terminal receives a storage task, it executes the task immediately. If, while executing the task, it finds erroneous information, for example a data type mismatch or an exception raised by an invalid SQL statement, the terminal filters out the erroneous information, dumps it to the file error_message.dump, and continues to store the remaining information that has not yet been stored.
If, during execution, the storage terminal encounters an exception such as an I/O exception, a database service that has not been started, or a network exception, it retries the task 3 times according to the configuration rules. If the same exception still occurs, it rolls back all operations and dumps all the pending information of the task to the message.dump file.
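The two failure paths described above (filter bad records, versus retry and then roll back) can be sketched with Python's built-in sqlite3 module standing in for the target database. Everything named here is an assumption of the example: the metrics table, the function execute_task, the JSON-lines dump format, and the use of sqlite3.OperationalError as a stand-in for I/O, service or network failures. The sketch retries the failing statement rather than the whole task, and writes filtered records to the error file only after a successful commit, so that a rolled-back task is dumped whole and re-examined when it is rescheduled.

```python
import json
import sqlite3

# Hypothetical target table; the specification does not define a schema.
INSERT_SQL = "INSERT INTO metrics(name, value) VALUES (?, ?)"

# Errors attributed to the record itself (wrong type, constraint violation, ...).
DATA_ERRORS = (sqlite3.IntegrityError, sqlite3.DataError,
               sqlite3.ProgrammingError, sqlite3.InterfaceError)


def _append_lines(path, records):
    """Append records to a dump file, one JSON document per line (assumed format)."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def execute_task(conn, records, error_path, pending_path, max_retries=3):
    """Store one task and return the number of records committed."""
    stored, bad = 0, []
    try:
        for rec in records:
            for attempt in range(max_retries + 1):   # first try plus the retries
                try:
                    conn.execute(INSERT_SQL, rec)
                    stored += 1
                    break
                except DATA_ERRORS:
                    bad.append(rec)                  # filter the faulty record, carry on
                    break
                except sqlite3.OperationalError:
                    if attempt == max_retries:
                        raise                        # retries exhausted: give up on the task
        conn.commit()
    except sqlite3.OperationalError:
        conn.rollback()                              # undo everything done for this task
        _append_lines(pending_path, records)         # the whole task awaits rescheduling
        return 0
    _append_lines(error_path, bad)
    return stored
```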
When the storage terminals under the storage terminal manager are running under low load, the manager notifies the controller to scan the message.dump file and check whether any pending information has been dumped there. If so, that information is reloaded into the controller's cache, re-encapsulated into tasks, and the storage tasks are executed again.
During system shutdown, the controller stops receiving new pending information and stops submitting tasks to the storage terminal manager. It saves to the file message.dump all the pending information in the storage information cache, together with any storage tasks rejected by the storage terminal manager. The storage terminal manager stops receiving new tasks, stops assigning cached tasks to the storage terminals, and dumps all tasks in the task cache to the message.dump file. The storage terminals continue with the tasks they have not yet finished; if an exception occurs, the data is dumped to file according to the original logic.
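The dump performed at shutdown might be sketched as below. It assumes that both caches are standard thread-safe queues, that each task is a list of records, and that message.dump holds one JSON document per line; the function name dump_on_shutdown is hypothetical. Tasks already running on a terminal are deliberately left alone, as the text requires.

```python
import json
import queue


def _drain(q: queue.Queue):
    """Yield every item currently in the queue without blocking."""
    while True:
        try:
            yield q.get_nowait()
        except queue.Empty:
            return


def dump_on_shutdown(info_cache: queue.Queue, task_cache: queue.Queue,
                     dump_path: str) -> int:
    """Write all unscheduled records to `dump_path`; return how many were written."""
    count = 0
    with open(dump_path, "a", encoding="utf-8") as f:
        for record in _drain(info_cache):      # information not yet packed into a task
            f.write(json.dumps(record) + "\n")
            count += 1
        for task in _drain(task_cache):        # tasks not yet assigned to a terminal
            for record in task:
                f.write(json.dumps(record) + "\n")
                count += 1
    return count
```

Writing individual records rather than whole tasks is a choice of this sketch; it lets the controller re-batch the reloaded information at the next start-up.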