WO2021051531A1 - Method and apparatus for processing multi-cluster job record, and device and storage medium

Method and apparatus for processing multi-cluster job record, and device and storage medium

Info

Publication number: WO2021051531A1
Application number: PCT/CN2019/117086
Authority: WO (WIPO (PCT))
Prior art keywords: data, processed, preset, kafka, target data
Other languages: French (fr), Chinese (zh)
Inventor: 林琪琛
Original assignee / applicant: 平安科技(深圳)有限公司
Priority date: September 19, 2019 (per the priority claim in the description below)

Classifications

    • G06F 11/004 — Error detection; error correction; monitoring: error avoidance
    • G06F 11/3006 — Monitoring arrangements specially adapted to a computing system that is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3065 — Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 16/24552 — Query execution: database cache management
    • G06F 16/284 — Relational databases
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing, and in particular to methods, devices, equipment and storage media for processing multi-cluster job records.
  • In current cluster job management, the task job data generated by multiple clusters is obtained and input into a unified management website; the system of the unified management website detects the task type of the task job data, classifies the task job data by task type to obtain classified data, and stores the classified data into multiple repositories according to task type.
  • The inventor realized that because multiple clusters send task job data directly to a unified management website, excessive portal requests and channel congestion easily arise during parallel processing, which can lead to a concurrency crash of the multi-cluster job management system.
  • To address this, the present application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can solve the problem of concurrency crashes of the multi-cluster job management system.
  • In a first aspect, this application provides a method for processing multi-cluster job records, including: obtaining job record data generated by tasks running on multiple clusters, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to the created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain to-be-processed data, where the preset trigger points include the start, pause, and end running states of the tasks running on the multiple clusters; calling the distributed messaging system Kafka in the message queue service system, and, when Kafka receives a topic creation command, calling the topic creation script and creating a topic through the topic creation script; creating a producer through Kafka according to the cluster corresponding to the to-be-processed data, and creating a consumer through Kafka according to the unified management website system; inputting the to-be-processed data into Kafka, where Kafka classifies the to-be-processed data according to the topic and the producer to obtain target data; dividing the target data into blocks according to the producer and the topic to obtain multiple blocks, linking the blocks according to the created zoning protocol, and using the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one; constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into the repositories through the blockchain system using the HTTP request method, and triggering a read instruction, where Kafka includes multiple repositories; when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database; and converting the target data in the cache area into hypertext markup language (HTML) data and writing the HTML data into the constructed static HTML page file.
  • In a second aspect, the present application provides an apparatus for processing multi-cluster job records, including: a transceiver module for receiving job record data generated by tasks running on multiple clusters; a detection module for detecting the running state of the tasks, sending a trigger instruction to the created trigger when the running state is detected to be a preset trigger point, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain to-be-processed data, where the preset trigger points include the start, pause, and end running states of the tasks running on the multiple clusters; a calling module for calling the distributed messaging system Kafka in the message queue service system, calling the topic creation script when Kafka receives a topic creation command and creating a topic through the topic creation script, creating a producer through Kafka according to the cluster corresponding to the to-be-processed data, and creating a consumer through Kafka according to the unified management website system; a classification module for inputting the to-be-processed data obtained by the detection module into the Kafka called by the calling module, where Kafka classifies the to-be-processed data according to the topic and the producer to obtain target data; a division module for dividing the target data obtained by the classification module into blocks according to the producer and the topic created via the calling module, linking the blocks according to the created zoning protocol, and using the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one, and links the created blockchain system into Kafka so that Kafka can be used in the blockchain system; a construction module for constructing a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, inputting the target data into the repositories through the blockchain system using the HTTP request method, and triggering a read instruction, where Kafka includes multiple repositories; and a receiving module for, when the unified management website system receives the read instruction triggered by the construction module, outputting the target data that the construction module input into the repositories through the data storage layer, inputting the target data into the cache area of the MySQL database, converting the target data in the cache area into hypertext markup language data, controlling the cache area through an output control function to obtain the HTML data, and inputting the HTML data into the constructed static HTML page file through the created read-write function.
  • In a third aspect, the present application provides a computer device, which includes at least one processor, a memory, a display, and an input/output unit connected to one another, where the memory is used to store program code, and the processor is used to call the program code in the memory to execute the method described in the first aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium in which computer instructions are stored; when the computer instructions run on a computer, the computer executes the method described in the first aspect.
  • In the above solution, the job record data generated by tasks running on multiple clusters is processed to obtain the to-be-processed data; topics, producers, and consumers are created through the distributed messaging system Kafka in the message queue service system; Kafka classifies the to-be-processed data to obtain target data, and a blockchain system is constructed according to the producer, the topic, and the target data; the target data is input into the repositories through the blockchain system; the target data in the repositories is input into the cache area of the MySQL database through the unified management website system; and the target data in the cache area is converted into hypertext markup language data, which is input into a static HTML page file.
  • On the one hand, the Kafka system in the message queue server serves as a message queue and is combined with the blockchain system for distributed data storage, with data caching processed concurrently by multiple nodes; this decouples the system, reduces the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when the job record data of multiple big data clusters arrives simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput. On the other hand, statically rendering the target data that is input into the cache area of the MySQL database as HTML increases access and operation speed and reduces the load on the server. In summary, this application can handle the concurrency-crash problem of the system at low cost, with high efficiency and high accuracy, and from multiple directions; therefore, the present application can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • FIG. 1 is a schematic flowchart of a method for processing multi-cluster job records in an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of an apparatus for processing multi-cluster job records in an embodiment of this application;
  • FIG. 3 is a schematic structural diagram of a computer device in an embodiment of this application.
  • This application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can be used in an enterprise multi-cluster job management platform to manage and query the job operation records generated by multiple big data clusters.
  • To solve the above technical problem, this application mainly provides the following technical solution.
  • The system architecture involved in the method includes a big data cluster layer, a message queue server, and a unified management website system.
  • The method is executed by a computer device, which can be a server or a terminal; the terminal is a terminal on which the apparatus 20 shown in FIG. 2 is installed.
  • This application does not limit the type of the execution subject. The method includes the following steps.
  • Obtain the job record data generated by tasks running on multiple clusters, and detect the running state of the tasks.
  • When the running state is detected to be a preset trigger point, a trigger instruction is sent to the created trigger, and the trigger receives the trigger instruction.
  • The trigger converts the data format of the job record data into JSON format to obtain the to-be-processed data.
  • The preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters.
  • When task startup is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, and task initiator; when task suspension is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, task initiator, suspension time, and suspension data; when task completion is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, task initiator, end time, and running result.
  • The job record data generated by tasks running on the multiple clusters is stored in the MySQL databases connected to those clusters; after the job record data is read from these MySQL databases, its data format is converted into JSON format to facilitate the processing of structured data.
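  • As an illustration only (the table and column names below are assumptions, not taken from this application), a job record row read from MySQL can be serialized to JSON with Jackson:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;

public class JobRecordToJson {
    // Reads one job record from a (hypothetical) job_record table and serializes it to JSON.
    public static String readAsJson(Connection conn, long jobId) throws Exception {
        String sql = "SELECT account, job_content, submit_time, start_time FROM job_record WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                Map<String, Object> record = new HashMap<>();
                record.put("account", rs.getString("account"));
                record.put("jobContent", rs.getString("job_content"));
                record.put("submitTime", rs.getTimestamp("submit_time").toInstant().toString());
                record.put("startTime", rs.getTimestamp("start_time").toInstant().toString());
                // The JSON string is the "to-be-processed data" handed to the next stage.
                return new ObjectMapper().writeValueAsString(record);
            }
        }
    }
}
```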
  • Optionally, before converting the data format of the job record data into JSON format, the method of the present application further includes: compressing the job record data; performing state detection on the compressed job record data to obtain state information, and analyzing the state information through the cache coherency protocol to obtain first data and second data, where the state information includes the modified, exclusive, shared, and invalid states (the MESI states), and the first data and the second data are job operation data with different strengths of cache-consistency requirements; calling the Cache local cache interface to generate a CacheBuilder object for the first data, assembling an automatic loading function for the first data, and obtaining the first key-value pair data of the first data; automatically loading the first key-value pair data into the physical memory cache through the CacheBuilder object and the automatic loading function; and creating a CacheLoader subclass object that automatically reloads the first key-value pair data into the physical memory cache when a get operation is detected to fail.
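  • A minimal sketch of the CacheBuilder/CacheLoader auto-loading described above, using the Guava Cache API (the size bound, expiry policy, and loader body are assumptions for illustration):

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class FirstDataCache {
    private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                        // bound the in-process cache (assumed)
            .expireAfterWrite(10, TimeUnit.MINUTES)     // assumed expiry policy
            .build(new CacheLoader<String, String>() {  // the "automatic loading function"
                @Override
                public String load(String key) {
                    // Reloads the key-value pair on a cache miss or after a failed get.
                    return loadFromSource(key);
                }
            });

    public String get(String key) {
        return cache.getUnchecked(key); // loads the first key-value pair data into memory automatically
    }

    private String loadFromSource(String key) {
        // hypothetical lookup of the first key-value pair data
        return "value-for-" + key;
    }
}
```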
  • A cache architecture component is then built from the high-speed cache system Memcached and the data structure server Redis, where the cache architecture component includes cache servers; the first hash value of each node of the cache architecture component is obtained, the second key-value pair data of the second data is obtained, and the second hash value of the second key-value pair data is obtained; according to the first and second hash values, the second key-value pair data is routed to and stored on the corresponding cache server.
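  • The node-hash/key-hash pairing described here reads like consistent hashing; the following is a sketch under that assumption (the hash function and node handling are illustrative, not prescribed by this application):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class CacheRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node); // first hash value: the cache-server node's position on the ring
    }

    // Routes the second key-value pair data to a cache server by its hash value.
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache servers registered");
        SortedMap<Long, String> tail = ring.tailMap(hash(key)); // second hash value
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```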
  • Call the distributed messaging system Kafka in the message queue server.
  • When Kafka receives a topic creation command, it calls the topic creation script and creates the topic through the topic creation script.
  • For example, the content of the topic creation command includes: the topic is running_result, there are M partitions, and each partition is to be allocated N replicas; the topic creation script called includes a command-line part and a background (controller) logic part.
  • The background (controller) logic part monitors the corresponding directory node under the distributed application coordination service ZooKeeper.
  • The command-line part creates a new data node when it receives the topic creation command, which triggers the background (controller) logic part, and the topic is thereby created.
  • Creating topics facilitates the aggregation of the input to-be-processed data.
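  • Programmatically, the same running_result topic with M partitions and N replicas can be created through Kafka's AdminClient; a sketch with illustrative values for M, N, and the broker address:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // M = 3 partitions, N = 2 replicas per partition (illustrative values)
            NewTopic topic = new NewTopic("running_result", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get(); // blocks until created
        }
    }
}
```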
  • A cluster is a provider of the to-be-processed data, that is, a producer; the unified management website system is the consumer of the to-be-processed data, that is, the consumer.
  • The consumer end (the unified management website system) runs automatically and monitors topic updates in Kafka. Through Kafka's producer-consumer model, the job record data generated by the clusters is processed in parallel and the system load is balanced.
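  • A minimal producer/consumer pair under this model: each cluster publishes its to-be-processed data, and the unified management website system subscribes to the topic (the broker address and group id are assumptions):

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Collections;
import java.util.Properties;

public class ClusterProducerAndSiteConsumer {
    public static KafkaProducer<String, String> newClusterProducer() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(p); // one producer per cluster
    }

    public static KafkaConsumer<String, String> newWebsiteConsumer() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("group.id", "unified-management-website"); // the consumer end runs automatically
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> c = new KafkaConsumer<>(p);
        c.subscribe(Collections.singletonList("running_result")); // monitor topic updates
        return c;
    }

    public static void publish(KafkaProducer<String, String> producer, String clusterId, String json) {
        // Using the cluster id as the record key keeps one cluster's records in order.
        producer.send(new ProducerRecord<>("running_result", clusterId, json));
    }
}
```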
  • The to-be-processed data is first classified by producer, so that each producer's to-be-processed data is grouped together; the data classified by producer is then reclassified by topic to obtain the target data.
  • When summarizing the to-be-processed data into topics, the data can be classified as follows: events that must keep a fixed order are placed in the same topic with the same partition key; for different entities where one entity depends on another entity's events, the events are placed in the same topic; events whose throughput is higher than a first preset throughput threshold are placed in different topics, and events whose throughput is lower than a second preset throughput threshold are placed in the same topic.
  • The above-mentioned tasks include events.
  • Optionally, classifying the to-be-processed data by Kafka according to topics and producers to obtain the target data includes: obtaining the order correlation degree of the events, obtaining the throughput of the events, identifying the entity types of the events, and obtaining the correlation degree between entity types, where an entity type is used for one address corresponding to one user; classifying the to-be-processed data into topics according to the order correlation degree, throughput, and correlation degree following the preset classification strategy to obtain first classification data, where the preset classification strategy places into the same topic the to-be-processed data that satisfies at least one of: order correlation degree greater than a first preset threshold, throughput less than a second preset threshold, and correlation degree greater than a third preset threshold; marking the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the to-be-processed data; and classifying the marked first classification data by producer type and marking the producer type of the marked first classification data to obtain the target data. An illustrative rule is sketched below.
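  • A toy rendering of the preset classification strategy (the thresholds and the Event shape are assumptions for illustration; the application does not fix concrete values):

```java
public class TopicClassifier {
    static final double ORDER_CORRELATION_MIN = 0.8;  // first preset threshold (assumed)
    static final double THROUGHPUT_MAX = 1_000.0;     // second preset threshold (assumed)
    static final double ENTITY_CORRELATION_MIN = 0.8; // third preset threshold (assumed)

    static class Event {
        double orderCorrelation, throughput, entityCorrelation;
        String entityType;
    }

    // Events meeting at least one condition of the strategy share a topic.
    static String classify(Event e) {
        if (e.orderCorrelation > ORDER_CORRELATION_MIN
                || e.throughput < THROUGHPUT_MAX
                || e.entityCorrelation > ENTITY_CORRELATION_MIN) {
            return "shared_topic";
        }
        // High-throughput, weakly-related events get their own per-entity topic.
        return "topic_" + e.entityType;
    }
}
```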
  • The above-mentioned tasks include events.
  • Optionally, after the to-be-processed data is classified, the method of the present application further includes: initializing the classified to-be-processed data and setting the length of a linear hash table according to the classification types of the classified data; obtaining the key values of the classified data, calculating the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtaining the target key values corresponding to the data items whose TF-IDF value is greater than a fourth preset threshold, where the to-be-processed data includes the data items; and using the remainder obtained by dividing a target key value by a number not greater than the length of the linear hash table as the address in the linear hash table, using the target key value as the head of the linear hash table, and using the address as the number of the linear hash table.
  • Because the access speed of a hash table is not affected by the total number of stored elements, it suits databases with large data volumes and improves the query speed of the job record data, so that the query speed is not degraded while the concurrency-crash problem of the multi-cluster job management system is being solved.
  • In this way, the performance and scalability of the system can be improved at low cost.
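  • A compact sketch of the TF-IDF screening and the modulo addressing described above (the TF-IDF formulation used is the standard one, assumed here; the threshold and table length are illustrative):

```java
public class HashIndex {
    // Standard TF-IDF: term frequency times inverse document frequency (assumed formulation).
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / (1 + docsWithTerm));
        return tf * idf;
    }

    // Address in the linear hash table: key value modulo a number no greater than the table length.
    static int address(int targetKeyValue, int tableLength) {
        return targetKeyValue % tableLength;
    }

    public static void main(String[] args) {
        double score = tfIdf(5, 200, 10_000, 40);
        if (score > 0.05) { // fourth preset threshold (assumed)
            System.out.println("bucket " + address(12345, 97)); // table length 97 (assumed)
        }
    }
}
```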
  • Optionally, the method of the present application involves a transmission channel, and inputting the to-be-processed data into Kafka includes: compressing the to-be-processed data; judging whether the transmission status of the transmission channel is normal; if so, inputting the compressed data into Kafka and marking it as sent; if not, inputting the compressed data into a first MySQL database and marking it as unsent; calling a created polling script, which polls the first MySQL database at a preset interval; when polling finds that the first MySQL database holds unsent to-be-processed data and that the transmission status of the transmission channel is normal, sending the data marked as unsent into Kafka; polling to detect whether the to-be-processed data marked as unsent has been delivered; and if so, replacing the unsent mark with a sent mark. A sketch of this flow follows.
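  • The stage-and-retry flow might look like this (the table name, column names, and the decision to detect channel failure via a failed send are assumptions):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class SendOrStage {
    // Tries Kafka first; on failure stages the record in the first MySQL database, marked unsent.
    static void sendOrStage(KafkaProducer<String, String> producer, Connection firstDb, String json) {
        try {
            producer.send(new ProducerRecord<>("running_result", json)).get(); // block to confirm delivery
            // Delivery confirmed: the record counts as sent.
        } catch (Exception channelDown) {
            try (PreparedStatement ps = firstDb.prepareStatement(
                    "INSERT INTO pending_data (payload, sent) VALUES (?, 0)")) { // 0 = unsent (assumed schema)
                ps.setString(1, json);
                ps.executeUpdate();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
    // A polling script would periodically re-read rows WHERE sent = 0, resend them to Kafka,
    // and flip the mark to sent once delivery is confirmed and the channel is normal again.
}
```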
  • Optionally, classifying the to-be-processed data by Kafka according to the topic and the producer to obtain the target data includes: obtaining characteristic information about the running state of the task corresponding to the to-be-processed data; sorting and classifying the to-be-processed data according to the characteristic information to obtain classification data and marking the classification type of the classification data, where the classification types include task start data, task running data, and task end data; establishing, by classification type, the correspondence between the classification data and the topics; and marking the correspondence of the classification data to obtain the target data.
  • The zoning protocol is used to link each block in order from back to front through a chain, with each block pointing to the previous one, and to link the created blockchain system into Kafka so that Kafka can be applied in the blockchain system.
  • Kafka includes repositories, and there are multiple repositories.
  • The blockchain system includes an application layer, and the application layer includes the unified management website system.
  • HTTP request methods specify how a resource is operated on, and include the GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH request methods. Here the PUT request method is used, which facilitates transmitting the latest version of the target data to the repositories in the message queue server.
  • Multiple repositories are set up in Kafka to store the target data by category; data is stored in the corresponding repository according to the producer and topic in the target data, which facilitates the management and acquisition of the target data. A sketch of such a PUT request follows.
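  • Using the PUT request method to push the latest target data to a repository could be sketched with Java's built-in HTTP client (the endpoint URL is an assumption):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PutTargetData {
    static int put(String targetDataJson) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mq-server/repository/running_result")) // assumed repository endpoint
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(targetDataJson)) // PUT replaces with the latest data
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```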
  • The Kafka system in the message queue server is used as a message queue and combined with the blockchain system for distributed data storage, with the data cache processed concurrently by multiple nodes; this decouples the system, relieves the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when the job record data of multiple big data clusters is collected simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput.
  • When the unified management website system receives the read instruction, it outputs the target data in the repositories through the data storage layer and inputs the target data into the cache area of the MySQL database.
  • The Kafka system is monitored through the unified management website system so that the target data is captured and stored in a timely manner; the captured target data is input into the cache area of the MySQL database, which facilitates the subsequent reading of the target data and relieves the storage pressure on the MySQL database.
  • Optionally, a preset data-consumption frequency is set, and the target data is input into the cache area of the MySQL database at that frequency, so that the input of the target data is buffered and the storage pressure on the MySQL database is reduced. A throttled consumer loop is sketched below.
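  • A consumer loop throttled to a preset consumption frequency might look like this (the one-second period and the JDBC target table are assumptions):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.time.Duration;

public class ThrottledConsumer {
    static void run(KafkaConsumer<String, String> consumer, Connection mysql) throws Exception {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            try (PreparedStatement ps = mysql.prepareStatement(
                    "INSERT INTO cache_area (payload) VALUES (?)")) { // assumed cache-area table
                for (ConsumerRecord<String, String> r : records) {
                    ps.setString(1, r.value());
                    ps.addBatch();
                }
                ps.executeBatch(); // one batched write per cycle buffers the MySQL input
            }
            Thread.sleep(1_000); // preset data-consumption frequency (assumed: once per second)
        }
    }
}
```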
  • Optionally, when the unified management website system receives a read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database includes: the unified management website system calls a listener script, which detects whether the application layer in the blockchain system has received a read instruction; when the detection result is no, the application layer in the blockchain system is re-checked; when the detection result is yes, the consumer captures target data from the repositories according to a preset capture quantity, and a consumed label is added to the captured target data to obtain marked target data; the marked target data is converted into a JSON object, and the JSON object is parsed into a first data object; whether the second data objects of the MySQL database contain a data object with the same content as the first data object is identified; if so, the data object with the same content is deleted from the first data object to obtain a first target data object; the topic and producer information marked in the label of the first target data object is obtained; and the first target data object is filled into the MySQL database cache area according to the topic and producer information.
  • Kafka is monitored for updated target data, which reduces the risk of repeatedly capturing and storing the same data; the object conversion of the target data allows the target data to be stored in the MySQL database; and, according to the topic and producer information, the sub-target data are filled into the multiple cache areas set up in the MySQL database, which facilitates the classified management and acquisition of the data.
  • the multi-cluster job management system can improve the management efficiency of the job record data.
  • Optionally, the method of the present application further includes: sending a startup instruction to a hidden system that has been set up, the hidden system receiving the startup instruction and starting a hidden protocol, where the hidden system includes the hidden protocol, and the hidden protocol covers faults, destruction and deletion, and human ethics; when the hidden system detects that input information violates the hidden protocol, the data in the MySQL database is copied and backed up, and the hidden system enters an authentication state, where such information includes fault instructions, destruction and deletion instructions, and files carrying Trojan horse programs; when the hidden system in the authentication state detects that an input access request has management authority, it outputs a password input request; when it detects that the entered password information is correct and the number of attempts has not reached the limit, it accepts the access request; and when it detects that the number of attempts has reached the limit, it rejects the access request and permanently archives the copied and backed-up data.
  • Optionally, before converting the target data in the cache area into hypertext markup language data, the method of the present application further includes: detecting whether a database transaction in the MySQL database is executing; if so, obtaining the initial data of the target data in the cache area, locking the MySQL database through a Locktable statement, and appending the updated data of target data subsequently input into the cache area of the MySQL database to the initial data, where the Locktable statement is a Locktable statement with the WRITE keyword; and obtaining the data with preset fields in the target data of the cache area, and obtaining the field size of that data.
  • In this way, the MySQL database is optimized to improve database performance while maintaining the integrity of the target data and ensuring its relevance, which frees the system's storage database, relieves its storage pressure, and provides space and speed headroom for the system's concurrent processing, so as to effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
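  • The Locktable statement with the WRITE keyword corresponds to MySQL's LOCK TABLES ... WRITE; a JDBC sketch (the table name is an assumption):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class LockAndAppend {
    static void appendUnderLock(Connection conn, String updatedJson) throws Exception {
        try (Statement lock = conn.createStatement()) {
            // A WRITE lock blocks other sessions from reading or writing the table until unlocked.
            lock.execute("LOCK TABLES cache_area WRITE");
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO cache_area (payload) VALUES (?)")) { // append updated data to initial data
                ps.setString(1, updatedJson);
                ps.executeUpdate();
            } finally {
                lock.execute("UNLOCK TABLES");
            }
        }
    }
}
```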
  • Optionally, the method of the present application further includes: when the unified management website system recognizes that a login request entered by a user is correct, accepting the login request; when the server in the unified management website receives a query request entered by the user, obtaining the characteristic information of the query request; converting the characteristic information into a search statement and filtering the data in the MySQL database with the search statement to obtain the data corresponding to the query request; and performing statistics and analysis on that data and generating and outputting visual charts. Outputting visual charts tailored to user needs makes the job record data easy to read and improves the usability of the multi-cluster job management system.
  • In summary, the embodiments of this application decouple the system, relieve the pressure of collecting job record data from multiple big data clusters at the same time, and avoid congestion when the job record data of multiple big data clusters arrives simultaneously; this application can achieve low-cost, high-efficiency, and high-accuracy handling of the problem, and therefore can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • the foregoing describes a method for processing multi-cluster job records in the present application, and the following describes a device that executes the foregoing method for processing multi-cluster job records.
  • FIG. 2 shows a schematic structural diagram of an apparatus 20 for processing multi-cluster job records, which can be applied to an enterprise multi-cluster job management platform to manage and query the job operation records generated by multiple big data clusters.
  • The apparatus 20 in the embodiment of the present application can implement the steps of the method for processing multi-cluster job records executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof.
  • The functions implemented by the apparatus 20 can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above-mentioned functions, and the modules may be software and/or hardware.
  • The apparatus 20 may include a transceiver module 201, a detection module 202, a calling module 203, a classification module 204, a division module 205, a construction module 206, and a receiving module 207.
  • For the function implementation of these modules, reference may be made to the operations performed in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof, which will not be repeated here.
  • the detection module 202 can be used to control the transceiving operation of the transceiving module 201
  • the classification module 204 can be used to control the acquisition operation of the detection module 202 and the creation operation of the calling module 203
  • the division module 205 can be used to control the creation operation of the calling module 203 and the creation operation of the classification module 204.
  • the construction module 206 can be used to control the obtaining operation of the division module 205
  • the receiving module 207 can be used to control the trigger operation and input operation of the construction module 206.
  • The transceiver module 201 is used to receive job record data generated by tasks running on multiple clusters. The detection module 202 is used to detect the running state of the tasks; when the running state is detected to be a preset trigger point, it sends a trigger instruction to the created trigger, and the trigger receives the trigger instruction and converts the data format of the job record data received by the transceiver module 201 into JSON format to obtain the to-be-processed data. The calling module 203 is used to call the distributed messaging system Kafka in the message queue service system; when Kafka receives a topic creation command, it calls the topic creation script and creates the topic through the topic creation script; Kafka creates a producer based on the cluster corresponding to the to-be-processed data and creates a consumer based on the unified management website system.
  • The classification module 204 is used to input the to-be-processed data obtained by the detection module 202 into Kafka, where Kafka classifies the data according to the topic created via the calling module 203 and the producer to obtain the target data. The division module 205 divides the target data into blocks according to the producer and the topic, links the blocks according to the created zoning protocol, and uses the linked blocks together with the consumer as the data storage layer. The construction module 206 constructs the blockchain system according to the zoning protocol and the data storage layer, inputs the target data into the repositories, and triggers the read instruction. The receiving module 207 is used to, when the unified management website system receives the read instruction triggered by the construction module 206, output the target data in the repositories through the data storage layer, input the target data into the cache area of the MySQL database, convert the target data in the cache area into hypertext markup language data, control the cache area through the output control function to obtain the HTML data, and input the HTML data into the constructed static HTML page file through the created read-write function.
  • The preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters.
  • The zoning protocol is used to link each block in order from back to front through a chain, with each block pointing to the previous one, and to link the created blockchain system into Kafka so that Kafka can be used in the blockchain system.
  • Kafka includes repositories, and there are multiple repositories.
  • Optionally, the classification module 204 is also used to: obtain the order correlation degree of the events, obtain the throughput of the events, identify the entity types of the events, and obtain the correlation degree between entity types, where an entity type is used for one address corresponding to one user; classify the to-be-processed data into topics according to the order correlation degree, throughput, and correlation degree following the preset classification strategy to obtain the first classification data, where the preset classification strategy places into the same topic the to-be-processed data that satisfies at least one of: order correlation degree greater than the first preset threshold, throughput less than the second preset threshold, and correlation degree greater than the third preset threshold; mark the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the to-be-processed data; and classify the marked first classification data by producer type and mark the producer type of the marked first classification data to obtain the target data.
  • Optionally, the classification module 204 is further configured to: initialize the classified to-be-processed data and set the length of the linear hash table according to the classification types of the classified data; obtain the key values of the classified data, calculate the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtain the target key values corresponding to the data items whose TF-IDF value is greater than the fourth preset threshold, where the to-be-processed data includes the data items; use the remainder obtained by dividing a target key value by a number not greater than the length of the linear hash table as the address in the linear hash table, use the target key value as the head of the linear hash table, and use the address as the number of the linear hash table to obtain the linear hash table; randomly generate a preset number of strings of the same length, and count and analyze the linear hash table through the preset string function to obtain hash distribution information and average bucket length information, where the hash distribution information includes the usage rate of the buckets and the average bucket length information includes the average length of all used buckets; judge whether the hash distribution information satisfies a first preset condition and whether the average bucket length information satisfies a second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within a first preset range and the second preset condition is that the average length of all used buckets falls within a second preset range; if both judgments are yes, take the corresponding linear hash table as the final linear hash table; and fill the target key values into the final linear hash table and output the final linear hash table in the form of a linked list.
  • Optionally, the classification module 204 is further used to: compress the to-be-processed data; judge whether the transmission status of the transmission channel is normal; if so, input the compressed data into Kafka and mark it as sent; if not, input the compressed data into the first MySQL database and mark it as unsent; call the created polling script, which polls the first MySQL database at the preset interval; when polling finds that the first MySQL database holds unsent to-be-processed data and that the transmission status of the channel is normal, send the data marked as unsent into Kafka; poll to detect whether the data marked as unsent has been delivered; if so, replace the unsent mark with the sent mark; if not, continue polling.
  • Optionally, the receiving module 207 is also used to: have the unified management website system call the listener script, which detects whether the application layer in the blockchain system has received the read instruction; when the detection result is no, re-check the application layer in the blockchain system; when the detection result is yes, capture target data from the repositories through the consumer according to the preset capture quantity and add the consumed label to the captured target data to obtain the marked target data; convert the marked target data into a JSON object and parse the JSON object into the first data object; identify whether the second data objects of the MySQL database contain a data object with the same content as the first data object; if so, delete the data object with the same content from the first data object to obtain the first target data object; obtain the topic and producer information marked in the label of the first target data object; and fill the first target data object into the MySQL database cache area according to the topic and producer information.
  • Optionally, the receiving module 207 is also used to: detect whether a database transaction in the MySQL database is executing; if so, obtain the initial data of the target data in the cache area, lock the MySQL database through the Locktable statement, and append the updated data of target data subsequently input into the cache area of the MySQL database to the initial data, where the Locktable statement is a Locktable statement with the WRITE keyword; and obtain the data with preset fields in the target data of the cache area, and obtain the field size of that data.
  • Optionally, the classification module 204 is further configured to: obtain characteristic information about the running state of the task corresponding to the to-be-processed data; sort and classify the to-be-processed data according to the characteristic information to obtain the classification data and mark its classification type, where the classification types include task start data, task running data, and task end data; establish, by classification type, the correspondence between the classification data and the topics; and mark the correspondence of the classification data to obtain the target data.
  • On the one hand, the system is decoupled, the pressure of collecting job record data from multiple big data clusters at the same time is reduced, congestion in collecting the job record data of multiple big data clusters simultaneously is avoided, and high fault tolerance, high-speed caching, high efficiency, and high throughput are achieved. On the other hand, access and operation speed are increased and the server load is reduced. Taken together, this application can handle the system's concurrency-crash problem at low cost, with high efficiency and high accuracy, and from multiple directions; therefore, the present application can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • The technical features mentioned in any embodiment or implementation of the method for processing multi-cluster job records are also applicable to the apparatus 20 that performs the method in this application; similar content will not be repeated here.
  • The apparatus 20 in the embodiment of the present application is described above from the perspective of modular functional entities. The following describes a computer device from a hardware perspective; as shown in FIG. 3, it includes a processor, a memory, a transceiver (or an input and output unit, not labeled in FIG. 3), and a computer program stored in the memory and runnable on the processor.
  • The computer program may be a program corresponding to the method for processing multi-cluster job records in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof.
  • The processor executes the computer program to implement the method for processing multi-cluster job records executed by the apparatus 20 in the embodiment corresponding to FIG. 1.
  • The so-called processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • The general-purpose processor may be a microprocessor, or any conventional processor.
  • The processor is the control center of the computer device and uses various interfaces and lines to connect the parts of the entire computer device.
  • The memory may be used to store the computer program and/or modules; the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory.
  • The memory may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function (such as obtaining the job record data generated by tasks running on multiple clusters), and the data storage area can store the data created according to use (for example, the multiple blocks obtained by dividing the target data into blocks according to producer and topic).
  • The memory can include high-speed random access memory, and can also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
  • the transceiver can also be replaced by a receiver and a transmitter, and can be the same or different physical entities. When they are the same physical entity, they can be collectively referred to as transceivers.
  • the transceiver can be an input and output unit.
  • The entity device corresponding to the transceiver module 201 in FIG. 2 may be the transceiver in FIG. 3, and the entity device corresponding to the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206, and the receiving module 207 in FIG. 2 may be the processor in FIG. 3.
  • the memory may be integrated in the processor, or may be provided separately from the processor.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • The computer-readable storage medium stores computer instructions, and when the computer instructions are executed on a computer, the computer executes the following steps: obtain the job record data generated by tasks running on multiple clusters and detect the running state of the tasks; when the running state is detected to be a preset trigger point, send a trigger instruction to the created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain the to-be-processed data, where the preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters; call the distributed messaging system Kafka in the message queue service system; when Kafka receives the topic creation command, call the topic creation script and create the topic through the topic creation script; create a producer through Kafka according to the cluster corresponding to the to-be-processed data, and create a consumer through Kafka according to the unified management website system; input the to-be-processed data into Kafka, where Kafka classifies the data according to the topic and the producer to obtain the target data; divide the target data into blocks according to the producer and the topic to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one, and links the created blockchain system into Kafka so that Kafka can be used in the blockchain system; build the blockchain system according to the zoning protocol and the data storage layer, input the target data into the repositories through the blockchain system using the HTTP request method, and trigger the read instruction, where Kafka includes multiple repositories; when the unified management website system receives the read instruction, output the target data in the repositories through the data storage layer and input the target data into the cache area of the MySQL database; and convert the target data in the cache area into hypertext markup language data and write the HTML data into the constructed static HTML page file.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of big data. Provided are a method and apparatus for processing a multi-cluster job record, and a device and a storage medium. The method comprises: processing job record data generated by a plurality of clusters to acquire data to be processed; creating a topic, a producer and a consumer by means of a distributed message system Kafka in a message queue service system; classifying the data to be processed by means of the Kafka so as to acquire target data, and constructing a blockchain system according to the producer, the topic and the target data; inputting the target data into a repository by means of the blockchain system; inputting, by means of a unified management website system, the target data in the repository into a cache region of a MySQL database; and converting the target data in the cache region into hypertext markup language data, and inputting the hypertext markup language data into a static hypertext markup language page file. By using the present solution, the problem of concurrent crash of a multi-cluster job management system can be solved.

Description

处理多集群作业记录的方法、装置、设备及存储介质Method, device, equipment and storage medium for processing multi-cluster job records
本申请要求于2019年9月19日提交中国专利局、申请号为201910884887.8,发明名称为“处理多集群作业记录的方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on September 19, 2019, the application number is 201910884887.8, and the invention title is "Methods, Apparatus, Equipment, and Storage Media for Processing Multi-cluster Operation Records", and its entire contents Incorporated in the application by reference.
技术领域Technical field
本申请涉及数据处理领域,尤其涉及处理多集群作业记录的方法、装置、设备及存储介质。This application relates to the field of data processing, and in particular to methods, devices, equipment and storage media for processing multi-cluster job records.
背景技术Background technique
目前的集群作业管理中,一般是通过获取多个集群生成的任务作业数据,将所述任务作业数据输入到统一管理网站,通过所述统一管理网站的系统检测所述任务作业数据的任务类型,根据所述任务类型对所述任务作业数据进行分类以获得分类数据,将所述分类数据按照任务类型分别输入到多个储存库中。In the current cluster job management, generally, the task job data generated by multiple clusters is obtained, the job job data is input to the unified management website, and the task type of the task job data is detected through the system of the unified management website. The task job data is classified according to the task type to obtain classified data, and the classified data is input into a plurality of storage repositories respectively according to the task type.
发明人意识到由于多个集群直接将任务作业数据集中到统一管理网站,易造成门户网站的请求过多和并行处理时的通道拥挤,从而易导致多集群作业管理系统的并发崩溃。The inventor realizes that because multiple clusters directly centralize task job data to a unified management website, it is easy to cause too many portal requests and channel congestion during parallel processing, which may easily lead to concurrent collapse of the multi-cluster job management system.
发明内容Summary of the invention
本申请提供了一种处理多集群作业记录的方法、装置、设备及存储介质,能够解决多集群作业管理系统的并发崩溃的问题。The present application provides a method, device, equipment, and storage medium for processing multi-cluster job records, which can solve the problem of concurrent crashes of the multi-cluster job management system.
第一方面,本申请提供一种处理多集群作业记录的方法,包括:获取多个集群运行任务生成的作业记录数据,以及检测所述任务的运行状态,当检测到所述运行状态为预设触发点时,向已创建的触发器发送触发指令,所述触发器接收所述触发指令,将所述作业记录数据的数据格式转换为JSON格式,以获取待处理数据,其中,所述预设触发点包括多个所述集群运行任务时的启动或暂停或结束的运行状态;调用所述消息队列服务系统中的分布式消息系统Kafka,当所述Kafka接收到主题创建命令时,调用主题创建脚本,并通过所述主题创建脚本创建主题;通过所述Kafka根据所述待处理数据对应的集群创建生产者,并通过所述Kafka根据所述统一管理网站系统创建消费者;将所述待处理数据输入至所述Kafka,并通过所述Kafka根据所述主题和所述生产者对所述待处理数据进行分类,以获取目标数据;根据所述生产者和所述主题对所述目标数据进行区块划分,以获取多个区块,根据已创建的区划协议链接多个所述区块,并以链接的多个所述区块和所述消费者作为数据储存层,其中,所述区划协议用于通过链条将每个所述区块从后向前有序地链接和指向前一个所述区块,以及将创建的所述区块链系统链接到所述Kafka中,以使所述Kafka运用到所述区块链系统中;根据所述区划协议和所述数据存储层构建区块链系统,并通过所述区块链系统按照http的请求方式将所述目标数据输入至储存库,并触发读取指令,其中,所述Kafka包括储存库,所述储存库的数量包括多个;当所述统一管理网站系统接收到所述读取指令时,通过所述数据存储层输出所述储存库中的所述目标数据,并将所述目标数据输入至MySQL数据库的缓存区;将所述缓存区中的目标数据转换为超文本标记语言数据,并将所述超文本标记语言数据写入至已构建的静态超文本标记语言页面文件。In a first aspect, this application provides a method for processing multi-cluster job records, including: obtaining job record data generated by multiple cluster running tasks, and detecting the running status of the tasks, and when it is detected that the running status is a preset At the trigger point, a trigger instruction is sent to the created trigger. The trigger receives the trigger instruction and converts the data format of the job record data into a JSON format to obtain the data to be processed, wherein the preset Trigger points include the start, pause, or end operating status of multiple cluster running tasks; call the distributed messaging system Kafka in the message queue service system, and when the Kafka receives a topic creation command, call the topic creation Script, and create a topic through the topic creation script; create a producer through the Kafka according to the cluster corresponding to the to-be-processed data, and create a consumer through the Kafka according to the unified management website system; The data is input to the Kafka, and the Kafka classifies the to-be-processed data according to the theme and the producer to obtain target data; the target data is processed according to the producer and the theme Block division to obtain multiple blocks, link multiple blocks according to the created zoning protocol, and use the linked multiple blocks and the consumers as the data storage layer, wherein the zone The protocol is used to link each of the blocks in an orderly manner from back to front through a chain and point to the previous block, and to link the created blockchain system to the Kafka, so that the Kafka is used in the blockchain system; a blockchain system is constructed according to the zoning protocol and the data storage layer, and the target data is input to the repository through the blockchain system in accordance with the HTTP request method , And trigger a read instruction, where the Kafka includes a repository, and the number of repositories includes multiple; when the unified management website system receives the read instruction, the data storage layer outputs all The target data in the repository, and the target data is input into the buffer area of the MySQL database; the target data in the buffer area is converted into hypertext markup language data, and the hypertext markup language data Write to the built static hypertext markup language page file.
In a second aspect, this application provides an apparatus for processing multi-cluster job records, including: a transceiver module, configured to receive job record data generated by tasks run by multiple clusters; a detection module, configured to detect the running status of the tasks and, when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain data to be processed, where the preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters; a calling module, configured to call the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, a topic creation script is called and a topic is created through the topic creation script, a producer is created through Kafka according to the cluster corresponding to the data to be processed, and a consumer is created through Kafka according to the unified management website system; a classification module, configured to input the data to be processed obtained by the detection module into the Kafka called by the calling module, and classify the data to be processed through Kafka according to the topic and the producer created by the calling module to obtain target data; a division module, configured to divide the target data obtained by the classification module into blocks according to the producer and the topic created by the calling module to obtain multiple blocks, link the multiple blocks according to a created zoning protocol, and use the linked blocks and the consumer as a data storage layer, where the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system; a construction module, configured to construct a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, input the target data into a repository through the blockchain system by means of an HTTP request, and trigger a read instruction, where Kafka includes repositories and the number of repositories is more than one; and a receiving module, configured to, when the unified management website system receives the read instruction triggered by the construction module, output the target data input into the repository by the construction module through the data storage layer, input the target data into a cache area of a MySQL database, convert the target data in the cache area into HTML data, control the cache area through an output control function to obtain the HTML data, and input the HTML data into a constructed static HTML page file through a created read-write function.
In a third aspect, this application provides a computer device, including at least one connected processor, a memory, a display, and an input/output unit, where the memory is configured to store program code, and the processor is configured to call the program code in the memory to execute the method described in the first aspect above.
In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to execute the method described in the first aspect above.
In the technical solution provided by this application, the job record data generated by tasks run by multiple clusters is processed to obtain data to be processed; topics, producers, and consumers are created through the distributed messaging system Kafka in the message queue service system; the data to be processed is classified through Kafka to obtain target data, and a blockchain system is constructed according to the producer, the topic, and the target data; the target data is input into a repository through the blockchain system; the target data in the repository is input into the cache area of a MySQL database through the unified management website system; and the target data in the cache area is converted into HTML data, which is input into a static HTML page file. Because the solution adopts an architecture of big data clusters, a message queue server, a unified management website, and a blockchain system, on the one hand, using the Kafka system in the message queue server as the message queue and combining it with the blockchain system for distributed data storage, with data cached concurrently across multiple nodes, decouples the system, relieves the pressure of collecting the job record data of multiple big data clusters at once, avoids the congestion of simultaneous collection, and yields fault-tolerant, cached, efficient, high-throughput processing; on the other hand, statically rendering the target data input into the MySQL cache area as HTML increases access and running speed and lightens the server's load. In summary, this application addresses the system concurrent-crash problem at low cost, with high efficiency and accuracy, and from multiple angles, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
Description of the Drawings
FIG. 1 is a schematic flowchart of a method for processing multi-cluster job records in an embodiment of this application;
FIG. 2 is a schematic structural diagram of an apparatus for processing multi-cluster job records in an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device in an embodiment of this application.
Detailed Description
This application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can be used in an enterprise multi-cluster job management platform to manage and query the job run records generated by multiple big data clusters.
To solve the above technical problem, this application mainly provides the following technical solutions:
Referring to FIG. 1, the method for processing multi-cluster job records provided by this application is illustrated below. The method involves an architecture of a big data cluster layer, a message queue server, and a unified management website system, and is executed by a computer device, which may be a server or a terminal; when the apparatus 20 shown in FIG. 2 is an application or an executable program, the terminal is the terminal on which the apparatus 20 shown in FIG. 2 is installed. This application does not limit the type of the execution subject. The method includes:
101. Obtain the job record data generated by tasks run by multiple clusters, and detect the running status of the tasks; when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, which receives the trigger instruction and converts the data format of the job record data into JSON format to obtain the data to be processed.
The preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters. A trigger is created with a T-SQL statement of the form create trigger trigger_name on { } as sql_statement, based on the start, pause, and end trigger points of a task, so that when the running status of a big data cluster task is detected to be start, pause, or end, a processing script is triggered that converts the data format of the job record data into JSON format to obtain the data to be processed. When a task start is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, and initiator of the task operation; when a task pause is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, task operation initiator, operation pause time, and the user who paused the task; when a task end is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, task operation initiator, operation end time, and the run result. The job record data generated by tasks run by multiple clusters is stored in a MySQL database connected to the clusters; after the job record data is read out of that database, its data format is converted into JSON format to facilitate the processing of structured data.
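By way of a non-limiting illustration, the JSON conversion step might be sketched as follows in Java using the Jackson library; the field names and values are hypothetical examples drawn from the record contents listed above, not part of the original disclosure:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class JobRecordToJson {
    public static void main(String[] args) throws Exception {
        // Hypothetical task-start record with the fields the text lists.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("runningAccount", "etl_user01");         // running account
        record.put("jobContent", "daily_sales_agg");        // job content
        record.put("submitTime", Instant.now().toString()); // submission time
        record.put("startTime", Instant.now().toString());  // start time
        record.put("project", "warehouse");                 // owning project
        record.put("initiator", "alice");                   // task operation initiator

        // Convert the job record into JSON format to obtain the data to be processed.
        String json = new ObjectMapper().writeValueAsString(record);
        System.out.println(json);
    }
}
```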
Optionally, in some embodiments of this application, before the data format of the job record data is converted into JSON format, the method further includes: compressing the job record data; performing state detection on the compressed job record data to obtain state information, and analyzing the state information through a cache coherence protocol to obtain first data and second data, where the state information includes a modified state, an exclusive state, a shared state, and an invalid state, the first data includes job run data with strong consistency-cache requirements, and the second data includes the job run data other than that with strong consistency-cache requirements; calling a local Cache interface to generate a cache builder CacheBuilder object for the first data, assembling an automatic-loading function for the first data, and obtaining first key-value pair data of the first data; automatically loading the first key-value pair data into the physical-memory cache through the CacheBuilder object and the automatic-loading function; creating a CacheLoader subclass object, and, when a failed get operation is detected, automatically loading the first key-value pair data into the physical-memory cache through the CacheLoader subclass object; building a cache architecture component from the high-speed cache system Memcached and the data structure server Redis, where the cache architecture component includes a cache server; obtaining a first hash value of a node of the cache architecture component, obtaining second key-value pair data of the second data, and obtaining a second hash value of the second key-value pair data; and, according to the first hash value and the second hash value, storing the second key-value pair data in the cache server of the cache architecture component through the cache architecture component, to obtain the final job record data. Compressing the acquired job record data reduces the volume of job record data during transmission or transfer; combining a local cache with a comprehensive distributed cache exploits the speed, performance, dynamic scalability, availability, and ease of use of distributed caching to lower the read-write pressure and load on the server while relieving the system's data storage and channel pressure, normalize the cached data, and raise the cache hit rate, so that job record data is cached quickly and accurately to prevent concurrent system crashes.
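A minimal sketch of the local-cache step, assuming the Cache interface refers to Google Guava's CacheBuilder and CacheLoader; the key and value types, size limits, and loader logic are illustrative:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class ConsistencyCriticalCache {
    public static void main(String[] args) throws Exception {
        // CacheLoader subclass: reloads a key-value pair when a get misses or fails.
        CacheLoader<String, String> loader = new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchFromSource(key); // placeholder for the first data's source
            }
        };

        // CacheBuilder object assembled with an automatic-loading function,
        // keeping the first key-value pair data in the in-memory cache.
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build(loader);

        System.out.println(cache.get("job:1234")); // loads on first access, cached afterwards
    }

    private static String fetchFromSource(String key) {
        return "{\"runningAccount\":\"etl_user01\"}"; // illustrative payload
    }
}
```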
102. Call the distributed messaging system Kafka in the message queue server; when Kafka receives a topic creation command, call the topic creation script, and create a topic through the topic creation script.
After the data to be processed is obtained, the distributed messaging system Kafka in the message queue server is called and the topic creation command "bin/kafka-topics.sh --create --zookeeper localhost: --replication-factor N --partitions M --topic running_result" is issued. The command specifies that the topic is running_result, it has M partitions, and each partition is assigned N replicas. A topic creation script comprising a command-line part and a controller logic part is called; the controller logic part watches the corresponding directory node under the distributed application coordination service ZooKeeper, and the command-line part creates a new data node when it receives the topic creation command, which triggers the controller logic part, whereupon the topic has been created. Creating topics makes it easier to organize the incoming data to be processed.
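Topic creation can equivalently be sketched with Kafka's Java AdminClient; the broker address and the example values M = 3 and N = 2 below are assumptions:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateRunningResultTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic running_result with M = 3 partitions and N = 2 replicas per partition.
            NewTopic topic = new NewTopic("running_result", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get(); // blocks until created
        }
    }
}
```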
103. Create a producer through Kafka according to the cluster corresponding to the data to be processed, and create a consumer through Kafka according to the unified management website system.
A cluster acts as the provider of the data to be processed, that is, the producer; the unified management website system acts as the consumer of the data to be processed. The consumer side (the unified management website system) runs automatically and monitors topic updates in Kafka. Kafka's producer-consumer model makes it possible to process the job record data generated by the clusters in parallel and to balance the system load.
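A minimal producer/consumer sketch with Kafka's Java client follows; the broker address, group id, and serializer choices are assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ClusterProducerWebsiteConsumer {
    // Each big data cluster publishes its JSON job records as a producer.
    static void produce(String clusterId, String jsonRecord) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // The cluster id is used as the key, so one cluster's records stay together.
            producer.send(new ProducerRecord<>("running_result", clusterId, jsonRecord));
        }
    }

    // The unified management website system listens for topic updates as a consumer.
    static void consume() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("group.id", "unified-management-website");
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(p)) {
            consumer.subscribe(Collections.singleton("running_result"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("cluster=%s record=%s%n", r.key(), r.value()));
        }
    }
}
```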
104. Input the data to be processed into Kafka, and classify the data to be processed through Kafka according to topic and producer to obtain the target data.
Through Kafka, the data to be processed is first divided by producer (that is, by cluster) into per-producer data, and within each producer's data it is further classified by topic to obtain the target data. The data to be processed can be classified by assigning it to topics as follows: events with a fixed order are placed in the same topic and use the same partition key; for events involving different entities where one entity depends on another, the events are placed in the same topic; and events whose throughput exceeds a first preset throughput threshold are placed in different topics, while events below a second preset throughput threshold are placed in the same topic. Classifying the data to be processed by topic and producer makes the data quick and accurate to retrieve and easy to process concurrently.
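The topic-assignment rules above might be sketched as follows; the threshold values and the Event shape are hypothetical:

```java
// Illustrative routing of an event to a topic per the rules above.
public class TopicRouter {
    static final double HIGH_THROUGHPUT = 1_000.0; // first preset throughput threshold (events/s)
    static final double LOW_THROUGHPUT = 100.0;    // second preset throughput threshold (events/s)

    record Event(String producerId, String entity, double throughput, boolean fixedOrder) {}

    static String topicFor(Event e) {
        if (e.fixedOrder()) {
            return "ordered-events"; // fixed-order events share one topic and one partition key
        }
        if (e.throughput() > HIGH_THROUGHPUT) {
            return "high-throughput-" + e.entity(); // high-throughput events get their own topic
        }
        return "shared-low-throughput"; // low-throughput events are grouped into a shared topic
    }

    static String partitionKey(Event e) {
        return e.producerId(); // the same key keeps one producer's events in order
    }
}
```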
Optionally, in some embodiments of this application, the tasks include events, and classifying the data to be processed through Kafka according to topic and producer to obtain the target data includes: obtaining the order correlation degree of the events, obtaining the throughput of the events, identifying the entity types of the events, and obtaining the correlation degree between entity types, where an entity type is used such that one address corresponds to one user; classifying the data to be processed into topics according to the order correlation degree, the throughput, and the correlation degree, following a preset classification strategy, to obtain first classification data, where the preset classification strategy places into the same topic any data to be processed that satisfies at least one of the conditions that the order correlation degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the correlation degree is greater than a third preset threshold; labeling the first classification data, where the label content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classifying the labeled first classification data according to producer type and labeling it with that producer type, to obtain the target data. Classifying the data to be processed by this rule prevents all task events from being lumped into one topic; distributing them sensibly across multiple topics ensures that the data to be processed for multiple events and users can be retrieved while remaining ordered and complete.
Optionally, in some embodiments of this application, the tasks include events, and after the data to be processed is classified through Kafka according to topic and producer and before the target data is obtained, the method further includes: initializing the classified data to be processed, and setting the length of a linear hash table according to the classification type of the classified data; obtaining the key code values of the classified data, computing the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtaining the target key code values corresponding to data items whose TF-IDF value is greater than a fourth preset threshold, where the data to be processed includes the data items; taking the remainder of dividing a target key code value by a number not greater than the length of the linear hash table as the address in the linear hash table, taking the target key code value as the header of the linear hash table, and taking the address of the linear hash table as the number of linear hash tables, to obtain the linear hash table; randomly generating a preset number of strings of the same length, and using a preset string function to gather statistics on and analyze the linear hash table, to obtain hash distribution information and average bucket length information, where the hash distribution information includes the bucket usage rate and the average bucket length information includes the average length of all used buckets; judging whether the hash distribution information satisfies a first preset condition and whether the average bucket length information satisfies a second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within a first preset range, and the second preset condition is that the average length of all used buckets falls within a second preset range; if both judgments are yes, taking the corresponding linear hash table as the final linear hash table; and filling the target key code values into the final linear hash table and outputting it in linked-list form to obtain the target data. Sorting the target key code values by TF-IDF value allows the data to be processed to be classified quickly and accurately; a linear hash table's access speed is unaffected by the total number of elements accessed, suits very large databases, and is efficient, so it raises the query speed for job record data without that speed being impaired while the concurrent-crash problem of the multi-cluster job management system is being addressed. Applying linear hash table processing to the data to be processed improves system performance and scalability at low cost.
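For reference, the TF-IDF weighting used above can be sketched as follows (a simplified computation; the toy corpus and the df = 0 guard are assumptions):

```java
import java.util.List;

public class TfIdf {
    // tf-idf(term, doc) = tf(term, doc) * log(N / df(term)), the classic weighting.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        double tf = doc.stream().filter(term::equals).count() / (double) doc.size();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        double idf = Math.log(corpus.size() / (double) Math.max(df, 1)); // guard against df = 0
        return tf * idf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("job", "start", "cluster1"),
                List.of("job", "end", "cluster2"),
                List.of("pause", "cluster1"));
        System.out.println(tfIdf("start", corpus.get(0), corpus)); // higher: rare term
        System.out.println(tfIdf("job", corpus.get(0), corpus));   // lower: common term
    }
}
```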
Optionally, in some embodiments of this application, the method involves a transmission channel, and inputting the data to be processed into Kafka includes: compressing the data to be processed; judging whether the transmission status of the transmission channel is normal; if yes, inputting the compressed data to be processed into Kafka and marking it as sent; if no, inputting the compressed data to be processed into a first MySQL database and marking it as unsent; calling a created polling script, which polls the first MySQL database at a preset interval; when polling detects that the first MySQL database contains data to be processed that is marked as unsent and that the transmission status of the transmission channel is normal, inputting the data marked as unsent into Kafka; polling to detect whether Kafka has received the data marked as unsent; if yes, replacing the unsent mark in that data with a sent mark; if no, leaving the unsent mark unchanged. Marking the data to be processed avoids processing it repeatedly, which would increase the system load, and thus helps prevent concurrent crashes of the multi-cluster job management system.
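A sketch of the park-and-retry scheme follows; the table layout, status column, credentials, and delivery check are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UnsentRecordPoller {
    static final String URL = "jdbc:mysql://localhost:3306/jobs"; // assumed first MySQL database

    // Poll the parked records and resend them once the channel is back to normal.
    static void pollOnce(KafkaSender sender) throws Exception {
        try (Connection c = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement select = c.prepareStatement(
                     "SELECT id, payload FROM pending_records WHERE status = 'UNSENT'");
             PreparedStatement update = c.prepareStatement(
                     "UPDATE pending_records SET status = 'SENT' WHERE id = ?")) {
            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                boolean delivered = sender.send(rs.getString("payload")); // resend to Kafka
                if (delivered) { // only flip the mark on confirmed delivery
                    update.setLong(1, rs.getLong("id"));
                    update.executeUpdate();
                }
            }
        }
    }

    interface KafkaSender {
        boolean send(String payload); // returns true when Kafka confirms receipt
    }
}
```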
Optionally, in some embodiments of this application, classifying the data to be processed through Kafka according to topic and producer to obtain the target data includes: obtaining feature information of the running status of the task corresponding to the data to be processed; sorting and classifying the data to be processed according to the feature information to obtain classification data, and labeling the classification type of the classification data, where the classification types include task start data, task running data, and task end data; and establishing, for each classification type, a correspondence between the classification data and a topic, and labeling that correspondence, to obtain the target data.
105. Divide the target data into blocks according to producer and topic to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer.
The zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system. The blocks are divided within the message queue server to create the blockchain system. The target data is first divided into blocks by producer, with one block per producer, so that block data can be managed by producer; on top of that division, the data is further divided into blocks by topic, with one block per topic, so that block data can also be managed by topic. The linked blocks and the consumer together serve as the data storage layer, which facilitates the storage of the target data, the consumer's retrieval of it, and the linking of the blockchain system into the unified management website system. Dividing the target data into blocks for distributed node storage and processing effectively handles the concurrent crash problem of the multi-cluster job management system.
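The back-to-front linking might be sketched as follows, with each block recording the hash of its predecessor; the SHA-256 choice and the block fields are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class Block {
    final String producer;     // block is scoped to one producer (cluster)
    final String topic;        // and one topic within that producer
    final String payload;      // the target data held by this block
    final String previousHash; // link pointing back to the preceding block
    final String hash;

    Block(String producer, String topic, String payload, String previousHash) throws Exception {
        this.producer = producer;
        this.topic = topic;
        this.payload = payload;
        this.previousHash = previousHash;
        // Each block's hash covers its content plus the previous hash,
        // so the blocks are chained in order from back to front.
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest((producer + topic + payload + previousHash)
                .getBytes(StandardCharsets.UTF_8));
        this.hash = HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        Block genesis = new Block("cluster1", "running_result", "{\"job\":1}", "0");
        Block next = new Block("cluster1", "running_result", "{\"job\":2}", genesis.hash);
        System.out.println(next.previousHash.equals(genesis.hash)); // true: chained
    }
}
```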
106. Construct a blockchain system according to the zoning protocol and the data storage layer, input the target data into the repository through the blockchain system by means of an HTTP request, and trigger a read instruction.
Kafka includes repositories, and the number of repositories is more than one. The blockchain system includes an application layer, and the application layer includes the unified management website system. HTTP offers a range of request methods, each specifying a different operation on a resource: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH. Here the PUT method is used, so that the latest version of the specified target data is transmitted to the repository in the message queue server. Multiple repositories are set up in Kafka to store the target data by category; the data is stored into the repository corresponding to its producer and topic, which facilitates management and retrieval of the target data. Using the Kafka system in the message queue server as the message queue, combined with the blockchain system for distributed data storage, with data cached concurrently across multiple nodes, decouples the system, relieves the pressure of collecting the job record data of multiple big data clusters at once, avoids the congestion of simultaneous collection, and yields fault-tolerant, cached, efficient, high-throughput processing.
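The PUT request might be sketched with the JDK's HttpClient as follows; the repository URL and payload are illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PutTargetData {
    public static void main(String[] args) throws Exception {
        String json = "{\"producer\":\"cluster1\",\"topic\":\"running_result\",\"job\":1}";

        // PUT transmits the latest version of the target data to the repository.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mq-server.example/repositories/cluster1/running_result"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // e.g. 200 or 201 on success
    }
}
```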
107. When the unified management website system receives the read instruction, output the target data in the repository through the data storage layer, and input the target data into the cache area of the MySQL database.
The target data in the Kafka system is read out and stored into the cache area of the MySQL database of the unified management website system. The unified management website system monitors the Kafka system so that target data is fetched and stored promptly; inputting the fetched target data into the cache area of the MySQL database facilitates subsequent reads of the target data and relieves the storage pressure on the MySQL database.
Optionally, in some embodiments of this application, before the target data is input into the cache area of the MySQL database, a preset data consumption frequency is set, and the target data is input into the cache area of the MySQL database at that frequency. Feeding the target data in at a preset consumption frequency buffers the input and thereby relieves the storage pressure on MySQL.
Optionally, in some embodiments of this application, when the unified management website system receives the read instruction, outputting the target data in the repository through the data storage layer and inputting it into the cache area of the MySQL database includes: the unified management website system calls a listener script and uses it to detect whether the application layer in the blockchain system has received a read instruction; when the detection result is no, the application layer in the blockchain system is checked again; when the detection result is yes, the consumer fetches target data from the repository in a preset fetch quantity, and the fetched target data is tagged as consumed to obtain tagged target data; the tagged target data is converted into a JSON object, which is parsed into a first data object; it is identified whether the second data objects in the MySQL database include a data object with the same content as the first data object; if yes, the data object whose content duplicates a second data object is deleted from the first data object to obtain a first target data object, the topic and producer information marked in the first target data object's tag is obtained, and the first target data object is filled into the cache area of the MySQL database according to that topic and producer information; if no, the topic and producer information marked in the first data object's tag is obtained, and the first data object is filled into the cache area of the MySQL database according to that information. Monitoring Kafka through the unified management website system for updated target data lowers the risk of fetching and storing data twice; converting the target data into objects makes it straightforward to store in the MySQL database; and filling the data into the multiple cache areas set up in the MySQL database according to topic and producer information eases the classified management and retrieval of the data. Together, these measures raise the efficiency with which the multi-cluster job management system manages job record data.
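The duplicate check and cache-area fill might be sketched as follows; the table schema and the JSON field names are assumptions:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DedupeInsert {
    static void insertIfNew(Connection conn, String taggedJson) throws Exception {
        JsonNode record = new ObjectMapper().readTree(taggedJson); // first data object
        String topic = record.get("topic").asText();
        String producer = record.get("producer").asText();
        String content = record.get("payload").toString();

        // Skip records whose content already exists in the cache area.
        try (PreparedStatement check = conn.prepareStatement(
                "SELECT 1 FROM cache_area WHERE content = ? LIMIT 1")) {
            check.setString(1, content);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) return; // duplicates an existing second data object
            }
        }

        // Fill the cache area, partitioned by the topic and producer in the tag.
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO cache_area (topic, producer, content) VALUES (?, ?, ?)")) {
            insert.setString(1, topic);
            insert.setString(2, producer);
            insert.setString(3, content);
            insert.executeUpdate();
        }
    }
}
```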
Optionally, in some embodiments of this application, after the target data is input into the cache area of the MySQL database, the method further includes: sending a startup instruction to a configured hidden system, which receives the instruction and activates a hidden protocol, where the hidden system includes the hidden protocol and the hidden protocol covers failures, destructive deletion, and human ethics and morality; when the hidden system detects that incoming information violates the hidden protocol, the data in the MySQL database is copied and backed up into the hidden system, and the hidden system enters an authentication state, where such information includes failure instructions, destructive deletion instructions, and files carrying Trojan programs; when the hidden system in the authentication state detects that an incoming access request carries management authority, it issues a password input request; when it detects that the entered password is correct and the number of attempts has not reached the limit, it accepts the access request; and when it detects that the number of attempts has reached the limit, it rejects the access request and permanently seals the backed-up copy of the data. Copying and backing up the target data stored in the MySQL database into the hidden system, with a hidden protocol in place, ensures that the source data of the target data can still be obtained, and remains safe, if the device fails, is hacked, or is destructively deleted, which improves the security and availability of the multi-cluster job management system.
108. Convert the target data in the cache area into HTML data, and write the HTML data into the constructed static HTML page file.
The target data in the cache area is converted and written into the constructed static HTML page file. Statically rendering the target data held in the cache area of the MySQL database as HTML increases access and running speed and lightens the server's load, effectively addressing the concurrent crash problem of the multi-cluster job management system.
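The static-page step might be sketched as follows, rendering rows from the cache area as an HTML table written to a static file; the row shape and output path are assumptions, and cell values are assumed to be pre-escaped:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StaticPageWriter {
    static void writePage(List<String[]> rows, Path out) throws Exception {
        StringBuilder html = new StringBuilder(
                "<!DOCTYPE html><html><body><table border=\"1\">"
                + "<tr><th>Cluster</th><th>Job</th><th>Result</th></tr>");
        for (String[] row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(cell).append("</td>"); // assumes pre-escaped cells
            }
            html.append("</tr>");
        }
        html.append("</table></body></html>");
        // Writing a static file means later page views skip the database entirely.
        Files.writeString(out, html, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        writePage(List.of(new String[]{"cluster1", "daily_sales_agg", "SUCCESS"}),
                  Path.of("running_result.html"));
    }
}
```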
Optionally, in some embodiments of this application, before the target data in the cache area is converted into HTML data, the method further includes: detecting whether a database transaction in the MySQL database is executing; if yes, obtaining the initial data of the target data in the cache area, locking the MySQL database with a LOCK TABLES statement, and appending the updated data of any target data subsequently input into the cache area to the initial data, where the LOCK TABLES statement carries the WRITE keyword; obtaining the data with preset fields among the target data in the cache area, along with the field sizes of that data, where the preset fields include fields used for joins, WHERE conditions, and ORDER BY sorting, as well as fields used by the MAX(), MIN(), and ORDER BY commands; creating indexes according to preset rules based on the data with preset fields and their field sizes, where the preset rules include indexing target data with the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold; detecting whether the data tables in the MySQL database are defined with the InnoDB type; if not, adding TYPE=INNODB to the CREATE TABLE statements of the non-InnoDB tables to obtain InnoDB tables; if so, taking the InnoDB tables as they are; and creating foreign keys on the InnoDB tables with the ALTER TABLE command. Combining table locking, foreign keys, and indexes optimizes the MySQL database, improving database performance while maintaining the integrity and relatedness of the target data, freeing the system's storage database and easing its storage pressure, and providing the space and speed to support the system's concurrent processing, so that concurrent crashes of the multi-cluster job management system are effectively prevented and handled.
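The three optimizations might be issued over JDBC as follows; the table, column, and constraint names are assumptions, and note that modern MySQL spells TYPE=INNODB as ENGINE=InnoDB:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MySqlTuning {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/jobs", "user", "pass");
             Statement s = c.createStatement()) {
            // Lock the cache table for writing while buffered updates are merged,
            // then release the lock before running DDL.
            s.execute("LOCK TABLES cache_area WRITE");
            // ... merge the buffered target-data updates here ...
            s.execute("UNLOCK TABLES");

            // Index the fields used in JOIN / WHERE / ORDER BY lookups.
            s.execute("CREATE INDEX idx_cache_topic_producer ON cache_area (topic, producer)");

            // Ensure the table uses InnoDB, then add a foreign key with ALTER TABLE.
            s.execute("ALTER TABLE cache_area ENGINE = InnoDB");
            s.execute("ALTER TABLE cache_area ADD CONSTRAINT fk_cluster "
                    + "FOREIGN KEY (producer) REFERENCES clusters (id)");
        }
    }
}
```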
Optionally, in some embodiments of this application, after the HTML data is written into the constructed static HTML page file, the method further includes: when the unified management website system verifies that a login request entered by a user is correct, accepting the login request; when the server in the unified management website receives a query request entered by the user, obtaining the feature information of the query request; converting the feature information into a retrieval statement and filtering the data in the MySQL database with it to obtain the data corresponding to the query request; and performing statistics and analysis on the data corresponding to the query request to generate and output a visual chart. Outputting visual charts tailored to user needs makes the job record data easier for users to read and improves the usability of the multi-cluster job management system.
Compared with existing mechanisms, the embodiments of this application, on the one hand, decouple the system, relieve the pressure of collecting the job record data of multiple big data clusters at once, avoid the congestion of simultaneous collection, and yield fault-tolerant, cached, efficient, high-throughput processing; on the other hand, they increase access and running speed and reduce the server's load. In summary, this application addresses the system concurrent-crash problem at low cost, with high efficiency and accuracy, and from multiple angles, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
The technical features mentioned in the embodiment corresponding to FIG. 1 above, or in any optional embodiment or implementation thereof, also apply to the embodiments corresponding to FIG. 2 and FIG. 3 in this application; similar details are not repeated below.
A method for processing multi-cluster job records in this application has been described above; an apparatus for executing that method is described below.
FIG. 2 is a schematic structural diagram of an apparatus 20 for processing multi-cluster job records, which can be applied to an enterprise multi-cluster job management platform to manage and query the job run records generated by multiple big data clusters. The apparatus 20 in the embodiments of this application can implement the steps of the method for processing multi-cluster job records executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof. The functions implemented by the apparatus 20 can be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions, and a module may be software and/or hardware. The apparatus 20 may include a transceiver module 201, a detection module 202, a calling module 203, a classification module 204, a division module 205, a construction module 206, and a receiving module 207; for the functional implementation of these modules, refer to the operations executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof, which are not repeated here. The detection module 202 can control the transceiving operations of the transceiver module 201; the classification module 204 can control the obtaining operations of the detection module 202 and the creation operations of the calling module 203; the division module 205 can control the creation operations of the calling module 203 and the obtaining operations of the classification module 204; the construction module 206 can control the obtaining operations of the division module 205; and the receiving module 207 can control the triggering and input operations of the construction module 206.
In some implementations, the transceiver module 201 is configured to receive job record data generated by tasks run by multiple clusters; the detection module 202 is configured to detect the running status of the tasks and, when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, which receives the trigger instruction and converts the data format of the job record data received by the transceiver module 201 into JSON format to obtain the data to be processed; the calling module 203 is configured to call the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, a topic creation script is called and a topic is created through it, a producer is created through Kafka according to the cluster corresponding to the data to be processed, and a consumer is created through Kafka according to the unified management website system; the classification module 204 is configured to input the data to be processed obtained by the detection module 202 into Kafka, and classify it through Kafka according to the topic and producer created by the calling module 203 to obtain the target data; the division module 205 is configured to divide the target data obtained by the classification module 204 into blocks according to the producer and topic created by the calling module 203 to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer; the construction module 206 is configured to construct the blockchain system according to the zoning protocol and the data storage layer obtained by the division module 205, input the target data into the repository through the blockchain system by means of an HTTP request, and trigger a read instruction; and the receiving module 207 is configured to, when the unified management website system receives the read instruction triggered by the construction module 206, output the target data input into the repository by the construction module 206 through the data storage layer, input the target data into the cache area of the MySQL database, convert the target data in the cache area into HTML data, control the cache area through an output control function to obtain the HTML data, and input the HTML data into the constructed static HTML page file through a created read-write function.
The preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters; the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system; and Kafka includes repositories, the number of repositories being more than one.
Optionally, the classification module 204 is further configured to: obtain the order correlation degree of the events, obtain the throughput of the events, identify the entity types of the events, and obtain the correlation degree between entity types, where an entity type is used such that one address corresponds to one user; classify the data to be processed into topics according to the order correlation degree, the throughput, and the correlation degree, following the preset classification strategy, to obtain the first classification data, where the preset classification strategy places into the same topic any data to be processed that satisfies at least one of the conditions that the order correlation degree is greater than the first preset threshold, the throughput is less than the second preset threshold, and the correlation degree is greater than the third preset threshold; label the first classification data, where the label content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classify the labeled first classification data according to producer type and label it with that producer type, to obtain the target data.
Optionally, the classification module 204 is further configured to: initialize the classified data to be processed, and set the length of the linear hash table according to the classification type of the classified data; obtain the key code values of the classified data, compute the TF-IDF value of each data item of the classified data, and obtain the target key code values corresponding to data items whose TF-IDF value is greater than the fourth preset threshold, where the data to be processed includes the data items; take the remainder of dividing a target key code value by a number not greater than the length of the linear hash table as the address in the linear hash table, take the target key code value as the header of the linear hash table, and take the address of the linear hash table as the number of linear hash tables, to obtain the linear hash table; randomly generate a preset number of strings of the same length, and use the preset string function to gather statistics on and analyze the linear hash table, to obtain the hash distribution information and average bucket length information, where the hash distribution information includes the bucket usage rate and the average bucket length information includes the average length of all used buckets; judge whether the hash distribution information satisfies the first preset condition and whether the average bucket length information satisfies the second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within the first preset range, and the second preset condition is that the average length of all used buckets falls within the second preset range; if both judgments are yes, take the corresponding linear hash table as the final linear hash table; and fill the target key code values into the final linear hash table and output it in linked-list form to obtain the target data.
Optionally, the above classification module 204 is further configured to: compress the data to be processed; determine whether the transmission state of the transmission channel is normal; if the determination result is yes, input the compressed data to be processed into Kafka and mark the data input into Kafka as sent; if the determination result is no, input the compressed data to be processed into a first MySQL database and mark the data input into the first MySQL database as unsent; call a created polling script, and poll the first MySQL database at a preset interval through the polling script; when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, input the data marked as unsent into the first MySQL database; poll to detect whether the first MySQL database has received the data to be processed that is marked as unsent; if the detection result is yes, replace the unsent mark in the data marked as unsent with a sent mark; and if the detection result is no, do not update the unsent mark in the data marked as unsent.
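A minimal sketch of the compress-and-send path with the database fallback and polling retry might look as follows. Here `kafka_send`, `channel_ok` and the in-memory `UNSENT` list are stand-ins for the real Kafka producer, channel check and first MySQL database; the sketch also assumes that, once the channel recovers, the backlog is delivered to the messaging side before the mark is flipped to sent:

```python
import time
import zlib

UNSENT = []  # rows of [payload, status], standing in for the MySQL table

def channel_ok():
    return True  # assumption: a real check would probe the transmission channel

def kafka_send(payload):
    pass  # assumption: a real Kafka producer send would go here

def submit(record: bytes):
    payload = zlib.compress(record)  # data compression step
    if channel_ok():
        kafka_send(payload)
        return "sent"
    UNSENT.append([payload, "unsent"])  # fallback to the first database
    return "unsent"

def poll_resend(interval_s=5.0, rounds=1):
    """Polling script: retry rows marked unsent at a preset interval."""
    for _ in range(rounds):
        if channel_ok():
            for row in UNSENT:
                if row[1] == "unsent":
                    kafka_send(row[0])
                    row[1] = "sent"  # replace the unsent mark with sent
        time.sleep(interval_s)
```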
Optionally, the above receiving module 207 is further configured to: invoke, through the unified management website system, a listener script, and detect through the listener script whether the application layer in the blockchain system has received a read instruction; when the detection result is no, re-detect the application layer in the blockchain system; when the detection result is yes, fetch the target data from the repositories through the consumer according to a preset fetch quantity, and add a consumed label to the fetched target data, to obtain labeled target data; convert the labeled target data into a JSON object, and parse the JSON object into a first data object; identify whether a data object with the same content as the first data object exists among the second data objects of the MySQL database; if the identification result is yes, delete from the first data object the data objects having the same content as the second data object, to obtain a first target data object; obtain the topic and producer information marked in the label of the first target data object; fill the first target data object into the cache area of the MySQL database according to the topic and producer information; if the identification result is no, obtain the topic and producer information marked in the label of the first data object; and fill the first data object into the cache area of the MySQL database according to the topic and producer information.
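The fetch, label, parse and deduplicate sequence above can be sketched as follows; the fetch quantity, the in-memory `repository` and `existing` collections, and the record fields are illustrative assumptions:

```python
import json

FETCH_QUANTITY = 100  # hypothetical preset fetch quantity

def receive(repository, existing):
    """Fetch a batch, mark it consumed, drop duplicates, fill the cache."""
    batch = repository[:FETCH_QUANTITY]
    for rec in batch:
        rec["consumed"] = True                       # add the consumed label
    # JSON round trip stands in for converting to a JSON object and parsing
    # it back into the first data object.
    first_objects = [json.loads(json.dumps(r)) for r in batch]
    cache = []
    for obj in first_objects:
        if obj in existing:   # same content already among the second objects
            continue          # deleted from the first data object
        cache.append({"topic": obj.get("topic"),
                      "producer": obj.get("producer"),
                      "data": obj})  # fill the cache area by topic and producer
    return cache
```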
Optionally, the above receiving module 207 is further configured to: detect whether a database transaction in the MySQL database is being executed; if so, obtain the initial data of the target data in the cache area, lock the MySQL database through a Locktable statement, and add the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement includes a Locktable statement with the WRITE keyword; obtain the data having preset fields in the target data of the cache area, and obtain the field sizes of the data having preset fields, wherein the preset fields include fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command; create indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule includes indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold; detect whether the type of a data table in the MySQL database is defined as the InnoDB type; if not, add TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables; if so, obtain the data tables of the InnoDB type and use them as InnoDB tables; and create foreign keys on the InnoDB tables through the alter table command.
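Illustrative SQL corresponding to these steps is sketched below; the table and column names are hypothetical, and note that TYPE=INNODB is the legacy spelling of the table-type option, which current MySQL versions express as ENGINE=InnoDB:

```python
# Hypothetical statements mirroring the steps above; job_record and
# producer are invented table names for illustration only.
LOCK_SQL = "LOCK TABLES job_record WRITE;"      # Locktable with WRITE keyword
INDEX_SQL = "CREATE INDEX idx_topic ON job_record (topic);"  # preset-field index
ENGINE_SQL = "ALTER TABLE job_record ENGINE=InnoDB;"  # modern form of TYPE=INNODB
FK_SQL = ("ALTER TABLE job_record "
          "ADD FOREIGN KEY (producer_id) REFERENCES producer(id);")
UNLOCK_SQL = "UNLOCK TABLES;"

STATEMENTS = [LOCK_SQL, INDEX_SQL, ENGINE_SQL, FK_SQL, UNLOCK_SQL]
# With a real MySQL connection these would be run in order, e.g.:
# for stmt in STATEMENTS: cursor.execute(stmt)
```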
Optionally, the above classification module 204 is further configured to: obtain characteristic information about the running state of the task corresponding to the data to be processed; sort and classify the data to be processed according to the characteristic information to obtain classified data, and mark the classification type of the classified data, wherein the classification types of the classified data include task start data, task running data and task end data; and establish, for the classified data by classification type, the correspondence between the classified data and the topics, and mark the correspondence of the classified data, to obtain the target data.
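A minimal sketch of this running-state classification, with hypothetical state names and topic mapping:

```python
# Running-state classification: the characteristic information decides whether
# a record is start, running or end data, and each class maps to a topic.
TOPIC_MAP = {"start": "task-start", "running": "task-running", "end": "task-end"}

def classify_by_state(records):
    target = []
    for rec in records:
        state = rec["state"]             # characteristic information
        rec["class"] = state             # mark the classification type
        rec["topic"] = TOPIC_MAP[state]  # correspondence between class and topic
        target.append(rec)
    return target
```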
In the embodiments of the present application, on the one hand, the system is decoupled, which relieves the pressure of aggregating the job record data of multiple big data clusters at the same time, avoids congestion when such data arrives simultaneously, and achieves highly fault-tolerant, cached, efficient and high-throughput processing; on the other hand, access speed and running speed are increased and the load on the server is reduced. Taken together, the present application achieves low-cost, efficient, accurate and multi-faceted handling of the problem of concurrent system crashes, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
Optionally, in some implementations of the present application, the technical features mentioned in any embodiment or implementation of the above method for processing multi-cluster job records also apply to the apparatus 20 of the present application that performs the above method; similar points will not be repeated below.
The apparatus 20 in the embodiments of the present application has been described above from the perspective of modular functional entities. A computer apparatus is described below from a hardware perspective. As shown in FIG. 3, it includes a processor, a memory, a transceiver (which may also be an input/output unit, not marked in FIG. 3), and a computer program stored in the memory and runnable on the processor. For example, the computer program may be the program corresponding to the method for processing multi-cluster job records in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof. For example, when the computer apparatus implements the functions of the apparatus 20 shown in FIG. 2, the processor, when executing the computer program, implements the steps of the method for processing multi-cluster job records performed by the apparatus 20 in the embodiment corresponding to FIG. 2; alternatively, the processor, when executing the computer program, implements the functions of the modules of the apparatus 20 in the embodiment corresponding to FIG. 2. For another example, the computer program may be the program corresponding to the method in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer apparatus and connects the parts of the entire computer apparatus through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer apparatus by running or executing the computer program and/or modules stored in the memory and by calling data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as obtaining the job record data generated by multiple clusters running tasks), and the data storage area may store data created according to the use of the apparatus (such as the multiple blocks obtained by dividing the target data into blocks according to producer and topic). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same or different physical entities; when they are the same physical entity, they may be collectively referred to as a transceiver. The transceiver may be an input/output unit. The physical device corresponding to the transceiver module 201 in FIG. 2 may be the transceiver in FIG. 3, and the physical devices corresponding to the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206 and the receiving module 207 in FIG. 2 may be the processor in FIG. 3.
The memory may be integrated in the processor, or may be provided separately from the processor.
The present application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
Obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point includes the start, pause or end running state of the multiple clusters running tasks; calling an interface of the distributed messaging system Kafka in the message queue service system, and when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script; creating producers through Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through Kafka according to the unified management website system; inputting the data to be processed into Kafka, and classifying the data to be processed through Kafka according to the topics and the producers to obtain target data; dividing the target data into blocks according to the producers and the topics to obtain multiple blocks, linking the multiple blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to Kafka so that Kafka is applied in the blockchain system; constructing the blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein Kafka includes the repositories, and the number of repositories is plural; when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into the cache area of the MySQL database; and converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
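Purely as a structural illustration of the order of these steps, the following sketch stubs out each stage; every function name here is a placeholder invented for this sketch, none comes from the embodiment itself:

```python
import json

def at_trigger_point(rec):  # start / pause / end running state
    return rec.get("state") in {"start", "pause", "end"}

def to_json(rec):
    return json.dumps(rec)  # format conversion by the trigger

def create_topic(name):
    return name  # stand-in for the topic creation script

def classify(data, topic):
    return [{"topic": topic, "data": d} for d in data]

def divide_blocks(target):
    return [[t] for t in target]  # one block per record, for illustration

def link_blocks(blocks):
    # Zoning protocol: each block points back to the preceding block.
    return [{"prev": i - 1, "body": b} for i, b in enumerate(blocks)]

def run_pipeline(job_records):
    pending = [to_json(r) for r in job_records if at_trigger_point(r)]
    topic = create_topic("job-records")
    target = classify(pending, topic)
    chain = link_blocks(divide_blocks(target))  # data storage layer
    return chain  # a real system would store this and fill the MySQL cache

print(run_pipeline([{"state": "start"}, {"state": "other"}]))
```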
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

Claims (20)

1. A method for processing multi-cluster job records, involving a message queue service system and a unified management website system, the method comprising:
    obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    calling an interface of the distributed messaging system Kafka in the message queue service system, and, when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through the Kafka according to the unified management website system;
    inputting the data to be processed into the Kafka, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain a plurality of blocks, linking the plurality of blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural;
    when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into a cache area of a MySQL database; and
    converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
2. The method according to claim 1, wherein the tasks comprise events, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises:
    obtaining a sequential association degree of the events, obtaining a throughput of the events, identifying entity types of the events, and obtaining an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classifying the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    labeling the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classifying the labeled first classified data according to producer type, and labeling the producer type of the labeled first classified data, to obtain the target data.
3. The method according to claim 1, wherein, after classifying the data to be processed through the Kafka according to the topics and the producers and before obtaining the target data, the method further comprises:
    initializing the classified data to be processed, and setting the length of a linear hash table according to the classification type of the classified data to be processed;
    obtaining key-code values of the classified data to be processed, calculating the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data to be processed, and obtaining target key-code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the data to be processed comprises data items;
    using the remainder obtained by dividing the target key-code value by a number not greater than the length of the linear hash table as the address of the linear hash table, using the target key-code value as the table head of the linear hash table, and using the addresses of the linear hash table as the number of linear hash tables, to obtain the linear hash table;
    randomly generating a preset number of character strings of the same length, and performing statistics and analysis on the linear hash table through a preset string function, to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises the bucket usage rate, and the average bucket length information comprises the average length of all used buckets;
    determining whether the hash distribution information satisfies a first preset condition, and determining whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises the ratio of the number of used buckets to the total number of buckets falling within a first preset range, and the second preset condition comprises the average length of all used buckets falling within a second preset range;
    if both determination results are yes, using the linear hash table corresponding to the affirmative determination results as the final linear hash table; and
    filling the target key-code values into the final linear hash table, and outputting the final linear hash table in linked-list form, to obtain the target data.
4. The method according to claim 1, wherein the method involves a transmission channel, and inputting the data to be processed into the Kafka comprises:
    compressing the data to be processed;
    determining whether the transmission state of the transmission channel is normal;
    if the determination result is yes, inputting the compressed data to be processed into the Kafka, and marking the data input into the Kafka as sent;
    if the determination result is no, inputting the compressed data to be processed into a first MySQL database, and marking the data input into the first MySQL database as unsent;
    calling a created polling script, and polling the first MySQL database at a preset interval through the polling script;
    when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, inputting the data marked as unsent into the first MySQL database;
    polling to detect whether the first MySQL database has received the data to be processed that is marked as unsent;
    if the detection result is yes, replacing the unsent mark in the data marked as unsent with a sent mark; and
    if the detection result is no, not updating the unsent mark in the data marked as unsent.
5. The method according to claim 1, wherein, when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database comprises:
    invoking, by the unified management website system, a listener script, and detecting through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detecting the application layer in the blockchain system;
    when the detection result is yes, fetching the target data from the repositories through the consumer according to a preset fetch quantity, and adding a consumed label to the fetched target data, to obtain labeled target data;
    converting the labeled target data into a JSON object, and parsing the JSON object into a first data object;
    identifying whether a data object with the same content as the first data object exists among the second data objects of the MySQL database;
    if the identification result is yes, deleting from the first data object the data objects having the same content as the second data object, to obtain a first target data object;
    obtaining the topic and producer information marked in the label of the first target data object;
    filling the first target data object into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtaining the topic and producer information marked in the label of the first data object; and
    filling the first data object into the cache area of the MySQL database according to the topic and producer information.
6. The method according to claim 1, wherein, before converting the target data in the cache area into hypertext markup language data, the method further comprises:
    detecting whether a database transaction in the MySQL database is being executed;
    if so, obtaining the initial data of the target data in the cache area, locking the MySQL database through a Locktable statement, and adding the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement comprises a Locktable statement with the WRITE keyword;
    obtaining the data having preset fields in the target data of the cache area, and obtaining the field sizes of the data having preset fields, wherein the preset fields comprise fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command;
    creating indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule comprises indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold;
    detecting whether the type of a data table in the MySQL database is defined as the InnoDB type;
    if not, adding TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables;
    if so, obtaining the data tables of the InnoDB type, and using the data tables of the InnoDB type as InnoDB tables; and
    creating foreign keys on the InnoDB tables through the alter table command.
7. The method according to claim 1, wherein classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises:
    obtaining characteristic information about the running state of the task corresponding to the data to be processed;
    sorting and classifying the data to be processed according to the characteristic information to obtain classified data, and marking the classification type of the classified data, wherein the classification types of the classified data comprise task start data, task running data and task end data; and
    establishing, for the classified data by classification type, the correspondence between the classified data and the topics, and marking the correspondence of the classified data, to obtain the target data.
8. An apparatus for processing multi-cluster job records, the apparatus comprising:
    a transceiver module, configured to receive job record data generated by multiple clusters running tasks;
    a detection module, configured to detect the running state of the tasks, and, when the running state is detected to be a preset trigger point, send a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    a calling module, configured to call the distributed messaging system Kafka in the message queue service system, and, when the Kafka receives a topic creation command, call a topic creation script and create topics through the topic creation script; and to create producers through the Kafka according to the clusters corresponding to the data to be processed, and create a consumer through the Kafka according to the unified management website system;
    a classification module, configured to input the data to be processed obtained by the detection module into the Kafka called by the calling module, and classify the data to be processed through the Kafka according to the topics and the producers created by the calling module, to obtain target data;
    a division module, configured to divide the target data obtained by the classification module into blocks according to the producers created by the calling module and the topics created by the calling module, to obtain a plurality of blocks, link the plurality of blocks according to a created zoning protocol, and use the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    a construction module, configured to construct a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, input the target data into repositories through the blockchain system by way of HTTP requests, and trigger a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural; and
    a receiving module, configured to: when the unified management website system receives the read instruction triggered by the construction module, output through the data storage layer the target data input into the repositories by the construction module, and input the target data into the cache area of the MySQL database; and convert the target data in the cache area into hypertext markup language data, control the cache area through an output control function to obtain the hypertext markup language data, and input the hypertext markup language data into a constructed static hypertext markup language page file through a created read-write function.
9. The apparatus according to claim 8, wherein the classification module is further configured to:
    obtain a sequential association degree of the events, obtain a throughput of the events, identify entity types of the events, and obtain an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classify the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    label the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classify the labeled first classified data according to producer type, and label the producer type of the labeled first classified data, to obtain the target data.
10. The apparatus according to claim 8, wherein, after classifying the data to be processed through the Kafka according to the topics and the producers and before obtaining the target data, the classification module is further configured to:
    initialize the classified data to be processed, and set the length of a linear hash table according to the classification type of the classified data to be processed;
    obtain key-code values of the classified data to be processed, calculate the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data to be processed, and obtain target key-code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the data to be processed comprises data items;
    use the remainder obtained by dividing the target key-code value by a number not greater than the length of the linear hash table as the address of the linear hash table, use the target key-code value as the table head of the linear hash table, and use the addresses of the linear hash table as the number of linear hash tables, to obtain the linear hash table;
    randomly generate a preset number of character strings of the same length, and perform statistics and analysis on the linear hash table through a preset string function, to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises the bucket usage rate, and the average bucket length information comprises the average length of all used buckets;
    determine whether the hash distribution information satisfies a first preset condition, and determine whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises the ratio of the number of used buckets to the total number of buckets falling within a first preset range, and the second preset condition comprises the average length of all used buckets falling within a second preset range;
    if both determination results are yes, use the linear hash table corresponding to the affirmative determination results as the final linear hash table; and
    fill the target key-code values into the final linear hash table, and output the final linear hash table in linked-list form, to obtain the target data.
11. The apparatus according to claim 8, wherein the classification module is further configured to:
    compress the data to be processed;
    determine whether the transmission state of the transmission channel is normal;
    if the determination result is yes, input the compressed data to be processed into the Kafka, and mark the data input into the Kafka as sent;
    if the determination result is no, input the compressed data to be processed into a first MySQL database, and mark the data input into the first MySQL database as unsent;
    call a created polling script, and poll the first MySQL database at a preset interval through the polling script;
    when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, input the data marked as unsent into the first MySQL database;
    poll to detect whether the first MySQL database has received the data to be processed that is marked as unsent;
    if the detection result is yes, replace the unsent mark in the data marked as unsent with a sent mark; and
    if the detection result is no, do not update the unsent mark in the data marked as unsent.
12. The apparatus according to claim 8, wherein the receiving module is further configured to:
    invoke, through the unified management website system, a listener script, and detect through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detect the application layer in the blockchain system;
    when the detection result is yes, fetch the target data from the repositories through the consumer according to a preset fetch quantity, and add a consumed label to the fetched target data, to obtain labeled target data;
    convert the labeled target data into a JSON object, and parse the JSON object into a first data object;
    identify whether a data object with the same content as the first data object exists among the second data objects of the MySQL database;
    if the identification result is yes, delete from the first data object the data objects having the same content as the second data object, to obtain a first target data object;
    obtain the topic and producer information marked in the label of the first target data object;
    fill the first target data object into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtain the topic and producer information marked in the label of the first data object; and
    fill the first data object into the cache area of the MySQL database according to the topic and producer information.
13. The apparatus according to claim 8, wherein, before converting the target data in the cache area into hypertext markup language data, the receiving module is further configured to:
    detect whether a database transaction in the MySQL database is being executed;
    if so, obtain the initial data of the target data in the cache area, lock the MySQL database through a Locktable statement, and add the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement comprises a Locktable statement with the WRITE keyword;
    obtain the data having preset fields in the target data of the cache area, and obtain the field sizes of the data having preset fields, wherein the preset fields comprise fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command;
    create indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule comprises indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold;
    detect whether the type of a data table in the MySQL database is defined as the InnoDB type;
    if not, add TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables;
    if so, obtain the data tables of the InnoDB type, and use the data tables of the InnoDB type as InnoDB tables; and
    create foreign keys on the InnoDB tables through the alter table command.
14. The apparatus according to claim 8, wherein the classification module is further configured to:
    obtain characteristic information about the running state of the task corresponding to the data to be processed;
    sort and classify the data to be processed according to the characteristic information to obtain classified data, and mark the classification type of the classified data, wherein the classification types of the classified data comprise task start data, task running data and task end data; and
    establish, for the classified data by classification type, the correspondence between the classified data and the topics, and mark the correspondence of the classified data, to obtain the target data.
15. A device for processing multi-cluster job records, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the following steps when executing the computer program:
    obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    calling an interface of the distributed messaging system Kafka in the message queue service system, and, when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through the Kafka according to the unified management website system;
    inputting the data to be processed into the Kafka, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain a plurality of blocks, linking the plurality of blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural;
    when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into a cache area of a MySQL database; and
    converting the target data in the cache area into hypertext markup language data, and writing the hypertext markup language data into a constructed static hypertext markup language page file.
16. The device according to claim 15, wherein the tasks comprise events, and, when the processor executes the computer program, classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises the following steps:
    obtaining a sequential association degree of the events, obtaining a throughput of the events, identifying entity types of the events, and obtaining an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classifying the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    labeling the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classifying the labeled first classified data according to producer type, and labeling the producer type of the labeled first classified data, to obtain the target data.
  17. The device according to claim 15, wherein after the processor executes the computer program to implement the classifying of the to-be-processed data through the Kafka according to the topics and the producers, and before the obtaining of the target data, the following steps are further comprised:
    initializing the classified to-be-processed data, and setting a length of a linear hash table according to the classification types of the classified to-be-processed data;
    obtaining key code values of the classified to-be-processed data, calculating term frequency-inverse document frequency (TF-IDF) values of data items of the classified to-be-processed data, and obtaining target key code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the to-be-processed data comprises the data items;
    taking the remainder obtained by dividing each target key code value by a number not greater than the length of the linear hash table as an address of the linear hash table, taking the target key code values as headers of the linear hash table, and taking the addresses of the linear hash table as the number of linear hash tables, so as to obtain the linear hash table;
    randomly generating a preset number of character strings of the same length, and performing statistics and analysis on the linear hash table through a preset string function to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises a usage rate of the buckets, and the average bucket length information comprises an average length of all used buckets;
    determining whether the hash distribution information satisfies a first preset condition, and determining whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises that the ratio of the number of used buckets to the total number of buckets falls within a first preset range, and the second preset condition comprises that the average length of all used buckets falls within a second preset range;
    if both determination results are yes, taking the linear hash table for which both determination results are yes as a final linear hash table;
    filling the target key code values into the final linear hash table, and outputting the final linear hash table in the form of a linked list, so as to obtain the target data.
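One possible reading of the hash-table construction above: score data items by TF-IDF, keep the key code values of high-scoring items, bucket them by remainder, and accept the table only when its bucket usage rate and average bucket length fall within preset ranges. The sketch below assumes numeric key code values and tokenized documents; the function names and acceptance ranges are illustrative assumptions:

    # Hypothetical sketch: TF-IDF filtering plus a chained (linear) hash table.
    import math

    def tf_idf(term, doc, corpus):
        # doc and corpus entries are lists of tokens; a standard TF-IDF form.
        tf = doc.count(term) / max(len(doc), 1)
        df = sum(1 for d in corpus if term in d)
        return tf * math.log(len(corpus) / (1 + df))

    def build_linear_hash_table(target_keys, table_length):
        # Bucket address = key mod a number not greater than the table length.
        table = [[] for _ in range(table_length)]
        for key in target_keys:
            table[key % table_length].append(key)
        used = [bucket for bucket in table if bucket]
        usage_rate = len(used) / table_length        # hash distribution info
        avg_bucket_len = sum(len(b) for b in used) / max(len(used), 1)
        return table, usage_rate, avg_bucket_len

    def acceptable(usage_rate, avg_bucket_len,
                   usage_range=(0.6, 1.0), length_range=(1.0, 3.0)):
        # Accept the table only when both statistics fall in their preset ranges;
        # the range values here are placeholders.
        return (usage_range[0] <= usage_rate <= usage_range[1]
                and length_range[0] <= avg_bucket_len <= length_range[1])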
  18. The device according to claim 15, wherein when the processor executes the computer program to implement the inputting of the to-be-processed data into the Kafka, the following steps are comprised:
    performing data compression on the to-be-processed data;
    determining whether a transmission state of the transmission channel is normal;
    if the determination result is yes, inputting the compressed to-be-processed data into the Kafka, and marking the to-be-processed data input into the Kafka as sent;
    if the determination result is no, inputting the compressed to-be-processed data into a first MySQL database, and marking the to-be-processed data input into the first MySQL database as unsent;
    invoking a created polling script, and performing polling detection on the first MySQL database at a preset interval through the polling script;
    when the polling detects that to-be-processed data marked as unsent exists in the first MySQL database and that the transmission state of the transmission channel is normal, inputting the to-be-processed data marked as unsent into the first MySQL database;
    detecting by polling whether the first MySQL database has received the to-be-processed data marked as unsent;
    if the detection result is yes, replacing the unsent mark in the to-be-processed data marked as unsent with a sent mark;
    if the detection result is no, not updating the unsent mark in the to-be-processed data marked as unsent.
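A minimal sketch of this compress-then-send flow with a MySQL holding table and a polling retry follows. It assumes kafka-python's KafkaProducer and a DB-API connection to the first MySQL database; channel_ok(), the pending table, and all identifiers are placeholders, and the retry here re-sends the recovered rows to Kafka, which is one possible reading of the recovered-channel step:

    # Hypothetical sketch; table names, ids and channel_ok() are assumed.
    import gzip, json, time
    from kafka import KafkaProducer   # kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def channel_ok():
        # Placeholder for the transmission-state check described in the claim.
        return True

    def submit(record, topic, db):
        payload = gzip.compress(json.dumps(record).encode())  # data compression
        cur = db.cursor()
        if channel_ok():
            producer.send(topic, payload)
            cur.execute(
                "INSERT INTO pending (id, payload, flag) VALUES (%s, %s, 'sent')",
                (record["id"], payload))
        else:
            cur.execute(
                "INSERT INTO pending (id, payload, flag) VALUES (%s, %s, 'unsent')",
                (record["id"], payload))
        db.commit()

    def polling_script(db, topic, preset_interval=60):
        # Poll the first MySQL database at a preset interval for unsent rows.
        while True:
            if channel_ok():
                cur = db.cursor()
                cur.execute("SELECT id, payload FROM pending WHERE flag='unsent'")
                for row_id, payload in cur.fetchall():
                    producer.send(topic, payload)
                    cur.execute("UPDATE pending SET flag='sent' WHERE id=%s",
                                (row_id,))
                db.commit()
            time.sleep(preset_interval)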
  19. The device according to claim 15, wherein when the processor executes the computer program to implement the outputting of the target data in the repository through the data storage layer and the inputting of the target data into the cache area of the MySQL database when the unified management website system receives the read instruction, the following steps are comprised:
    invoking, by the unified management website system, a listener script, and detecting through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detecting the application layer in the blockchain system;
    when the detection result is yes, fetching the target data from the repository through the consumers according to a preset fetch quantity, and adding a consumed label to the fetched target data, so as to obtain labeled target data;
    converting the labeled target data into a JSON object, and parsing the JSON object into first data objects;
    identifying whether a data object having the same content as the first data objects exists among second data objects of the MySQL database;
    if the identification result is yes, deleting from the first data objects the data objects having the same content as the second data objects, so as to obtain first target data objects;
    obtaining the topic and producer information marked in the labels of the first target data objects;
    filling the first target data objects into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtaining the topic and producer information marked in the labels of the first data objects;
    filling the first data objects into the cache area of the MySQL database according to the topic and producer information.
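The read path above amounts to: fetch at most a preset quantity of records through the consumer, label them consumed, parse the JSON, drop objects whose content already exists in the MySQL cache area, and file the rest under their topic and producer. The sketch below uses kafka-python's KafkaConsumer and a plain dictionary standing in for the MySQL cache area; the topic name and record fields are assumptions:

    # Hypothetical sketch of the fetch-parse-deduplicate-fill steps.
    import json
    from kafka import KafkaConsumer   # kafka-python

    consumer = KafkaConsumer("job-records", bootstrap_servers="localhost:9092")
    mysql_cache = {}   # (topic, producer) -> records already in the cache area

    def drain(preset_fetch_quantity=100):
        batch = consumer.poll(timeout_ms=1000, max_records=preset_fetch_quantity)
        for records in batch.values():
            for rec in records:
                obj = json.loads(rec.value)   # JSON object -> first data object
                obj["label"] = "consumed"     # add the consumed label
                key = (obj["topic"], obj["producer"])
                cached = mysql_cache.setdefault(key, [])
                if obj not in cached:         # same-content objects are dropped
                    cached.append(obj)        # fill into the cache area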
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
    obtaining job record data generated by tasks run by multiple clusters, and detecting a running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into the JSON format, so as to obtain to-be-processed data, wherein the preset trigger points comprise the start, pause, or end running states of the tasks run by the multiple clusters;
    invoking an interface of the distributed message system Kafka in a message queue service system; when the interface of the Kafka receives a topic creation command, invoking a topic creation script, and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the to-be-processed data, and creating consumers through the Kafka according to a unified management website system;
    inputting the to-be-processed data into the Kafka, and classifying the to-be-processed data through the Kafka according to the topics and the producers, so as to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain multiple blocks, linking the multiple blocks according to a created zoning protocol, and taking the linked multiple blocks and the consumers as a data storage layer, wherein the zoning protocol is used to link the blocks in order from back to front through a chain such that each block points to its previous block, and to link the created blockchain system into the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into a repository through the blockchain system in the manner of an HTTP request, and triggering a read instruction, wherein the Kafka comprises repositories, and the number of the repositories is multiple;
    when the unified management website system receives the read instruction, outputting the target data in the repository through the data storage layer, and inputting the target data into a cache area of a MySQL database;
    converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
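The final step above renders the cached target data as a static HTML page. A minimal sketch, assuming the cached data arrives as a list of dictionaries and using an output file name chosen for the example:

    # Hypothetical sketch of the HTML step; field names and path are assumed.
    import html

    def write_static_page(rows, path="job_records.html"):
        body = "\n".join(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                html.escape(str(row["topic"])),
                html.escape(str(row["producer"])),
                html.escape(str(row["payload"])))
            for row in rows)
        page = ("<html><body><table>\n"
                "<tr><th>Topic</th><th>Producer</th><th>Record</th></tr>\n"
                + body + "\n</table></body></html>")
        with open(path, "w", encoding="utf-8") as f:
            f.write(page)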
PCT/CN2019/117086 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job record, and device and storage medium WO2021051531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884887.8A CN110795257B (en) 2019-09-19 2019-09-19 Method, device, equipment and storage medium for processing multi-cluster job record
CN201910884887.8 2019-09-19

Publications (1)

Publication Number Publication Date
WO2021051531A1 true WO2021051531A1 (en) 2021-03-25

Family

ID=69427342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117086 WO2021051531A1 (en) 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job record, and device and storage medium

Country Status (2)

Country Link
CN (1) CN110795257B (en)
WO (1) WO2021051531A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111555957B (en) * 2020-03-26 2022-08-19 孩子王儿童用品股份有限公司 Kafka-based synchronous message service system and implementation method
CN112000515A (en) * 2020-08-07 2020-11-27 北京浪潮数据技术有限公司 Method and assembly for recovering instance data in redis cluster
CN112100265A (en) * 2020-09-17 2020-12-18 博雅正链(北京)科技有限公司 Multi-source data processing method and device for big data architecture and block chain
CN112131854A (en) * 2020-09-24 2020-12-25 北京开科唯识技术股份有限公司 Data processing method and device, electronic equipment and storage medium
CN112272220B (en) * 2020-10-16 2022-05-13 苏州浪潮智能科技有限公司 Cluster software start control method, system, terminal and storage medium
CN112751709B (en) * 2020-12-29 2023-01-10 北京浪潮数据技术有限公司 Management method, device and system of storage cluster
CN113269590B (en) * 2021-05-31 2023-06-06 五八到家有限公司 Data processing method, device and system for resource subsidy
CN115473858B (en) * 2022-09-05 2024-03-01 上海哔哩哔哩科技有限公司 Data transmission method, stream data transmission system, computer device and storage medium
CN117033449B (en) * 2023-10-09 2023-12-15 北京中科闻歌科技股份有限公司 Data processing method based on kafka stream, electronic equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582470B (en) * 2017-09-28 2022-11-22 北京国双科技有限公司 Data processing method and data processing device
CN109451072A (en) * 2018-12-29 2019-03-08 广东电网有限责任公司 A kind of message caching system and method based on Kafka
CN110209507A (en) * 2019-05-16 2019-09-06 厦门市美亚柏科信息股份有限公司 Data processing method, device, system and storage medium based on message queue

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685173A (en) * 2011-04-14 2012-09-19 天脉聚源(北京)传媒科技有限公司 Asynchronous task distribution system and scheduling distribution computing unit
CN106034160A (en) * 2015-03-19 2016-10-19 阿里巴巴集团控股有限公司 Distributed computing system and method
US20180181377A1 (en) * 2016-10-26 2018-06-28 Yoongu Kim Systems and methods for discovering automatable tasks
CN109800080A (en) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 A kind of method for scheduling task based on Quartz frame, system and terminal device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113194070A (en) * 2021-03-31 2021-07-30 新华三大数据技术有限公司 Kafka cluster multi-type authority management method and device and storage medium
CN113194070B (en) * 2021-03-31 2022-05-27 新华三大数据技术有限公司 Kafka cluster multi-type authority management method and device and storage medium
CN113315750A (en) * 2021-04-15 2021-08-27 新华三大数据技术有限公司 Kafka message issuing method, device and storage medium
CN113315750B (en) * 2021-04-15 2022-05-27 新华三大数据技术有限公司 Kafka message issuing method, device and storage medium
CN113722198A (en) * 2021-09-02 2021-11-30 中国建设银行股份有限公司 Script job submission control method and device, storage medium and electronic equipment
CN113742087A (en) * 2021-09-22 2021-12-03 深圳市玄羽科技有限公司 Protection method and system for industrial internet big data server
CN113742087B (en) * 2021-09-22 2023-12-12 深圳市玄羽科技有限公司 Protection method and system for industrial Internet big data server
CN114401239A (en) * 2021-12-20 2022-04-26 中国平安财产保险股份有限公司 Metadata transmission method and device, computer equipment and storage medium
CN114401239B (en) * 2021-12-20 2023-11-14 中国平安财产保险股份有限公司 Metadata transmission method, apparatus, computer device and storage medium
WO2024037629A1 (en) * 2022-08-19 2024-02-22 顺丰科技有限公司 Data integration method and apparatus for blockchain, and computer device and storage medium
CN116049190A (en) * 2023-01-18 2023-05-02 中电金信软件有限公司 Kafka-based data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110795257B (en) 2023-06-16
CN110795257A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021051531A1 (en) Method and apparatus for processing multi-cluster job record, and device and storage medium
US11573965B2 (en) Data partitioning and parallelism in a distributed event processing system
US9787706B1 (en) Modular architecture for analysis database
US20190266195A1 (en) Filtering queried data on data stores
CN107145489B (en) Information statistics method and device for client application based on cloud platform
KR101219856B1 (en) Automated data organization
US8166350B2 (en) Apparatus and method for persistent report serving
WO2021051627A1 (en) Database-based batch importing method, apparatus and device, and storage medium
US10860604B1 (en) 2020-12-08 Scalable tracking for database updates according to a secondary index
US11429566B2 (en) Approach for a controllable trade-off between cost and availability of indexed data in a cloud log aggregation solution such as splunk or sumo
CN108228322B (en) Distributed link tracking and analyzing method, server and global scheduler
US10262024B1 (en) Providing consistent access to data objects transcending storage limitations in a non-relational data store
US11892976B2 (en) Enhanced search performance using data model summaries stored in a remote data store
US20230007014A1 (en) Detection of replacement/copy-paste attacks through monitoring and classifying api function invocations
US20130152102A1 (en) Runtime-agnostic management of applications
US20210224102A1 (en) Characterizing operation of software applications having large number of components
CN112612832A (en) Node analysis method, device, equipment and storage medium
US10248508B1 (en) Distributed data validation service
US11841827B2 (en) Facilitating generation of data model summaries
CN114461762A (en) Archive change identification method, device, equipment and storage medium
US11379268B1 (en) Affinity-based routing and execution for workflow service
US8214846B1 (en) Method and system for threshold management
US20240061494A1 (en) Monitoring energy consumption associated with users of a distributed computing system using tracing
US10896115B2 (en) Investigation of performance bottlenecks occurring during execution of software applications
TWI606350B (en) Cloud file search system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1