WO2021051531A1 - Method and apparatus for processing multi-cluster job record, and device and storage medium

Method and apparatus for processing multi-cluster job record, and device and storage medium

Info

Publication number: WO2021051531A1
Application number: PCT/CN2019/117086
Authority: WO (WIPO (PCT))
Prior art keywords: data, processed, preset, kafka, target data
Other languages: French (fr), Chinese (zh)
Inventor: 林琪琛
Original assignee / applicant: 平安科技(深圳)有限公司
Priority date: September 19, 2019 (per the priority claim in the description below)

Classifications

    • G06F 11/004 — Error detection; error correction; monitoring: error avoidance
    • G06F 11/3006 — Monitoring arrangements specially adapted to a computing system that is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3065 — Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 16/24552 — Query execution: database cache management
    • G06F 16/284 — Relational databases
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing, and in particular to methods, devices, equipment and storage media for processing multi-cluster job records.
  • In current cluster job management, the task job data generated by multiple clusters is obtained and input into a unified management website; the system of the unified management website detects the task type of the task job data, classifies the task job data by task type to obtain classified data, and stores the classified data into multiple repositories according to task type.
  • The inventor realized that because multiple clusters send task job data directly to a unified management website, excessive portal requests and channel congestion easily arise during parallel processing, which can lead to a concurrency crash of the multi-cluster job management system.
  • To address this, the present application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can solve the problem of concurrency crashes of the multi-cluster job management system.
  • In a first aspect, this application provides a method for processing multi-cluster job records, including: obtaining job record data generated by tasks running on multiple clusters, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to the created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain to-be-processed data, where the preset trigger points include the start, pause, and end running states of the tasks running on the multiple clusters; calling the distributed messaging system Kafka in the message queue service system, and, when Kafka receives a topic creation command, calling the topic creation script and creating a topic through the topic creation script; creating a producer through Kafka according to the cluster corresponding to the to-be-processed data, and creating a consumer through Kafka according to the unified management website system; inputting the to-be-processed data into Kafka, where Kafka classifies the to-be-processed data according to the topic and the producer to obtain target data; dividing the target data into blocks according to the producer and the topic to obtain multiple blocks, linking the blocks according to the created zoning protocol, and using the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one; constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into the repositories through the blockchain system using the HTTP request method, and triggering a read instruction, where Kafka includes multiple repositories; when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database; and converting the target data in the cache area into hypertext markup language (HTML) data and writing the HTML data into the constructed static HTML page file.
  • In a second aspect, the present application provides an apparatus for processing multi-cluster job records, including: a transceiver module for receiving job record data generated by tasks running on multiple clusters; a detection module for detecting the running state of the tasks, sending a trigger instruction to the created trigger when the running state is detected to be a preset trigger point, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain to-be-processed data, where the preset trigger points include the start, pause, and end running states of the tasks running on the multiple clusters; a calling module for calling the distributed messaging system Kafka in the message queue service system, calling the topic creation script when Kafka receives a topic creation command and creating a topic through the topic creation script, creating a producer through Kafka according to the cluster corresponding to the to-be-processed data, and creating a consumer through Kafka according to the unified management website system; a classification module for inputting the to-be-processed data obtained by the detection module into the Kafka called by the calling module, where Kafka classifies the to-be-processed data according to the topic and the producer to obtain target data; a division module for dividing the target data obtained by the classification module into blocks according to the producer and the topic created via the calling module, linking the blocks according to the created zoning protocol, and using the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one, and links the created blockchain system into Kafka so that Kafka can be used in the blockchain system; a construction module for constructing a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, inputting the target data into the repositories through the blockchain system using the HTTP request method, and triggering a read instruction, where Kafka includes multiple repositories; and a receiving module for, when the unified management website system receives the read instruction triggered by the construction module, outputting the target data that the construction module input into the repositories through the data storage layer, inputting the target data into the cache area of the MySQL database, converting the target data in the cache area into hypertext markup language data, controlling the cache area through an output control function to obtain the HTML data, and inputting the HTML data into the constructed static HTML page file through the created read-write function.
  • In a third aspect, the present application provides a computer device, which includes at least one processor, a memory, a display, and an input/output unit connected to one another, where the memory is used to store program code, and the processor is used to call the program code in the memory to execute the method described in the first aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium in which computer instructions are stored; when the computer instructions run on a computer, the computer executes the method described in the first aspect.
  • In the above solution, the job record data generated by tasks running on multiple clusters is processed to obtain the to-be-processed data; topics, producers, and consumers are created through the distributed messaging system Kafka in the message queue service system; Kafka classifies the to-be-processed data to obtain target data, and a blockchain system is constructed according to the producer, the topic, and the target data; the target data is input into the repositories through the blockchain system; the target data in the repositories is input into the cache area of the MySQL database through the unified management website system; and the target data in the cache area is converted into hypertext markup language data, which is input into a static HTML page file.
  • On the one hand, the Kafka system in the message queue server serves as a message queue and is combined with the blockchain system for distributed data storage, with data caching processed concurrently by multiple nodes; this decouples the system, reduces the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when the job record data of multiple big data clusters arrives simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput. On the other hand, statically rendering the target data that is input into the cache area of the MySQL database as HTML increases access and operation speed and reduces the load on the server. In summary, this application can handle the concurrency-crash problem of the system at low cost, with high efficiency and high accuracy, and from multiple directions; therefore, the present application can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • FIG. 1 is a schematic flowchart of a method for processing multi-cluster job records in an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of an apparatus for processing multi-cluster job records in an embodiment of this application;
  • FIG. 3 is a schematic structural diagram of a computer device in an embodiment of this application.
  • This application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can be used in an enterprise multi-cluster job management platform to manage and query the job operation records generated by multiple big data clusters.
  • To solve the above technical problem, this application mainly provides the following technical solution.
  • The system architecture involved in the method includes a big data cluster layer, a message queue server, and a unified management website system.
  • The method is executed by a computer device, which can be a server or a terminal; the terminal is a terminal on which the apparatus 20 shown in FIG. 2 is installed.
  • This application does not limit the type of the execution subject. The method includes the following steps.
  • Obtain the job record data generated by tasks running on multiple clusters, and detect the running state of the tasks.
  • When the running state is detected to be a preset trigger point, a trigger instruction is sent to the created trigger, and the trigger receives the trigger instruction.
  • The trigger converts the data format of the job record data into JSON format to obtain the to-be-processed data.
  • The preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters.
  • When task startup is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, and task initiator; when task suspension is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, task initiator, suspension time, and suspension data; when task completion is detected, the to-be-processed data comprises the running account, job content, submission time, start time, project, task initiator, end time, and running result.
  • The job record data generated by tasks running on the multiple clusters is stored in the MySQL databases connected to those clusters; after the job record data is read from these MySQL databases, its data format is converted into JSON format to facilitate the processing of structured data.
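  • As an illustration only (the table and column names below are assumptions, not taken from this application), a job record row read from MySQL can be serialized to JSON with Jackson:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;

public class JobRecordToJson {
    // Reads one job record from a (hypothetical) job_record table and serializes it to JSON.
    public static String readAsJson(Connection conn, long jobId) throws Exception {
        String sql = "SELECT account, job_content, submit_time, start_time FROM job_record WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                Map<String, Object> record = new HashMap<>();
                record.put("account", rs.getString("account"));
                record.put("jobContent", rs.getString("job_content"));
                record.put("submitTime", rs.getTimestamp("submit_time").toInstant().toString());
                record.put("startTime", rs.getTimestamp("start_time").toInstant().toString());
                // The JSON string is the "to-be-processed data" handed to the next stage.
                return new ObjectMapper().writeValueAsString(record);
            }
        }
    }
}
```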
  • Optionally, before converting the data format of the job record data into JSON format, the method of the present application further includes: compressing the job record data; performing state detection on the compressed job record data to obtain state information, and analyzing the state information through the cache coherency protocol to obtain first data and second data, where the state information includes the modified, exclusive, shared, and invalid states (the MESI states), and the first data and the second data are job operation data with different strengths of cache-consistency requirements; calling the Cache local cache interface to generate a CacheBuilder object for the first data, assembling an automatic loading function for the first data, and obtaining the first key-value pair data of the first data; automatically loading the first key-value pair data into the physical memory cache through the CacheBuilder object and the automatic loading function; and creating a CacheLoader subclass object that automatically reloads the first key-value pair data into the physical memory cache when a get operation is detected to fail.
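  • A minimal sketch of the CacheBuilder/CacheLoader auto-loading described above, using the Guava Cache API (the size bound, expiry policy, and loader body are assumptions for illustration):

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class FirstDataCache {
    private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                        // bound the in-process cache (assumed)
            .expireAfterWrite(10, TimeUnit.MINUTES)     // assumed expiry policy
            .build(new CacheLoader<String, String>() {  // the "automatic loading function"
                @Override
                public String load(String key) {
                    // Reloads the key-value pair on a cache miss or after a failed get.
                    return loadFromSource(key);
                }
            });

    public String get(String key) {
        return cache.getUnchecked(key); // loads the first key-value pair data into memory automatically
    }

    private String loadFromSource(String key) {
        // hypothetical lookup of the first key-value pair data
        return "value-for-" + key;
    }
}
```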
  • A cache architecture component is then built from the high-speed cache system Memcached and the data structure server Redis, where the cache architecture component includes cache servers; the first hash value of each node of the cache architecture component is obtained, the second key-value pair data of the second data is obtained, and the second hash value of the second key-value pair data is obtained; according to the first and second hash values, the second key-value pair data is routed to and stored on the corresponding cache server.
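  • The node-hash/key-hash pairing described here reads like consistent hashing; the following is a sketch under that assumption (the hash function and node handling are illustrative, not prescribed by this application):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class CacheRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node); // first hash value: the cache-server node's position on the ring
    }

    // Routes the second key-value pair data to a cache server by its hash value.
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache servers registered");
        SortedMap<Long, String> tail = ring.tailMap(hash(key)); // second hash value
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```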
  • Call the distributed messaging system Kafka in the message queue server.
  • When Kafka receives a topic creation command, it calls the topic creation script and creates the topic through the topic creation script.
  • For example, the content of the topic creation command includes: the topic is running_result, there are M partitions, and each partition is to be allocated N replicas; the topic creation script called includes a command-line part and a background (controller) logic part.
  • The background (controller) logic part monitors the corresponding directory node under the distributed application coordination service ZooKeeper.
  • The command-line part creates a new data node when it receives the topic creation command, which triggers the background (controller) logic part, and the topic is thereby created.
  • Creating topics facilitates the aggregation of the input to-be-processed data.
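  • Programmatically, the same running_result topic with M partitions and N replicas can be created through Kafka's AdminClient; a sketch with illustrative values for M, N, and the broker address:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // M = 3 partitions, N = 2 replicas per partition (illustrative values)
            NewTopic topic = new NewTopic("running_result", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get(); // blocks until created
        }
    }
}
```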
  • A cluster is a provider of the to-be-processed data, that is, a producer; the unified management website system is the consumer of the to-be-processed data, that is, the consumer.
  • The consumer end (the unified management website system) runs automatically and monitors topic updates in Kafka. Through Kafka's producer-consumer model, the job record data generated by the clusters is processed in parallel and the system load is balanced.
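  • A minimal producer/consumer pair under this model: each cluster publishes its to-be-processed data, and the unified management website system subscribes to the topic (the broker address and group id are assumptions):

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Collections;
import java.util.Properties;

public class ClusterProducerAndSiteConsumer {
    public static KafkaProducer<String, String> newClusterProducer() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(p); // one producer per cluster
    }

    public static KafkaConsumer<String, String> newWebsiteConsumer() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("group.id", "unified-management-website"); // the consumer end runs automatically
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> c = new KafkaConsumer<>(p);
        c.subscribe(Collections.singletonList("running_result")); // monitor topic updates
        return c;
    }

    public static void publish(KafkaProducer<String, String> producer, String clusterId, String json) {
        // Using the cluster id as the record key keeps one cluster's records in order.
        producer.send(new ProducerRecord<>("running_result", clusterId, json));
    }
}
```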
  • The to-be-processed data is first classified by producer, so that each producer's to-be-processed data is grouped together; the data classified by producer is then reclassified by topic to obtain the target data.
  • When summarizing the to-be-processed data into topics, the data can be classified as follows: events that must keep a fixed order are placed in the same topic with the same partition key; for different entities where one entity depends on another entity's events, the events are placed in the same topic; events whose throughput is higher than a first preset throughput threshold are placed in different topics, and events whose throughput is lower than a second preset throughput threshold are placed in the same topic.
  • The above-mentioned tasks include events.
  • Optionally, classifying the to-be-processed data by Kafka according to topics and producers to obtain the target data includes: obtaining the order correlation degree of the events, obtaining the throughput of the events, identifying the entity types of the events, and obtaining the correlation degree between entity types, where an entity type is used for one address corresponding to one user; classifying the to-be-processed data into topics according to the order correlation degree, throughput, and correlation degree following the preset classification strategy to obtain first classification data, where the preset classification strategy places into the same topic the to-be-processed data that satisfies at least one of: order correlation degree greater than a first preset threshold, throughput less than a second preset threshold, and correlation degree greater than a third preset threshold; marking the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the to-be-processed data; and classifying the marked first classification data by producer type and marking the producer type of the marked first classification data to obtain the target data. An illustrative rule is sketched below.
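  • A toy rendering of the preset classification strategy (the thresholds and the Event shape are assumptions for illustration; the application does not fix concrete values):

```java
public class TopicClassifier {
    static final double ORDER_CORRELATION_MIN = 0.8;  // first preset threshold (assumed)
    static final double THROUGHPUT_MAX = 1_000.0;     // second preset threshold (assumed)
    static final double ENTITY_CORRELATION_MIN = 0.8; // third preset threshold (assumed)

    static class Event {
        double orderCorrelation, throughput, entityCorrelation;
        String entityType;
    }

    // Events meeting at least one condition of the strategy share a topic.
    static String classify(Event e) {
        if (e.orderCorrelation > ORDER_CORRELATION_MIN
                || e.throughput < THROUGHPUT_MAX
                || e.entityCorrelation > ENTITY_CORRELATION_MIN) {
            return "shared_topic";
        }
        // High-throughput, weakly-related events get their own per-entity topic.
        return "topic_" + e.entityType;
    }
}
```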
  • The above-mentioned tasks include events.
  • Optionally, after the to-be-processed data is classified, the method of the present application further includes: initializing the classified to-be-processed data and setting the length of a linear hash table according to the classification types of the classified data; obtaining the key values of the classified data, calculating the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtaining the target key values corresponding to the data items whose TF-IDF value is greater than a fourth preset threshold, where the to-be-processed data includes the data items; and using the remainder obtained by dividing a target key value by a number not greater than the length of the linear hash table as the address in the linear hash table, using the target key value as the head of the linear hash table, and using the address as the number of the linear hash table.
  • Because the access speed of a hash table is not affected by the total number of stored elements, it suits databases with large data volumes and improves the query speed of the job record data, so that the query speed is not degraded while the concurrency-crash problem of the multi-cluster job management system is being solved.
  • In this way, the performance and scalability of the system can be improved at low cost.
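  • A compact sketch of the TF-IDF screening and the modulo addressing described above (the TF-IDF formulation used is the standard one, assumed here; the threshold and table length are illustrative):

```java
public class HashIndex {
    // Standard TF-IDF: term frequency times inverse document frequency (assumed formulation).
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / (1 + docsWithTerm));
        return tf * idf;
    }

    // Address in the linear hash table: key value modulo a number no greater than the table length.
    static int address(int targetKeyValue, int tableLength) {
        return targetKeyValue % tableLength;
    }

    public static void main(String[] args) {
        double score = tfIdf(5, 200, 10_000, 40);
        if (score > 0.05) { // fourth preset threshold (assumed)
            System.out.println("bucket " + address(12345, 97)); // table length 97 (assumed)
        }
    }
}
```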
  • Optionally, the method of the present application involves a transmission channel, and inputting the to-be-processed data into Kafka includes: compressing the to-be-processed data; judging whether the transmission status of the transmission channel is normal; if so, inputting the compressed data into Kafka and marking it as sent; if not, inputting the compressed data into a first MySQL database and marking it as unsent; calling a created polling script, which polls the first MySQL database at a preset interval; when polling finds that the first MySQL database holds unsent to-be-processed data and that the transmission status of the transmission channel is normal, sending the data marked as unsent into Kafka; polling to detect whether the to-be-processed data marked as unsent has been delivered; and if so, replacing the unsent mark with a sent mark. A sketch of this flow follows.
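  • The stage-and-retry flow might look like this (the table name, column names, and the decision to detect channel failure via a failed send are assumptions):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class SendOrStage {
    // Tries Kafka first; on failure stages the record in the first MySQL database, marked unsent.
    static void sendOrStage(KafkaProducer<String, String> producer, Connection firstDb, String json) {
        try {
            producer.send(new ProducerRecord<>("running_result", json)).get(); // block to confirm delivery
            // Delivery confirmed: the record counts as sent.
        } catch (Exception channelDown) {
            try (PreparedStatement ps = firstDb.prepareStatement(
                    "INSERT INTO pending_data (payload, sent) VALUES (?, 0)")) { // 0 = unsent (assumed schema)
                ps.setString(1, json);
                ps.executeUpdate();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
    // A polling script would periodically re-read rows WHERE sent = 0, resend them to Kafka,
    // and flip the mark to sent once delivery is confirmed and the channel is normal again.
}
```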
  • Optionally, classifying the to-be-processed data by Kafka according to the topic and the producer to obtain the target data includes: obtaining characteristic information about the running state of the task corresponding to the to-be-processed data; sorting and classifying the to-be-processed data according to the characteristic information to obtain classification data and marking the classification type of the classification data, where the classification types include task start data, task running data, and task end data; establishing, by classification type, the correspondence between the classification data and the topics; and marking the correspondence of the classification data to obtain the target data.
  • The zoning protocol is used to link each block in order from back to front through a chain, with each block pointing to the previous one, and to link the created blockchain system into Kafka so that Kafka can be applied in the blockchain system.
  • Kafka includes repositories, and there are multiple repositories.
  • The blockchain system includes an application layer, and the application layer includes the unified management website system.
  • HTTP request methods specify how a resource is operated on, and include the GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH request methods. Here the PUT request method is used, which facilitates transmitting the latest version of the target data to the repositories in the message queue server.
  • Multiple repositories are set up in Kafka to store the target data by category; data is stored in the corresponding repository according to the producer and topic in the target data, which facilitates the management and acquisition of the target data. A sketch of such a PUT request follows.
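  • Using the PUT request method to push the latest target data to a repository could be sketched with Java's built-in HTTP client (the endpoint URL is an assumption):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PutTargetData {
    static int put(String targetDataJson) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mq-server/repository/running_result")) // assumed repository endpoint
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(targetDataJson)) // PUT replaces with the latest data
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```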
  • The Kafka system in the message queue server is used as a message queue and combined with the blockchain system for distributed data storage, with the data cache processed concurrently by multiple nodes; this decouples the system, relieves the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when the job record data of multiple big data clusters is collected simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput.
  • When the unified management website system receives the read instruction, it outputs the target data in the repositories through the data storage layer and inputs the target data into the cache area of the MySQL database.
  • The Kafka system is monitored through the unified management website system so that the target data is captured and stored in a timely manner; the captured target data is input into the cache area of the MySQL database, which facilitates the subsequent reading of the target data and relieves the storage pressure on the MySQL database.
  • Optionally, a preset data-consumption frequency is set, and the target data is input into the cache area of the MySQL database at that frequency, so that the input of the target data is buffered and the storage pressure on the MySQL database is reduced. A throttled consumer loop is sketched below.
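  • A consumer loop throttled to a preset consumption frequency might look like this (the one-second period and the JDBC target table are assumptions):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.time.Duration;

public class ThrottledConsumer {
    static void run(KafkaConsumer<String, String> consumer, Connection mysql) throws Exception {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            try (PreparedStatement ps = mysql.prepareStatement(
                    "INSERT INTO cache_area (payload) VALUES (?)")) { // assumed cache-area table
                for (ConsumerRecord<String, String> r : records) {
                    ps.setString(1, r.value());
                    ps.addBatch();
                }
                ps.executeBatch(); // one batched write per cycle buffers the MySQL input
            }
            Thread.sleep(1_000); // preset data-consumption frequency (assumed: once per second)
        }
    }
}
```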
  • Optionally, when the unified management website system receives a read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database includes: the unified management website system calls a listener script, which detects whether the application layer in the blockchain system has received a read instruction; when the detection result is no, the application layer in the blockchain system is re-checked; when the detection result is yes, the consumer captures target data from the repositories according to a preset capture quantity, and a consumed label is added to the captured target data to obtain marked target data; the marked target data is converted into a JSON object, and the JSON object is parsed into a first data object; whether the second data objects of the MySQL database contain a data object with the same content as the first data object is identified; if so, the data object with the same content is deleted from the first data object to obtain a first target data object; the topic and producer information marked in the label of the first target data object is obtained; and the first target data object is filled into the MySQL database cache area according to the topic and producer information.
  • Kafka is monitored for updated target data, which reduces the risk of repeatedly capturing and storing the same data; the object conversion of the target data allows the target data to be stored in the MySQL database; and, according to the topic and producer information, the sub-target data are filled into the multiple cache areas set up in the MySQL database, which facilitates the classified management and acquisition of the data.
  • the multi-cluster job management system can improve the management efficiency of the job record data.
  • Optionally, the method of the present application further includes: sending a startup instruction to a hidden system that has been set up, the hidden system receiving the startup instruction and starting a hidden protocol, where the hidden system includes the hidden protocol, and the hidden protocol covers faults, destruction and deletion, and human ethics; when the hidden system detects that input information violates the hidden protocol, the data in the MySQL database is copied and backed up, and the hidden system enters an authentication state, where such information includes fault instructions, destruction and deletion instructions, and files carrying Trojan horse programs; when the hidden system in the authentication state detects that an input access request has management authority, it outputs a password input request; when it detects that the entered password information is correct and the number of attempts has not reached the limit, it accepts the access request; and when it detects that the number of attempts has reached the limit, it rejects the access request and permanently archives the copied and backed-up data.
  • Optionally, before converting the target data in the cache area into hypertext markup language data, the method of the present application further includes: detecting whether a database transaction in the MySQL database is executing; if so, obtaining the initial data of the target data in the cache area, locking the MySQL database through a Locktable statement, and appending the updated data of target data subsequently input into the cache area of the MySQL database to the initial data, where the Locktable statement is a Locktable statement with the WRITE keyword; and obtaining the data with preset fields in the target data of the cache area, and obtaining the field size of that data.
  • In this way, the MySQL database is optimized to improve database performance while maintaining the integrity of the target data and ensuring its relevance, which frees the system's storage database, relieves its storage pressure, and provides space and speed headroom for the system's concurrent processing, so as to effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
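  • The Locktable statement with the WRITE keyword corresponds to MySQL's LOCK TABLES ... WRITE; a JDBC sketch (the table name is an assumption):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class LockAndAppend {
    static void appendUnderLock(Connection conn, String updatedJson) throws Exception {
        try (Statement lock = conn.createStatement()) {
            // A WRITE lock blocks other sessions from reading or writing the table until unlocked.
            lock.execute("LOCK TABLES cache_area WRITE");
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO cache_area (payload) VALUES (?)")) { // append updated data to initial data
                ps.setString(1, updatedJson);
                ps.executeUpdate();
            } finally {
                lock.execute("UNLOCK TABLES");
            }
        }
    }
}
```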
  • Optionally, the method of the present application further includes: when the unified management website system recognizes that a login request entered by a user is correct, accepting the login request; when the server in the unified management website receives a query request entered by the user, obtaining the characteristic information of the query request; converting the characteristic information into a search statement and filtering the data in the MySQL database with the search statement to obtain the data corresponding to the query request; and performing statistics and analysis on that data and generating and outputting visual charts. Outputting visual charts tailored to user needs makes the job record data easy to read and improves the usability of the multi-cluster job management system.
  • In summary, the embodiments of this application decouple the system, relieve the pressure of collecting job record data from multiple big data clusters at the same time, and avoid congestion when the job record data of multiple big data clusters arrives simultaneously; this application can achieve low-cost, high-efficiency, and high-accuracy handling of the problem, and therefore can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • the foregoing describes a method for processing multi-cluster job records in the present application, and the following describes a device that executes the foregoing method for processing multi-cluster job records.
  • FIG. 2 shows a schematic structural diagram of an apparatus 20 for processing multi-cluster job records, which can be applied to an enterprise multi-cluster job management platform to manage and query the job operation records generated by multiple big data clusters.
  • The apparatus 20 in the embodiment of the present application can implement the steps of the method for processing multi-cluster job records executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof.
  • The functions implemented by the apparatus 20 can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above-mentioned functions, and the modules may be software and/or hardware.
  • The apparatus 20 may include a transceiver module 201, a detection module 202, a calling module 203, a classification module 204, a division module 205, a construction module 206, and a receiving module 207.
  • For the function implementation of these modules, reference may be made to the operations performed in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof, which will not be repeated here.
  • the detection module 202 can be used to control the transceiving operation of the transceiving module 201
  • the classification module 204 can be used to control the acquisition operation of the detection module 202 and the creation operation of the calling module 203
  • the division module 205 can be used to control the creation operation of the calling module 203 and the creation operation of the classification module 204.
  • the construction module 206 can be used to control the obtaining operation of the division module 205
  • the receiving module 207 can be used to control the trigger operation and input operation of the construction module 206.
  • The transceiver module 201 is used to receive job record data generated by tasks running on multiple clusters. The detection module 202 is used to detect the running state of the tasks; when the running state is detected to be a preset trigger point, it sends a trigger instruction to the created trigger, and the trigger receives the trigger instruction and converts the data format of the job record data received by the transceiver module 201 into JSON format to obtain the to-be-processed data. The calling module 203 is used to call the distributed messaging system Kafka in the message queue service system; when Kafka receives a topic creation command, it calls the topic creation script and creates the topic through the topic creation script; Kafka creates a producer based on the cluster corresponding to the to-be-processed data and creates a consumer based on the unified management website system.
  • The classification module 204 is used to input the to-be-processed data obtained by the detection module 202 into Kafka, where Kafka classifies the data according to the topic created via the calling module 203 and the producer to obtain the target data. The division module 205 divides the target data into blocks according to the producer and the topic, links the blocks according to the created zoning protocol, and uses the linked blocks together with the consumer as the data storage layer. The construction module 206 constructs the blockchain system according to the zoning protocol and the data storage layer, inputs the target data into the repositories, and triggers the read instruction. The receiving module 207 is used to, when the unified management website system receives the read instruction triggered by the construction module 206, output the target data in the repositories through the data storage layer, input the target data into the cache area of the MySQL database, convert the target data in the cache area into hypertext markup language data, control the cache area through the output control function to obtain the HTML data, and input the HTML data into the constructed static HTML page file through the created read-write function.
  • The preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters.
  • The zoning protocol is used to link each block in order from back to front through a chain, with each block pointing to the previous one, and to link the created blockchain system into Kafka so that Kafka can be used in the blockchain system.
  • Kafka includes repositories, and there are multiple repositories.
  • Optionally, the classification module 204 is also used to: obtain the order correlation degree of the events, obtain the throughput of the events, identify the entity types of the events, and obtain the correlation degree between entity types, where an entity type is used for one address corresponding to one user; classify the to-be-processed data into topics according to the order correlation degree, throughput, and correlation degree following the preset classification strategy to obtain the first classification data, where the preset classification strategy places into the same topic the to-be-processed data that satisfies at least one of: order correlation degree greater than the first preset threshold, throughput less than the second preset threshold, and correlation degree greater than the third preset threshold; mark the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the to-be-processed data; and classify the marked first classification data by producer type and mark the producer type of the marked first classification data to obtain the target data.
  • Optionally, the classification module 204 is further configured to: initialize the classified to-be-processed data and set the length of the linear hash table according to the classification types of the classified data; obtain the key values of the classified data, calculate the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtain the target key values corresponding to the data items whose TF-IDF value is greater than the fourth preset threshold, where the to-be-processed data includes the data items; use the remainder obtained by dividing a target key value by a number not greater than the length of the linear hash table as the address in the linear hash table, use the target key value as the head of the linear hash table, and use the address as the number of the linear hash table to obtain the linear hash table; randomly generate a preset number of strings of the same length, and count and analyze the linear hash table through the preset string function to obtain hash distribution information and average bucket length information, where the hash distribution information includes the usage rate of the buckets and the average bucket length information includes the average length of all used buckets; judge whether the hash distribution information satisfies a first preset condition and whether the average bucket length information satisfies a second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within a first preset range and the second preset condition is that the average length of all used buckets falls within a second preset range; if both judgments are yes, take the corresponding linear hash table as the final linear hash table; and fill the target key values into the final linear hash table and output the final linear hash table in the form of a linked list.
  • Optionally, the classification module 204 is further used to: compress the to-be-processed data; judge whether the transmission status of the transmission channel is normal; if so, input the compressed data into Kafka and mark it as sent; if not, input the compressed data into the first MySQL database and mark it as unsent; call the created polling script, which polls the first MySQL database at the preset interval; when polling finds that the first MySQL database holds unsent to-be-processed data and that the transmission status of the channel is normal, send the data marked as unsent into Kafka; poll to detect whether the data marked as unsent has been delivered; if so, replace the unsent mark with the sent mark; if not, continue polling.
  • Optionally, the receiving module 207 is also used to: have the unified management website system call the listener script, which detects whether the application layer in the blockchain system has received the read instruction; when the detection result is no, re-check the application layer in the blockchain system; when the detection result is yes, capture target data from the repositories through the consumer according to the preset capture quantity and add the consumed label to the captured target data to obtain the marked target data; convert the marked target data into a JSON object and parse the JSON object into the first data object; identify whether the second data objects of the MySQL database contain a data object with the same content as the first data object; if so, delete the data object with the same content from the first data object to obtain the first target data object; obtain the topic and producer information marked in the label of the first target data object; and fill the first target data object into the MySQL database cache area according to the topic and producer information.
  • Optionally, the receiving module 207 is also used to: detect whether a database transaction in the MySQL database is executing; if so, obtain the initial data of the target data in the cache area, lock the MySQL database through the Locktable statement, and append the updated data of target data subsequently input into the cache area of the MySQL database to the initial data, where the Locktable statement is a Locktable statement with the WRITE keyword; and obtain the data with preset fields in the target data of the cache area, and obtain the field size of that data.
  • Optionally, the classification module 204 is further configured to: obtain characteristic information about the running state of the task corresponding to the to-be-processed data; sort and classify the to-be-processed data according to the characteristic information to obtain the classification data and mark its classification type, where the classification types include task start data, task running data, and task end data; establish, by classification type, the correspondence between the classification data and the topics; and mark the correspondence of the classification data to obtain the target data.
  • On the one hand, the system is decoupled, the pressure of collecting job record data from multiple big data clusters at the same time is reduced, congestion in collecting the job record data of multiple big data clusters simultaneously is avoided, and high fault tolerance, high-speed caching, high efficiency, and high throughput are achieved. On the other hand, access and operation speed are increased and the server load is reduced. Taken together, this application can handle the system's concurrency-crash problem at low cost, with high efficiency and high accuracy, and from multiple directions; therefore, the present application can effectively prevent and deal with concurrency crashes of the multi-cluster job management system.
  • The technical features mentioned in any embodiment or implementation of the method for processing multi-cluster job records are also applicable to the apparatus 20 that performs the method in this application; similar content will not be repeated here.
  • The apparatus 20 in the embodiment of the present application is described above from the perspective of modular functional entities. The following describes a computer device from a hardware perspective; as shown in FIG. 3, it includes a processor, a memory, a transceiver (or an input and output unit, not labeled in FIG. 3), and a computer program stored in the memory and runnable on the processor.
  • The computer program may be a program corresponding to the method for processing multi-cluster job records in the embodiment corresponding to FIG. 1 or in any optional embodiment or optional implementation thereof.
  • The processor executes the computer program to implement the method for processing multi-cluster job records executed by the apparatus 20 in the embodiment corresponding to FIG. 1.
  • The so-called processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • The general-purpose processor may be a microprocessor, or any conventional processor.
  • The processor is the control center of the computer device and uses various interfaces and lines to connect the parts of the entire computer device.
  • The memory may be used to store the computer program and/or modules; the processor implements the various functions of the computer device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory.
  • The memory may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function (such as obtaining the job record data generated by tasks running on multiple clusters), and the data storage area can store the data created according to use (for example, the multiple blocks obtained by dividing the target data into blocks according to producer and topic).
  • The memory can include high-speed random access memory, and can also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
  • the transceiver can also be replaced by a receiver and a transmitter, and can be the same or different physical entities. When they are the same physical entity, they can be collectively referred to as transceivers.
  • the transceiver can be an input and output unit.
  • The entity device corresponding to the transceiver module 201 in FIG. 2 may be the transceiver in FIG. 3, and the entity device corresponding to the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206, and the receiving module 207 in FIG. 2 may be the processor in FIG. 3.
  • the memory may be integrated in the processor, or may be provided separately from the processor.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • The computer-readable storage medium stores computer instructions, and when the computer instructions are executed on a computer, the computer executes the following steps: obtain the job record data generated by tasks running on multiple clusters and detect the running state of the tasks; when the running state is detected to be a preset trigger point, send a trigger instruction to the created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain the to-be-processed data, where the preset trigger points include the start, pause, and end running states of tasks running on the multiple clusters; call the distributed messaging system Kafka in the message queue service system; when Kafka receives the topic creation command, call the topic creation script and create the topic through the topic creation script; create a producer through Kafka according to the cluster corresponding to the to-be-processed data, and create a consumer through Kafka according to the unified management website system; input the to-be-processed data into Kafka, where Kafka classifies the data according to the topic and the producer to obtain the target data; divide the target data into blocks according to the producer and the topic to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks together with the consumer as the data storage layer, where the zoning protocol links each block in order from back to front through a chain, with each block pointing to the previous one, and links the created blockchain system into Kafka so that Kafka can be used in the blockchain system; build the blockchain system according to the zoning protocol and the data storage layer, input the target data into the repositories through the blockchain system using the HTTP request method, and trigger the read instruction, where Kafka includes multiple repositories; when the unified management website system receives the read instruction, output the target data in the repositories through the data storage layer and input the target data into the cache area of the MySQL database; and convert the target data in the cache area into hypertext markup language data and write the HTML data into the constructed static HTML page file.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of big data. Provided are a method and apparatus for processing a multi-cluster job record, and a device and a storage medium. The method comprises: processing job record data generated by a plurality of clusters to acquire data to be processed; creating a topic, a producer and a consumer by means of a distributed message system Kafka in a message queue service system; classifying the data to be processed by means of the Kafka so as to acquire target data, and constructing a blockchain system according to the producer, the topic and the target data; inputting the target data into a repository by means of the blockchain system; inputting, by means of a unified management website system, the target data in the repository into a cache region of a MySQL database; and converting the target data in the cache region into hypertext markup language data, and inputting the hypertext markup language data into a static hypertext markup language page file. By using the present solution, the problem of concurrent crash of a multi-cluster job management system can be solved.

Description

处理多集群作业记录的方法、装置、设备及存储介质Method, device, equipment and storage medium for processing multi-cluster job records
本申请要求于2019年9月19日提交中国专利局、申请号为201910884887.8,发明名称为“处理多集群作业记录的方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on September 19, 2019, the application number is 201910884887.8, and the invention title is "Methods, Apparatus, Equipment, and Storage Media for Processing Multi-cluster Operation Records", and its entire contents Incorporated in the application by reference.
技术领域Technical field
本申请涉及数据处理领域,尤其涉及处理多集群作业记录的方法、装置、设备及存储介质。This application relates to the field of data processing, and in particular to methods, devices, equipment and storage media for processing multi-cluster job records.
背景技术Background technique
目前的集群作业管理中,一般是通过获取多个集群生成的任务作业数据,将所述任务作业数据输入到统一管理网站,通过所述统一管理网站的系统检测所述任务作业数据的任务类型,根据所述任务类型对所述任务作业数据进行分类以获得分类数据,将所述分类数据按照任务类型分别输入到多个储存库中。In the current cluster job management, generally, the task job data generated by multiple clusters is obtained, the job job data is input to the unified management website, and the task type of the task job data is detected through the system of the unified management website. The task job data is classified according to the task type to obtain classified data, and the classified data is input into a plurality of storage repositories respectively according to the task type.
发明人意识到由于多个集群直接将任务作业数据集中到统一管理网站,易造成门户网站的请求过多和并行处理时的通道拥挤,从而易导致多集群作业管理系统的并发崩溃。The inventor realizes that because multiple clusters directly centralize task job data to a unified management website, it is easy to cause too many portal requests and channel congestion during parallel processing, which may easily lead to concurrent collapse of the multi-cluster job management system.
发明内容Summary of the invention
本申请提供了一种处理多集群作业记录的方法、装置、设备及存储介质,能够解决多集群作业管理系统的并发崩溃的问题。The present application provides a method, device, equipment, and storage medium for processing multi-cluster job records, which can solve the problem of concurrent crashes of the multi-cluster job management system.
第一方面,本申请提供一种处理多集群作业记录的方法,包括:获取多个集群运行任务生成的作业记录数据,以及检测所述任务的运行状态,当检测到所述运行状态为预设触发点时,向已创建的触发器发送触发指令,所述触发器接收所述触发指令,将所述作业记录数据的数据格式转换为JSON格式,以获取待处理数据,其中,所述预设触发点包括多个所述集群运行任务时的启动或暂停或结束的运行状态;调用所述消息队列服务系统中的分布式消息系统Kafka,当所述Kafka接收到主题创建命令时,调用主题创建脚本,并通过所述主题创建脚本创建主题;通过所述Kafka根据所述待处理数据对应的集群创建生产者,并通过所述Kafka根据所述统一管理网站系统创建消费者;将所述待处理数据输入至所述Kafka,并通过所述Kafka根据所述主题和所述生产者对所述待处理数据进行分类,以获取目标数据;根据所述生产者和所述主题对所述目标数据进行区块划分,以获取多个区块,根据已创建的区划协议链接多个所述区块,并以链接的多个所述区块和所述消费者作为数据储存层,其中,所述区划协议用于通过链条将每个所述区块从后向前有序地链接和指向前一个所述区块,以及将创建的所述区块链系统链接到所述Kafka中,以使所述Kafka运用到所述区块链系统中;根据所述区划协议和所述数据存储层构建区块链系统,并通过所述区块链系统按照http的请求方式将所述目标数据输入至储存库,并触发读取指令,其中,所述Kafka包括储存库,所述储存库的数量包括多个;当所述统一管理网站系统接收到所述读取指令时,通过所述数据存储层输出所述储存库中的所述目标数据,并将所述目标数据输入至MySQL数据库的缓存区;将所述缓存区中的目标数据转换为超文本标记语言数据,并将所述超文本标记语言数据写入至已构建的静态超文本标记语言页面文件。In a first aspect, this application provides a method for processing multi-cluster job records, including: obtaining job record data generated by multiple cluster running tasks, and detecting the running status of the tasks, and when it is detected that the running status is a preset At the trigger point, a trigger instruction is sent to the created trigger. The trigger receives the trigger instruction and converts the data format of the job record data into a JSON format to obtain the data to be processed, wherein the preset Trigger points include the start, pause, or end operating status of multiple cluster running tasks; call the distributed messaging system Kafka in the message queue service system, and when the Kafka receives a topic creation command, call the topic creation Script, and create a topic through the topic creation script; create a producer through the Kafka according to the cluster corresponding to the to-be-processed data, and create a consumer through the Kafka according to the unified management website system; The data is input to the Kafka, and the Kafka classifies the to-be-processed data according to the theme and the producer to obtain target data; the target data is processed according to the producer and the theme Block division to obtain multiple blocks, link multiple blocks according to the created zoning protocol, and use the linked multiple blocks and the consumers as the data storage layer, wherein the zone The protocol is used to link each of the blocks in an orderly manner from back to front through a chain and point to the previous block, and to link the created blockchain system to the Kafka, so that the Kafka is used in the blockchain system; a blockchain system is constructed according to the zoning protocol and the data storage layer, and the target data is input to the repository through the blockchain system in accordance with the HTTP request method , And trigger a read instruction, where the Kafka includes a repository, and the number of repositories includes multiple; when the unified management website system receives the read instruction, the data storage layer outputs all The target data in the repository, and the target data is input into the buffer area of the MySQL database; the target data in the buffer area is converted into hypertext markup language data, and the hypertext markup language data Write to the built static hypertext markup language page file.
In a second aspect, this application provides an apparatus for processing multi-cluster job records, including: a transceiver module, configured to receive job record data generated by tasks run by multiple clusters; a detection module, configured to detect the running status of the tasks and, when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain data to be processed, where the preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters; a calling module, configured to call the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, a topic creation script is called and a topic is created through the topic creation script, a producer is created through Kafka according to the cluster corresponding to the data to be processed, and a consumer is created through Kafka according to the unified management website system; a classification module, configured to input the data to be processed obtained by the detection module into the Kafka called by the calling module, and classify the data to be processed through Kafka according to the topic and the producer created by the calling module to obtain target data; a division module, configured to divide the target data obtained by the classification module into blocks according to the producer and the topic created by the calling module to obtain multiple blocks, link the multiple blocks according to a created zoning protocol, and use the linked blocks and the consumer as a data storage layer, where the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system; a construction module, configured to construct a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, input the target data into a repository through the blockchain system by means of an HTTP request, and trigger a read instruction, where Kafka includes repositories and the number of repositories is more than one; and a receiving module, configured to, when the unified management website system receives the read instruction triggered by the construction module, output the target data input into the repository by the construction module through the data storage layer, input the target data into a cache area of a MySQL database, convert the target data in the cache area into HTML data, control the cache area through an output control function to obtain the HTML data, and input the HTML data into a constructed static HTML page file through a created read-write function.
In a third aspect, this application provides a computer device, including at least one connected processor, a memory, a display, and an input/output unit, where the memory is configured to store program code, and the processor is configured to call the program code in the memory to execute the method described in the first aspect above.
In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to execute the method described in the first aspect above.
In the technical solution provided by this application, the job record data generated by tasks run by multiple clusters is processed to obtain data to be processed; topics, producers, and consumers are created through the distributed messaging system Kafka in the message queue service system; the data to be processed is classified through Kafka to obtain target data, and a blockchain system is constructed according to the producer, the topic, and the target data; the target data is input into a repository through the blockchain system; the target data in the repository is input into the cache area of a MySQL database through the unified management website system; and the target data in the cache area is converted into HTML data, which is input into a static HTML page file. Because the solution adopts an architecture of big data clusters, a message queue server, a unified management website, and a blockchain system, on the one hand, using the Kafka system in the message queue server as the message queue and combining it with the blockchain system for distributed data storage, with data cached concurrently across multiple nodes, decouples the system, relieves the pressure of collecting the job record data of multiple big data clusters at once, avoids the congestion of simultaneous collection, and yields fault-tolerant, cached, efficient, high-throughput processing; on the other hand, statically rendering the target data input into the MySQL cache area as HTML increases access and running speed and lightens the server's load. In summary, this application addresses the system concurrent-crash problem at low cost, with high efficiency and accuracy, and from multiple angles, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
Description of the Drawings
FIG. 1 is a schematic flowchart of a method for processing multi-cluster job records in an embodiment of this application;
FIG. 2 is a schematic structural diagram of an apparatus for processing multi-cluster job records in an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device in an embodiment of this application.
Detailed Description
This application provides a method, apparatus, device, and storage medium for processing multi-cluster job records, which can be used in an enterprise multi-cluster job management platform to manage and query the job run records generated by multiple big data clusters.
To solve the above technical problem, this application mainly provides the following technical solutions:
Referring to FIG. 1, the method for processing multi-cluster job records provided by this application is illustrated below. The method involves an architecture of a big data cluster layer, a message queue server, and a unified management website system, and is executed by a computer device, which may be a server or a terminal; when the apparatus 20 shown in FIG. 2 is an application or an executable program, the terminal is the terminal on which the apparatus 20 shown in FIG. 2 is installed. This application does not limit the type of the execution subject. The method includes:
101. Obtain the job record data generated by tasks run by multiple clusters, and detect the running status of the tasks; when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, which receives the trigger instruction and converts the data format of the job record data into JSON format to obtain the data to be processed.
The preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters. A trigger is created with a T-SQL statement of the form create trigger trigger_name on { } as sql_statement, based on the start, pause, and end trigger points of a task, so that when the running status of a big data cluster task is detected to be start, pause, or end, a processing script is triggered that converts the data format of the job record data into JSON format to obtain the data to be processed. When a task start is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, and initiator of the task operation; when a task pause is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, task operation initiator, operation pause time, and the user who paused the task; when a task end is detected, the data to be processed comprises the running account, job content, submission time, start time, owning project, task operation initiator, operation end time, and the run result. The job record data generated by tasks run by multiple clusters is stored in a MySQL database connected to the clusters; after the job record data is read out of that database, its data format is converted into JSON format to facilitate the processing of structured data.
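By way of a non-limiting illustration, the JSON conversion step might be sketched as follows in Java using the Jackson library; the field names and values are hypothetical examples drawn from the record contents listed above, not part of the original disclosure:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class JobRecordToJson {
    public static void main(String[] args) throws Exception {
        // Hypothetical task-start record with the fields the text lists.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("runningAccount", "etl_user01");         // running account
        record.put("jobContent", "daily_sales_agg");        // job content
        record.put("submitTime", Instant.now().toString()); // submission time
        record.put("startTime", Instant.now().toString());  // start time
        record.put("project", "warehouse");                 // owning project
        record.put("initiator", "alice");                   // task operation initiator

        // Convert the job record into JSON format to obtain the data to be processed.
        String json = new ObjectMapper().writeValueAsString(record);
        System.out.println(json);
    }
}
```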
Optionally, in some embodiments of this application, before the data format of the job record data is converted into JSON format, the method further includes: compressing the job record data; performing state detection on the compressed job record data to obtain state information, and analyzing the state information through a cache coherence protocol to obtain first data and second data, where the state information includes a modified state, an exclusive state, a shared state, and an invalid state, the first data includes job run data with strong consistency-cache requirements, and the second data includes the job run data other than that with strong consistency-cache requirements; calling a local Cache interface to generate a cache builder CacheBuilder object for the first data, assembling an automatic-loading function for the first data, and obtaining first key-value pair data of the first data; automatically loading the first key-value pair data into the physical-memory cache through the CacheBuilder object and the automatic-loading function; creating a CacheLoader subclass object, and, when a failed get operation is detected, automatically loading the first key-value pair data into the physical-memory cache through the CacheLoader subclass object; building a cache architecture component from the high-speed cache system Memcached and the data structure server Redis, where the cache architecture component includes a cache server; obtaining a first hash value of a node of the cache architecture component, obtaining second key-value pair data of the second data, and obtaining a second hash value of the second key-value pair data; and, according to the first hash value and the second hash value, storing the second key-value pair data in the cache server of the cache architecture component through the cache architecture component, to obtain the final job record data. Compressing the acquired job record data reduces the volume of job record data during transmission or transfer; combining a local cache with a comprehensive distributed cache exploits the speed, performance, dynamic scalability, availability, and ease of use of distributed caching to lower the read-write pressure and load on the server while relieving the system's data storage and channel pressure, normalize the cached data, and raise the cache hit rate, so that job record data is cached quickly and accurately to prevent concurrent system crashes.
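A minimal sketch of the local-cache step, assuming the Cache interface refers to Google Guava's CacheBuilder and CacheLoader; the key and value types, size limits, and loader logic are illustrative:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class ConsistencyCriticalCache {
    public static void main(String[] args) throws Exception {
        // CacheLoader subclass: reloads a key-value pair when a get misses or fails.
        CacheLoader<String, String> loader = new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return fetchFromSource(key); // placeholder for the first data's source
            }
        };

        // CacheBuilder object assembled with an automatic-loading function,
        // keeping the first key-value pair data in the in-memory cache.
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build(loader);

        System.out.println(cache.get("job:1234")); // loads on first access, cached afterwards
    }

    private static String fetchFromSource(String key) {
        return "{\"runningAccount\":\"etl_user01\"}"; // illustrative payload
    }
}
```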
102. Call the distributed messaging system Kafka in the message queue server; when Kafka receives a topic creation command, call the topic creation script, and create a topic through the topic creation script.
After the data to be processed is obtained, the distributed messaging system Kafka in the message queue server is called and the topic creation command "bin/kafka-topics.sh --create --zookeeper localhost: --replication-factor N --partitions M --topic running_result" is issued. The command specifies that the topic is running_result, it has M partitions, and each partition is assigned N replicas. A topic creation script comprising a command-line part and a controller logic part is called; the controller logic part watches the corresponding directory node under the distributed application coordination service ZooKeeper, and the command-line part creates a new data node when it receives the topic creation command, which triggers the controller logic part, whereupon the topic has been created. Creating topics makes it easier to organize the incoming data to be processed.
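Topic creation can equivalently be sketched with Kafka's Java AdminClient; the broker address and the example values M = 3 and N = 2 below are assumptions:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateRunningResultTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic running_result with M = 3 partitions and N = 2 replicas per partition.
            NewTopic topic = new NewTopic("running_result", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get(); // blocks until created
        }
    }
}
```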
103. Create a producer through Kafka according to the cluster corresponding to the data to be processed, and create a consumer through Kafka according to the unified management website system.
A cluster acts as the provider of the data to be processed, that is, the producer; the unified management website system acts as the consumer of the data to be processed. The consumer side (the unified management website system) runs automatically and monitors topic updates in Kafka. Kafka's producer-consumer model makes it possible to process the job record data generated by the clusters in parallel and to balance the system load.
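A minimal producer/consumer sketch with Kafka's Java client follows; the broker address, group id, and serializer choices are assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ClusterProducerWebsiteConsumer {
    // Each big data cluster publishes its JSON job records as a producer.
    static void produce(String clusterId, String jsonRecord) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // The cluster id is used as the key, so one cluster's records stay together.
            producer.send(new ProducerRecord<>("running_result", clusterId, jsonRecord));
        }
    }

    // The unified management website system listens for topic updates as a consumer.
    static void consume() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("group.id", "unified-management-website");
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(p)) {
            consumer.subscribe(Collections.singleton("running_result"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("cluster=%s record=%s%n", r.key(), r.value()));
        }
    }
}
```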
104. Input the data to be processed into Kafka, and classify the data to be processed through Kafka according to topic and producer to obtain the target data.
Through Kafka, the data to be processed is first divided by producer (that is, by cluster) into per-producer data, and within each producer's data it is further classified by topic to obtain the target data. The data to be processed can be classified by assigning it to topics as follows: events with a fixed order are placed in the same topic and use the same partition key; for events involving different entities where one entity depends on another, the events are placed in the same topic; and events whose throughput exceeds a first preset throughput threshold are placed in different topics, while events below a second preset throughput threshold are placed in the same topic. Classifying the data to be processed by topic and producer makes the data quick and accurate to retrieve and easy to process concurrently.
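The topic-assignment rules above might be sketched as follows; the threshold values and the Event shape are hypothetical:

```java
// Illustrative routing of an event to a topic per the rules above.
public class TopicRouter {
    static final double HIGH_THROUGHPUT = 1_000.0; // first preset throughput threshold (events/s)
    static final double LOW_THROUGHPUT = 100.0;    // second preset throughput threshold (events/s)

    record Event(String producerId, String entity, double throughput, boolean fixedOrder) {}

    static String topicFor(Event e) {
        if (e.fixedOrder()) {
            return "ordered-events"; // fixed-order events share one topic and one partition key
        }
        if (e.throughput() > HIGH_THROUGHPUT) {
            return "high-throughput-" + e.entity(); // high-throughput events get their own topic
        }
        return "shared-low-throughput"; // low-throughput events are grouped into a shared topic
    }

    static String partitionKey(Event e) {
        return e.producerId(); // the same key keeps one producer's events in order
    }
}
```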
Optionally, in some embodiments of this application, the tasks include events, and classifying the data to be processed through Kafka according to topic and producer to obtain the target data includes: obtaining the order correlation degree of the events, obtaining the throughput of the events, identifying the entity types of the events, and obtaining the correlation degree between entity types, where an entity type is used such that one address corresponds to one user; classifying the data to be processed into topics according to the order correlation degree, the throughput, and the correlation degree, following a preset classification strategy, to obtain first classification data, where the preset classification strategy places into the same topic any data to be processed that satisfies at least one of the conditions that the order correlation degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the correlation degree is greater than a third preset threshold; labeling the first classification data, where the label content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classifying the labeled first classification data according to producer type and labeling it with that producer type, to obtain the target data. Classifying the data to be processed by this rule prevents all task events from being lumped into one topic; distributing them sensibly across multiple topics ensures that the data to be processed for multiple events and users can be retrieved while remaining ordered and complete.
Optionally, in some embodiments of this application, the tasks include events, and after the data to be processed is classified through Kafka according to topic and producer and before the target data is obtained, the method further includes: initializing the classified data to be processed, and setting the length of a linear hash table according to the classification type of the classified data; obtaining the key code values of the classified data, computing the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data, and obtaining the target key code values corresponding to data items whose TF-IDF value is greater than a fourth preset threshold, where the data to be processed includes the data items; taking the remainder of dividing a target key code value by a number not greater than the length of the linear hash table as the address in the linear hash table, taking the target key code value as the header of the linear hash table, and taking the address of the linear hash table as the number of linear hash tables, to obtain the linear hash table; randomly generating a preset number of strings of the same length, and using a preset string function to gather statistics on and analyze the linear hash table, to obtain hash distribution information and average bucket length information, where the hash distribution information includes the bucket usage rate and the average bucket length information includes the average length of all used buckets; judging whether the hash distribution information satisfies a first preset condition and whether the average bucket length information satisfies a second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within a first preset range, and the second preset condition is that the average length of all used buckets falls within a second preset range; if both judgments are yes, taking the corresponding linear hash table as the final linear hash table; and filling the target key code values into the final linear hash table and outputting it in linked-list form to obtain the target data. Sorting the target key code values by TF-IDF value allows the data to be processed to be classified quickly and accurately; a linear hash table's access speed is unaffected by the total number of elements accessed, suits very large databases, and is efficient, so it raises the query speed for job record data without that speed being impaired while the concurrent-crash problem of the multi-cluster job management system is being addressed. Applying linear hash table processing to the data to be processed improves system performance and scalability at low cost.
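For reference, the TF-IDF weighting used above can be sketched as follows (a simplified computation; the toy corpus and the df = 0 guard are assumptions):

```java
import java.util.List;

public class TfIdf {
    // tf-idf(term, doc) = tf(term, doc) * log(N / df(term)), the classic weighting.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        double tf = doc.stream().filter(term::equals).count() / (double) doc.size();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        double idf = Math.log(corpus.size() / (double) Math.max(df, 1)); // guard against df = 0
        return tf * idf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("job", "start", "cluster1"),
                List.of("job", "end", "cluster2"),
                List.of("pause", "cluster1"));
        System.out.println(tfIdf("start", corpus.get(0), corpus)); // higher: rare term
        System.out.println(tfIdf("job", corpus.get(0), corpus));   // lower: common term
    }
}
```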
Optionally, in some embodiments of this application, the method involves a transmission channel, and inputting the data to be processed into Kafka includes: compressing the data to be processed; judging whether the transmission status of the transmission channel is normal; if yes, inputting the compressed data to be processed into Kafka and marking it as sent; if no, inputting the compressed data to be processed into a first MySQL database and marking it as unsent; calling a created polling script, which polls the first MySQL database at a preset interval; when polling detects that the first MySQL database contains data to be processed that is marked as unsent and that the transmission status of the transmission channel is normal, inputting the data marked as unsent into Kafka; polling to detect whether Kafka has received the data marked as unsent; if yes, replacing the unsent mark in that data with a sent mark; if no, leaving the unsent mark unchanged. Marking the data to be processed avoids processing it repeatedly, which would increase the system load, and thus helps prevent concurrent crashes of the multi-cluster job management system.
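A sketch of the park-and-retry scheme follows; the table layout, status column, credentials, and delivery check are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UnsentRecordPoller {
    static final String URL = "jdbc:mysql://localhost:3306/jobs"; // assumed first MySQL database

    // Poll the parked records and resend them once the channel is back to normal.
    static void pollOnce(KafkaSender sender) throws Exception {
        try (Connection c = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement select = c.prepareStatement(
                     "SELECT id, payload FROM pending_records WHERE status = 'UNSENT'");
             PreparedStatement update = c.prepareStatement(
                     "UPDATE pending_records SET status = 'SENT' WHERE id = ?")) {
            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                boolean delivered = sender.send(rs.getString("payload")); // resend to Kafka
                if (delivered) { // only flip the mark on confirmed delivery
                    update.setLong(1, rs.getLong("id"));
                    update.executeUpdate();
                }
            }
        }
    }

    interface KafkaSender {
        boolean send(String payload); // returns true when Kafka confirms receipt
    }
}
```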
Optionally, in some embodiments of this application, classifying the data to be processed through Kafka according to topic and producer to obtain the target data includes: obtaining feature information of the running status of the task corresponding to the data to be processed; sorting and classifying the data to be processed according to the feature information to obtain classification data, and labeling the classification type of the classification data, where the classification types include task start data, task running data, and task end data; and establishing, for each classification type, a correspondence between the classification data and a topic, and labeling that correspondence, to obtain the target data.
105. Divide the target data into blocks according to producer and topic to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer.
The zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system. The blocks are divided within the message queue server to create the blockchain system. The target data is first divided into blocks by producer, with one block per producer, so that block data can be managed by producer; on top of that division, the data is further divided into blocks by topic, with one block per topic, so that block data can also be managed by topic. The linked blocks and the consumer together serve as the data storage layer, which facilitates the storage of the target data, the consumer's retrieval of it, and the linking of the blockchain system into the unified management website system. Dividing the target data into blocks for distributed node storage and processing effectively handles the concurrent crash problem of the multi-cluster job management system.
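The back-to-front linking might be sketched as follows, with each block recording the hash of its predecessor; the SHA-256 choice and the block fields are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class Block {
    final String producer;     // block is scoped to one producer (cluster)
    final String topic;        // and one topic within that producer
    final String payload;      // the target data held by this block
    final String previousHash; // link pointing back to the preceding block
    final String hash;

    Block(String producer, String topic, String payload, String previousHash) throws Exception {
        this.producer = producer;
        this.topic = topic;
        this.payload = payload;
        this.previousHash = previousHash;
        // Each block's hash covers its content plus the previous hash,
        // so the blocks are chained in order from back to front.
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest((producer + topic + payload + previousHash)
                .getBytes(StandardCharsets.UTF_8));
        this.hash = HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        Block genesis = new Block("cluster1", "running_result", "{\"job\":1}", "0");
        Block next = new Block("cluster1", "running_result", "{\"job\":2}", genesis.hash);
        System.out.println(next.previousHash.equals(genesis.hash)); // true: chained
    }
}
```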
106. Construct a blockchain system according to the zoning protocol and the data storage layer, input the target data into the repository through the blockchain system by means of an HTTP request, and trigger a read instruction.
Kafka includes repositories, and the number of repositories is more than one. The blockchain system includes an application layer, and the application layer includes the unified management website system. HTTP offers a range of request methods, each specifying a different operation on a resource: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH. Here the PUT method is used, so that the latest version of the specified target data is transmitted to the repository in the message queue server. Multiple repositories are set up in Kafka to store the target data by category; the data is stored into the repository corresponding to its producer and topic, which facilitates management and retrieval of the target data. Using the Kafka system in the message queue server as the message queue, combined with the blockchain system for distributed data storage, with data cached concurrently across multiple nodes, decouples the system, relieves the pressure of collecting the job record data of multiple big data clusters at once, avoids the congestion of simultaneous collection, and yields fault-tolerant, cached, efficient, high-throughput processing.
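The PUT request might be sketched with the JDK's HttpClient as follows; the repository URL and payload are illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PutTargetData {
    public static void main(String[] args) throws Exception {
        String json = "{\"producer\":\"cluster1\",\"topic\":\"running_result\",\"job\":1}";

        // PUT transmits the latest version of the target data to the repository.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mq-server.example/repositories/cluster1/running_result"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // e.g. 200 or 201 on success
    }
}
```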
107. When the unified management website system receives the read instruction, output the target data in the repository through the data storage layer, and input the target data into the cache area of the MySQL database.
The target data in the Kafka system is read out and stored into the cache area of the MySQL database of the unified management website system. The unified management website system monitors the Kafka system so that target data is fetched and stored promptly; inputting the fetched target data into the cache area of the MySQL database facilitates subsequent reads of the target data and relieves the storage pressure on the MySQL database.
Optionally, in some embodiments of this application, before the target data is input into the cache area of the MySQL database, a preset data consumption frequency is set, and the target data is input into the cache area of the MySQL database at that frequency. Feeding the target data in at a preset consumption frequency buffers the input and thereby relieves the storage pressure on MySQL.
Optionally, in some embodiments of this application, when the unified management website system receives the read instruction, outputting the target data in the repository through the data storage layer and inputting it into the cache area of the MySQL database includes: the unified management website system calls a listener script and uses it to detect whether the application layer in the blockchain system has received a read instruction; when the detection result is no, the application layer in the blockchain system is checked again; when the detection result is yes, the consumer fetches target data from the repository in a preset fetch quantity, and the fetched target data is tagged as consumed to obtain tagged target data; the tagged target data is converted into a JSON object, which is parsed into a first data object; it is identified whether the second data objects in the MySQL database include a data object with the same content as the first data object; if yes, the data object whose content duplicates a second data object is deleted from the first data object to obtain a first target data object, the topic and producer information marked in the first target data object's tag is obtained, and the first target data object is filled into the cache area of the MySQL database according to that topic and producer information; if no, the topic and producer information marked in the first data object's tag is obtained, and the first data object is filled into the cache area of the MySQL database according to that information. Monitoring Kafka through the unified management website system for updated target data lowers the risk of fetching and storing data twice; converting the target data into objects makes it straightforward to store in the MySQL database; and filling the data into the multiple cache areas set up in the MySQL database according to topic and producer information eases the classified management and retrieval of the data. Together, these measures raise the efficiency with which the multi-cluster job management system manages job record data.
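The duplicate check and cache-area fill might be sketched as follows; the table schema and the JSON field names are assumptions:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DedupeInsert {
    static void insertIfNew(Connection conn, String taggedJson) throws Exception {
        JsonNode record = new ObjectMapper().readTree(taggedJson); // first data object
        String topic = record.get("topic").asText();
        String producer = record.get("producer").asText();
        String content = record.get("payload").toString();

        // Skip records whose content already exists in the cache area.
        try (PreparedStatement check = conn.prepareStatement(
                "SELECT 1 FROM cache_area WHERE content = ? LIMIT 1")) {
            check.setString(1, content);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) return; // duplicates an existing second data object
            }
        }

        // Fill the cache area, partitioned by the topic and producer in the tag.
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO cache_area (topic, producer, content) VALUES (?, ?, ?)")) {
            insert.setString(1, topic);
            insert.setString(2, producer);
            insert.setString(3, content);
            insert.executeUpdate();
        }
    }
}
```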
Optionally, in some embodiments of this application, after the target data is input into the cache area of the MySQL database, the method further includes: sending a startup instruction to a configured hidden system, which receives the instruction and activates a hidden protocol, where the hidden system includes the hidden protocol and the hidden protocol covers failures, destructive deletion, and human ethics and morality; when the hidden system detects that incoming information violates the hidden protocol, the data in the MySQL database is copied and backed up into the hidden system, and the hidden system enters an authentication state, where such information includes failure instructions, destructive deletion instructions, and files carrying Trojan programs; when the hidden system in the authentication state detects that an incoming access request carries management authority, it issues a password input request; when it detects that the entered password is correct and the number of attempts has not reached the limit, it accepts the access request; and when it detects that the number of attempts has reached the limit, it rejects the access request and permanently seals the backed-up copy of the data. Copying and backing up the target data stored in the MySQL database into the hidden system, with a hidden protocol in place, ensures that the source data of the target data can still be obtained, and remains safe, if the device fails, is hacked, or is destructively deleted, which improves the security and availability of the multi-cluster job management system.
108. Convert the target data in the cache area into HTML data, and write the HTML data into the constructed static HTML page file.
The target data in the cache area is converted and written into the constructed static HTML page file. Statically rendering the target data held in the cache area of the MySQL database as HTML increases access and running speed and lightens the server's load, effectively addressing the concurrent crash problem of the multi-cluster job management system.
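The static-page step might be sketched as follows, rendering rows from the cache area as an HTML table written to a static file; the row shape and output path are assumptions, and cell values are assumed to be pre-escaped:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StaticPageWriter {
    static void writePage(List<String[]> rows, Path out) throws Exception {
        StringBuilder html = new StringBuilder(
                "<!DOCTYPE html><html><body><table border=\"1\">"
                + "<tr><th>Cluster</th><th>Job</th><th>Result</th></tr>");
        for (String[] row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(cell).append("</td>"); // assumes pre-escaped cells
            }
            html.append("</tr>");
        }
        html.append("</table></body></html>");
        // Writing a static file means later page views skip the database entirely.
        Files.writeString(out, html, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        writePage(List.of(new String[]{"cluster1", "daily_sales_agg", "SUCCESS"}),
                  Path.of("running_result.html"));
    }
}
```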
Optionally, in some embodiments of this application, before the target data in the cache area is converted into HTML data, the method further includes: detecting whether a database transaction in the MySQL database is executing; if yes, obtaining the initial data of the target data in the cache area, locking the MySQL database with a LOCK TABLES statement, and appending the updated data of any target data subsequently input into the cache area to the initial data, where the LOCK TABLES statement carries the WRITE keyword; obtaining the data with preset fields among the target data in the cache area, along with the field sizes of that data, where the preset fields include fields used for joins, WHERE conditions, and ORDER BY sorting, as well as fields used by the MAX(), MIN(), and ORDER BY commands; creating indexes according to preset rules based on the data with preset fields and their field sizes, where the preset rules include indexing target data with the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold; detecting whether the data tables in the MySQL database are defined with the InnoDB type; if not, adding TYPE=INNODB to the CREATE TABLE statements of the non-InnoDB tables to obtain InnoDB tables; if so, taking the InnoDB tables as they are; and creating foreign keys on the InnoDB tables with the ALTER TABLE command. Combining table locking, foreign keys, and indexes optimizes the MySQL database, improving database performance while maintaining the integrity and relatedness of the target data, freeing the system's storage database and easing its storage pressure, and providing the space and speed to support the system's concurrent processing, so that concurrent crashes of the multi-cluster job management system are effectively prevented and handled.
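The three optimizations might be issued over JDBC as follows; the table, column, and constraint names are assumptions, and note that modern MySQL spells TYPE=INNODB as ENGINE=InnoDB:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MySqlTuning {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/jobs", "user", "pass");
             Statement s = c.createStatement()) {
            // Lock the cache table for writing while buffered updates are merged,
            // then release the lock before running DDL.
            s.execute("LOCK TABLES cache_area WRITE");
            // ... merge the buffered target-data updates here ...
            s.execute("UNLOCK TABLES");

            // Index the fields used in JOIN / WHERE / ORDER BY lookups.
            s.execute("CREATE INDEX idx_cache_topic_producer ON cache_area (topic, producer)");

            // Ensure the table uses InnoDB, then add a foreign key with ALTER TABLE.
            s.execute("ALTER TABLE cache_area ENGINE = InnoDB");
            s.execute("ALTER TABLE cache_area ADD CONSTRAINT fk_cluster "
                    + "FOREIGN KEY (producer) REFERENCES clusters (id)");
        }
    }
}
```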
Optionally, in some embodiments of this application, after the HTML data is written into the constructed static HTML page file, the method further includes: when the unified management website system verifies that a login request entered by a user is correct, accepting the login request; when the server in the unified management website receives a query request entered by the user, obtaining the feature information of the query request; converting the feature information into a retrieval statement and filtering the data in the MySQL database with it to obtain the data corresponding to the query request; and performing statistics and analysis on the data corresponding to the query request to generate and output a visual chart. Outputting visual charts tailored to user needs makes the job record data easier for users to read and improves the usability of the multi-cluster job management system.
Compared with existing mechanisms, the embodiments of this application, on the one hand, decouple the system, relieve the pressure of collecting the job record data of multiple big data clusters at once, avoid the congestion of simultaneous collection, and yield fault-tolerant, cached, efficient, high-throughput processing; on the other hand, they increase access and running speed and reduce the server's load. In summary, this application addresses the system concurrent-crash problem at low cost, with high efficiency and accuracy, and from multiple angles, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
The technical features mentioned in the embodiment corresponding to FIG. 1 above, or in any optional embodiment or implementation thereof, also apply to the embodiments corresponding to FIG. 2 and FIG. 3 in this application; similar details are not repeated below.
A method for processing multi-cluster job records in this application has been described above; an apparatus for executing that method is described below.
FIG. 2 is a schematic structural diagram of an apparatus 20 for processing multi-cluster job records, which can be applied to an enterprise multi-cluster job management platform to manage and query the job run records generated by multiple big data clusters. The apparatus 20 in the embodiments of this application can implement the steps of the method for processing multi-cluster job records executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof. The functions implemented by the apparatus 20 can be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions, and a module may be software and/or hardware. The apparatus 20 may include a transceiver module 201, a detection module 202, a calling module 203, a classification module 204, a division module 205, a construction module 206, and a receiving module 207; for the functional implementation of these modules, refer to the operations executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof, which are not repeated here. The detection module 202 can control the transceiving operations of the transceiver module 201; the classification module 204 can control the obtaining operations of the detection module 202 and the creation operations of the calling module 203; the division module 205 can control the creation operations of the calling module 203 and the obtaining operations of the classification module 204; the construction module 206 can control the obtaining operations of the division module 205; and the receiving module 207 can control the triggering and input operations of the construction module 206.
In some implementations, the transceiver module 201 is configured to receive job record data generated by tasks run by multiple clusters; the detection module 202 is configured to detect the running status of the tasks and, when the running status is detected to be a preset trigger point, send a trigger instruction to a created trigger, which receives the trigger instruction and converts the data format of the job record data received by the transceiver module 201 into JSON format to obtain the data to be processed; the calling module 203 is configured to call the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, a topic creation script is called and a topic is created through it, a producer is created through Kafka according to the cluster corresponding to the data to be processed, and a consumer is created through Kafka according to the unified management website system; the classification module 204 is configured to input the data to be processed obtained by the detection module 202 into Kafka, and classify it through Kafka according to the topic and producer created by the calling module 203 to obtain the target data; the division module 205 is configured to divide the target data obtained by the classification module 204 into blocks according to the producer and topic created by the calling module 203 to obtain multiple blocks, link the blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer; the construction module 206 is configured to construct the blockchain system according to the zoning protocol and the data storage layer obtained by the division module 205, input the target data into the repository through the blockchain system by means of an HTTP request, and trigger a read instruction; and the receiving module 207 is configured to, when the unified management website system receives the read instruction triggered by the construction module 206, output the target data input into the repository by the construction module 206 through the data storage layer, input the target data into the cache area of the MySQL database, convert the target data in the cache area into HTML data, control the cache area through an output control function to obtain the HTML data, and input the HTML data into the constructed static HTML page file through a created read-write function.
The preset trigger points include the start, pause, and end running states of the tasks run by the multiple clusters; the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system into Kafka so that Kafka is applied in the blockchain system; and Kafka includes repositories, the number of repositories being more than one.
Optionally, the classification module 204 is further configured to: obtain the order correlation degree of the events, obtain the throughput of the events, identify the entity types of the events, and obtain the correlation degree between entity types, where an entity type is used such that one address corresponds to one user; classify the data to be processed into topics according to the order correlation degree, the throughput, and the correlation degree, following the preset classification strategy, to obtain the first classification data, where the preset classification strategy places into the same topic any data to be processed that satisfies at least one of the conditions that the order correlation degree is greater than the first preset threshold, the throughput is less than the second preset threshold, and the correlation degree is greater than the third preset threshold; label the first classification data, where the label content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classify the labeled first classification data according to producer type and label it with that producer type, to obtain the target data.
Optionally, the classification module 204 is further configured to: initialize the classified data to be processed, and set the length of the linear hash table according to the classification type of the classified data; obtain the key code values of the classified data, compute the TF-IDF value of each data item of the classified data, and obtain the target key code values corresponding to data items whose TF-IDF value is greater than the fourth preset threshold, where the data to be processed includes the data items; take the remainder of dividing a target key code value by a number not greater than the length of the linear hash table as the address in the linear hash table, take the target key code value as the header of the linear hash table, and take the address of the linear hash table as the number of linear hash tables, to obtain the linear hash table; randomly generate a preset number of strings of the same length, and use the preset string function to gather statistics on and analyze the linear hash table, to obtain the hash distribution information and average bucket length information, where the hash distribution information includes the bucket usage rate and the average bucket length information includes the average length of all used buckets; judge whether the hash distribution information satisfies the first preset condition and whether the average bucket length information satisfies the second preset condition, where the first preset condition is that the ratio of the number of used buckets to the total number of buckets falls within the first preset range, and the second preset condition is that the average length of all used buckets falls within the second preset range; if both judgments are yes, take the corresponding linear hash table as the final linear hash table; and fill the target key code values into the final linear hash table and output it in linked-list form to obtain the target data.
Optionally, the above classification module 204 is further configured to: compress the data to be processed; determine whether the transmission state of the transmission channel is normal; if the determination result is yes, input the compressed data to be processed into Kafka and mark the data input into Kafka as sent; if the determination result is no, input the compressed data to be processed into a first MySQL database and mark the data input into the first MySQL database as unsent; call a created polling script, and poll the first MySQL database at a preset interval through the polling script; when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, input the data marked as unsent into the first MySQL database; poll to detect whether the first MySQL database has received the data to be processed that is marked as unsent; if the detection result is yes, replace the unsent mark in the data marked as unsent with a sent mark; and if the detection result is no, do not update the unsent mark in the data marked as unsent.
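A minimal sketch of the compress-and-send path with the database fallback and polling retry might look as follows. Here `kafka_send`, `channel_ok` and the in-memory `UNSENT` list are stand-ins for the real Kafka producer, channel check and first MySQL database; the sketch also assumes that, once the channel recovers, the backlog is delivered to the messaging side before the mark is flipped to sent:

```python
import time
import zlib

UNSENT = []  # rows of [payload, status], standing in for the MySQL table

def channel_ok():
    return True  # assumption: a real check would probe the transmission channel

def kafka_send(payload):
    pass  # assumption: a real Kafka producer send would go here

def submit(record: bytes):
    payload = zlib.compress(record)  # data compression step
    if channel_ok():
        kafka_send(payload)
        return "sent"
    UNSENT.append([payload, "unsent"])  # fallback to the first database
    return "unsent"

def poll_resend(interval_s=5.0, rounds=1):
    """Polling script: retry rows marked unsent at a preset interval."""
    for _ in range(rounds):
        if channel_ok():
            for row in UNSENT:
                if row[1] == "unsent":
                    kafka_send(row[0])
                    row[1] = "sent"  # replace the unsent mark with sent
        time.sleep(interval_s)
```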
Optionally, the above receiving module 207 is further configured to: invoke, through the unified management website system, a listener script, and detect through the listener script whether the application layer in the blockchain system has received a read instruction; when the detection result is no, re-detect the application layer in the blockchain system; when the detection result is yes, fetch the target data from the repositories through the consumer according to a preset fetch quantity, and add a consumed label to the fetched target data, to obtain labeled target data; convert the labeled target data into a JSON object, and parse the JSON object into a first data object; identify whether a data object with the same content as the first data object exists among the second data objects of the MySQL database; if the identification result is yes, delete from the first data object the data objects having the same content as the second data object, to obtain a first target data object; obtain the topic and producer information marked in the label of the first target data object; fill the first target data object into the cache area of the MySQL database according to the topic and producer information; if the identification result is no, obtain the topic and producer information marked in the label of the first data object; and fill the first data object into the cache area of the MySQL database according to the topic and producer information.
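The fetch, label, parse and deduplicate sequence above can be sketched as follows; the fetch quantity, the in-memory `repository` and `existing` collections, and the record fields are illustrative assumptions:

```python
import json

FETCH_QUANTITY = 100  # hypothetical preset fetch quantity

def receive(repository, existing):
    """Fetch a batch, mark it consumed, drop duplicates, fill the cache."""
    batch = repository[:FETCH_QUANTITY]
    for rec in batch:
        rec["consumed"] = True                       # add the consumed label
    # JSON round trip stands in for converting to a JSON object and parsing
    # it back into the first data object.
    first_objects = [json.loads(json.dumps(r)) for r in batch]
    cache = []
    for obj in first_objects:
        if obj in existing:   # same content already among the second objects
            continue          # deleted from the first data object
        cache.append({"topic": obj.get("topic"),
                      "producer": obj.get("producer"),
                      "data": obj})  # fill the cache area by topic and producer
    return cache
```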
Optionally, the above receiving module 207 is further configured to: detect whether a database transaction in the MySQL database is being executed; if so, obtain the initial data of the target data in the cache area, lock the MySQL database through a Locktable statement, and add the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement includes a Locktable statement with the WRITE keyword; obtain the data having preset fields in the target data of the cache area, and obtain the field sizes of the data having preset fields, wherein the preset fields include fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command; create indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule includes indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold; detect whether the type of a data table in the MySQL database is defined as the InnoDB type; if not, add TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables; if so, obtain the data tables of the InnoDB type and use them as InnoDB tables; and create foreign keys on the InnoDB tables through the alter table command.
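Illustrative SQL corresponding to these steps is sketched below; the table and column names are hypothetical, and note that TYPE=INNODB is the legacy spelling of the table-type option, which current MySQL versions express as ENGINE=InnoDB:

```python
# Hypothetical statements mirroring the steps above; job_record and
# producer are invented table names for illustration only.
LOCK_SQL = "LOCK TABLES job_record WRITE;"      # Locktable with WRITE keyword
INDEX_SQL = "CREATE INDEX idx_topic ON job_record (topic);"  # preset-field index
ENGINE_SQL = "ALTER TABLE job_record ENGINE=InnoDB;"  # modern form of TYPE=INNODB
FK_SQL = ("ALTER TABLE job_record "
          "ADD FOREIGN KEY (producer_id) REFERENCES producer(id);")
UNLOCK_SQL = "UNLOCK TABLES;"

STATEMENTS = [LOCK_SQL, INDEX_SQL, ENGINE_SQL, FK_SQL, UNLOCK_SQL]
# With a real MySQL connection these would be run in order, e.g.:
# for stmt in STATEMENTS: cursor.execute(stmt)
```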
Optionally, the above classification module 204 is further configured to: obtain characteristic information about the running state of the task corresponding to the data to be processed; sort and classify the data to be processed according to the characteristic information to obtain classified data, and mark the classification type of the classified data, wherein the classification types of the classified data include task start data, task running data and task end data; and establish, for the classified data by classification type, the correspondence between the classified data and the topics, and mark the correspondence of the classified data, to obtain the target data.
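A minimal sketch of this running-state classification, with hypothetical state names and topic mapping:

```python
# Running-state classification: the characteristic information decides whether
# a record is start, running or end data, and each class maps to a topic.
TOPIC_MAP = {"start": "task-start", "running": "task-running", "end": "task-end"}

def classify_by_state(records):
    target = []
    for rec in records:
        state = rec["state"]             # characteristic information
        rec["class"] = state             # mark the classification type
        rec["topic"] = TOPIC_MAP[state]  # correspondence between class and topic
        target.append(rec)
    return target
```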
In the embodiments of the present application, on the one hand, the system is decoupled, which relieves the pressure of aggregating the job record data of multiple big data clusters at the same time, avoids congestion when such data arrives simultaneously, and achieves highly fault-tolerant, cached, efficient and high-throughput processing; on the other hand, access speed and running speed are increased and the load on the server is reduced. Taken together, the present application achieves low-cost, efficient, accurate and multi-faceted handling of the problem of concurrent system crashes, and can therefore effectively prevent and handle concurrent crashes of a multi-cluster job management system.
Optionally, in some implementations of the present application, the technical features mentioned in any embodiment or implementation of the above method for processing multi-cluster job records also apply to the apparatus 20 of the present application that performs the above method; similar points will not be repeated below.
The apparatus 20 in the embodiments of the present application has been described above from the perspective of modular functional entities. A computer apparatus is described below from a hardware perspective. As shown in FIG. 3, it includes a processor, a memory, a transceiver (which may also be an input/output unit, not marked in FIG. 3), and a computer program stored in the memory and runnable on the processor. For example, the computer program may be the program corresponding to the method for processing multi-cluster job records in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof. For example, when the computer apparatus implements the functions of the apparatus 20 shown in FIG. 2, the processor, when executing the computer program, implements the steps of the method for processing multi-cluster job records performed by the apparatus 20 in the embodiment corresponding to FIG. 2; alternatively, the processor, when executing the computer program, implements the functions of the modules of the apparatus 20 in the embodiment corresponding to FIG. 2. For another example, the computer program may be the program corresponding to the method in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation thereof.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer apparatus and connects the parts of the entire computer apparatus through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer apparatus by running or executing the computer program and/or modules stored in the memory and by calling data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as obtaining the job record data generated by multiple clusters running tasks), and the data storage area may store data created according to the use of the apparatus (such as the multiple blocks obtained by dividing the target data into blocks according to producer and topic). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same or different physical entities; when they are the same physical entity, they may be collectively referred to as a transceiver. The transceiver may be an input/output unit. The physical device corresponding to the transceiver module 201 in FIG. 2 may be the transceiver in FIG. 3, and the physical devices corresponding to the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206 and the receiving module 207 in FIG. 2 may be the processor in FIG. 3.
The memory may be integrated in the processor, or may be provided separately from the processor.
The present application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
Obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point includes the start, pause or end running state of the multiple clusters running tasks; calling an interface of the distributed messaging system Kafka in the message queue service system, and when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script; creating producers through Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through Kafka according to the unified management website system; inputting the data to be processed into Kafka, and classifying the data to be processed through Kafka according to the topics and the producers to obtain target data; dividing the target data into blocks according to the producers and the topics to obtain multiple blocks, linking the multiple blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to Kafka so that Kafka is applied in the blockchain system; constructing the blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein Kafka includes the repositories, and the number of repositories is plural; when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into the cache area of the MySQL database; and converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
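Purely as a structural illustration of the order of these steps, the following sketch stubs out each stage; every function name here is a placeholder invented for this sketch, none comes from the embodiment itself:

```python
import json

def at_trigger_point(rec):  # start / pause / end running state
    return rec.get("state") in {"start", "pause", "end"}

def to_json(rec):
    return json.dumps(rec)  # format conversion by the trigger

def create_topic(name):
    return name  # stand-in for the topic creation script

def classify(data, topic):
    return [{"topic": topic, "data": d} for d in data]

def divide_blocks(target):
    return [[t] for t in target]  # one block per record, for illustration

def link_blocks(blocks):
    # Zoning protocol: each block points back to the preceding block.
    return [{"prev": i - 1, "body": b} for i, b in enumerate(blocks)]

def run_pipeline(job_records):
    pending = [to_json(r) for r in job_records if at_trigger_point(r)]
    topic = create_topic("job-records")
    target = classify(pending, topic)
    chain = link_blocks(divide_blocks(target))  # data storage layer
    return chain  # a real system would store this and fill the MySQL cache

print(run_pipeline([{"state": "start"}, {"state": "other"}]))
```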
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

Claims (20)

1. A method for processing multi-cluster job records, involving a message queue service system and a unified management website system, the method comprising:
    obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    calling an interface of the distributed messaging system Kafka in the message queue service system, and, when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through the Kafka according to the unified management website system;
    inputting the data to be processed into the Kafka, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain a plurality of blocks, linking the plurality of blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural;
    when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into a cache area of a MySQL database; and
    converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
2. The method according to claim 1, wherein the tasks comprise events, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises:
    obtaining a sequential association degree of the events, obtaining a throughput of the events, identifying entity types of the events, and obtaining an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classifying the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    labeling the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classifying the labeled first classified data according to producer type, and labeling the producer type of the labeled first classified data, to obtain the target data.
3. The method according to claim 1, wherein, after classifying the data to be processed through the Kafka according to the topics and the producers and before obtaining the target data, the method further comprises:
    initializing the classified data to be processed, and setting the length of a linear hash table according to the classification type of the classified data to be processed;
    obtaining key-code values of the classified data to be processed, calculating the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data to be processed, and obtaining target key-code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the data to be processed comprises data items;
    using the remainder obtained by dividing the target key-code value by a number not greater than the length of the linear hash table as the address of the linear hash table, using the target key-code value as the table head of the linear hash table, and using the addresses of the linear hash table as the number of linear hash tables, to obtain the linear hash table;
    randomly generating a preset number of character strings of the same length, and performing statistics and analysis on the linear hash table through a preset string function, to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises the bucket usage rate, and the average bucket length information comprises the average length of all used buckets;
    determining whether the hash distribution information satisfies a first preset condition, and determining whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises the ratio of the number of used buckets to the total number of buckets falling within a first preset range, and the second preset condition comprises the average length of all used buckets falling within a second preset range;
    if both determination results are yes, using the linear hash table corresponding to the affirmative determination results as the final linear hash table; and
    filling the target key-code values into the final linear hash table, and outputting the final linear hash table in linked-list form, to obtain the target data.
4. The method according to claim 1, wherein the method involves a transmission channel, and inputting the data to be processed into the Kafka comprises:
    compressing the data to be processed;
    determining whether the transmission state of the transmission channel is normal;
    if the determination result is yes, inputting the compressed data to be processed into the Kafka, and marking the data input into the Kafka as sent;
    if the determination result is no, inputting the compressed data to be processed into a first MySQL database, and marking the data input into the first MySQL database as unsent;
    calling a created polling script, and polling the first MySQL database at a preset interval through the polling script;
    when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, inputting the data marked as unsent into the first MySQL database;
    polling to detect whether the first MySQL database has received the data to be processed that is marked as unsent;
    if the detection result is yes, replacing the unsent mark in the data marked as unsent with a sent mark; and
    if the detection result is no, not updating the unsent mark in the data marked as unsent.
5. The method according to claim 1, wherein, when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer and inputting the target data into the cache area of the MySQL database comprises:
    invoking, by the unified management website system, a listener script, and detecting through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detecting the application layer in the blockchain system;
    when the detection result is yes, fetching the target data from the repositories through the consumer according to a preset fetch quantity, and adding a consumed label to the fetched target data, to obtain labeled target data;
    converting the labeled target data into a JSON object, and parsing the JSON object into a first data object;
    identifying whether a data object with the same content as the first data object exists among the second data objects of the MySQL database;
    if the identification result is yes, deleting from the first data object the data objects having the same content as the second data object, to obtain a first target data object;
    obtaining the topic and producer information marked in the label of the first target data object;
    filling the first target data object into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtaining the topic and producer information marked in the label of the first data object; and
    filling the first data object into the cache area of the MySQL database according to the topic and producer information.
6. The method according to claim 1, wherein, before converting the target data in the cache area into hypertext markup language data, the method further comprises:
    detecting whether a database transaction in the MySQL database is being executed;
    if so, obtaining the initial data of the target data in the cache area, locking the MySQL database through a Locktable statement, and adding the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement comprises a Locktable statement with the WRITE keyword;
    obtaining the data having preset fields in the target data of the cache area, and obtaining the field sizes of the data having preset fields, wherein the preset fields comprise fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command;
    creating indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule comprises indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold;
    detecting whether the type of a data table in the MySQL database is defined as the InnoDB type;
    if not, adding TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables;
    if so, obtaining the data tables of the InnoDB type, and using the data tables of the InnoDB type as InnoDB tables; and
    creating foreign keys on the InnoDB tables through the alter table command.
7. The method according to claim 1, wherein classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises:
    obtaining characteristic information about the running state of the task corresponding to the data to be processed;
    sorting and classifying the data to be processed according to the characteristic information to obtain classified data, and marking the classification type of the classified data, wherein the classification types of the classified data comprise task start data, task running data and task end data; and
    establishing, for the classified data by classification type, the correspondence between the classified data and the topics, and marking the correspondence of the classified data, to obtain the target data.
8. An apparatus for processing multi-cluster job records, the apparatus comprising:
    a transceiver module, configured to receive job record data generated by multiple clusters running tasks;
    a detection module, configured to detect the running state of the tasks, and, when the running state is detected to be a preset trigger point, send a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data received by the transceiver module into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    a calling module, configured to call the distributed messaging system Kafka in the message queue service system, and, when the Kafka receives a topic creation command, call a topic creation script and create topics through the topic creation script; and to create producers through the Kafka according to the clusters corresponding to the data to be processed, and create a consumer through the Kafka according to the unified management website system;
    a classification module, configured to input the data to be processed obtained by the detection module into the Kafka called by the calling module, and classify the data to be processed through the Kafka according to the topics and the producers created by the calling module, to obtain target data;
    a division module, configured to divide the target data obtained by the classification module into blocks according to the producers created by the calling module and the topics created by the calling module, to obtain a plurality of blocks, link the plurality of blocks according to a created zoning protocol, and use the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    a construction module, configured to construct a blockchain system according to the zoning protocol and the data storage layer obtained by the division module, input the target data into repositories through the blockchain system by way of HTTP requests, and trigger a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural; and
    a receiving module, configured to: when the unified management website system receives the read instruction triggered by the construction module, output through the data storage layer the target data input into the repositories by the construction module, and input the target data into the cache area of the MySQL database; and convert the target data in the cache area into hypertext markup language data, control the cache area through an output control function to obtain the hypertext markup language data, and input the hypertext markup language data into a constructed static hypertext markup language page file through a created read-write function.
9. The apparatus according to claim 8, wherein the classification module is further configured to:
    obtain a sequential association degree of the events, obtain a throughput of the events, identify entity types of the events, and obtain an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classify the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    label the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classify the labeled first classified data according to producer type, and label the producer type of the labeled first classified data, to obtain the target data.
10. The apparatus according to claim 8, wherein, after classifying the data to be processed through the Kafka according to the topics and the producers and before obtaining the target data, the classification module is further configured to:
    initialize the classified data to be processed, and set the length of a linear hash table according to the classification type of the classified data to be processed;
    obtain key-code values of the classified data to be processed, calculate the term frequency-inverse document frequency (TF-IDF) value of each data item of the classified data to be processed, and obtain target key-code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the data to be processed comprises data items;
    use the remainder obtained by dividing the target key-code value by a number not greater than the length of the linear hash table as the address of the linear hash table, use the target key-code value as the table head of the linear hash table, and use the addresses of the linear hash table as the number of linear hash tables, to obtain the linear hash table;
    randomly generate a preset number of character strings of the same length, and perform statistics and analysis on the linear hash table through a preset string function, to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises the bucket usage rate, and the average bucket length information comprises the average length of all used buckets;
    determine whether the hash distribution information satisfies a first preset condition, and determine whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises the ratio of the number of used buckets to the total number of buckets falling within a first preset range, and the second preset condition comprises the average length of all used buckets falling within a second preset range;
    if both determination results are yes, use the linear hash table corresponding to the affirmative determination results as the final linear hash table; and
    fill the target key-code values into the final linear hash table, and output the final linear hash table in linked-list form, to obtain the target data.
11. The apparatus according to claim 8, wherein the classification module is further configured to:
    compress the data to be processed;
    determine whether the transmission state of the transmission channel is normal;
    if the determination result is yes, input the compressed data to be processed into the Kafka, and mark the data input into the Kafka as sent;
    if the determination result is no, input the compressed data to be processed into a first MySQL database, and mark the data input into the first MySQL database as unsent;
    call a created polling script, and poll the first MySQL database at a preset interval through the polling script;
    when polling detects that the first MySQL database contains data to be processed that is marked as unsent, and polling detects that the transmission state of the transmission channel is normal, input the data marked as unsent into the first MySQL database;
    poll to detect whether the first MySQL database has received the data to be processed that is marked as unsent;
    if the detection result is yes, replace the unsent mark in the data marked as unsent with a sent mark; and
    if the detection result is no, do not update the unsent mark in the data marked as unsent.
12. The apparatus according to claim 8, wherein the receiving module is further configured to:
    invoke, through the unified management website system, a listener script, and detect through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detect the application layer in the blockchain system;
    when the detection result is yes, fetch the target data from the repositories through the consumer according to a preset fetch quantity, and add a consumed label to the fetched target data, to obtain labeled target data;
    convert the labeled target data into a JSON object, and parse the JSON object into a first data object;
    identify whether a data object with the same content as the first data object exists among the second data objects of the MySQL database;
    if the identification result is yes, delete from the first data object the data objects having the same content as the second data object, to obtain a first target data object;
    obtain the topic and producer information marked in the label of the first target data object;
    fill the first target data object into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtain the topic and producer information marked in the label of the first data object; and
    fill the first data object into the cache area of the MySQL database according to the topic and producer information.
13. The apparatus according to claim 8, wherein, before converting the target data in the cache area into hypertext markup language data, the receiving module is further configured to:
    detect whether a database transaction in the MySQL database is being executed;
    if so, obtain the initial data of the target data in the cache area, lock the MySQL database through a Locktable statement, and add the update data of the target data subsequently input into the cache area of the MySQL database to the initial data, wherein the Locktable statement comprises a Locktable statement with the WRITE keyword;
    obtain the data having preset fields in the target data of the cache area, and obtain the field sizes of the data having preset fields, wherein the preset fields comprise fields used for Join, Where conditions and Orderby sorting, as well as fields used in the MAX() command, the MIN() command and the Orderby command;
    create indexes according to a preset rule based on the data having preset fields and the field sizes of that data, wherein the preset rule comprises indexing target data of the same field size and indexing target data whose duplicate values do not exceed a fifth preset threshold;
    detect whether the type of a data table in the MySQL database is defined as the InnoDB type;
    if not, add TYPE=INNODB to the Createtable statement of the data tables whose type is not the InnoDB type, to obtain InnoDB tables;
    if so, obtain the data tables of the InnoDB type, and use the data tables of the InnoDB type as InnoDB tables; and
    create foreign keys on the InnoDB tables through the alter table command.
14. The apparatus according to claim 8, wherein the classification module is further configured to:
    obtain characteristic information about the running state of the task corresponding to the data to be processed;
    sort and classify the data to be processed according to the characteristic information to obtain classified data, and mark the classification type of the classified data, wherein the classification types of the classified data comprise task start data, task running data and task end data; and
    establish, for the classified data by classification type, the correspondence between the classified data and the topics, and mark the correspondence of the classified data, to obtain the target data.
15. A device for processing multi-cluster job records, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the following steps when executing the computer program:
    obtaining job record data generated by multiple clusters running tasks, and detecting the running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into JSON format to obtain data to be processed, wherein the preset trigger point comprises the start, pause or end running state of the multiple clusters running tasks;
    calling an interface of the distributed messaging system Kafka in the message queue service system, and, when the Kafka interface receives a topic creation command, calling a topic creation script and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the data to be processed, and creating a consumer through the Kafka according to the unified management website system;
    inputting the data to be processed into the Kafka, and classifying the data to be processed through the Kafka according to the topics and the producers to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain a plurality of blocks, linking the plurality of blocks according to a created zoning protocol, and using the linked blocks and the consumer as a data storage layer, wherein the zoning protocol is used to link each block in order from back to front through a chain and point it to the preceding block, and to link the created blockchain system to the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into repositories through the blockchain system by way of HTTP requests, and triggering a read instruction, wherein the Kafka comprises the repositories, and the number of the repositories is plural;
    when the unified management website system receives the read instruction, outputting the target data in the repositories through the data storage layer, and inputting the target data into a cache area of a MySQL database; and
    converting the target data in the cache area into hypertext markup language data, and writing the hypertext markup language data into a constructed static hypertext markup language page file.
16. The device according to claim 15, wherein the tasks comprise events, and, when the processor executes the computer program, classifying the data to be processed through the Kafka according to the topics and the producers to obtain the target data comprises the following steps:
    obtaining a sequential association degree of the events, obtaining a throughput of the events, identifying entity types of the events, and obtaining an association degree between the entity types, wherein an entity type is used such that one address corresponds to one user;
    classifying the data to be processed into the topics according to a preset classification strategy based on the sequential association degree, the throughput and the association degree, to obtain first classified data, wherein the preset classification strategy comprises classifying into the same topic the data to be processed that satisfies at least one of the conditions that the sequential association degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the association degree is greater than a third preset threshold;
    labeling the first classified data, wherein the labeled content comprises the sequential association degree, the throughput, the entity type, the association degree between entity types and the topic name corresponding to the data to be processed; and
    classifying the labeled first classified data according to producer type, and labeling the producer type of the labeled first classified data, to obtain the target data.
  17. The device according to claim 15, wherein after the processor executes the computer program to implement the classifying of the to-be-processed data through the Kafka according to the topics and the producers, and before the obtaining of the target data, the following steps are further comprised:
    initializing the classified to-be-processed data, and setting a length of a linear hash table according to the classification types of the classified to-be-processed data;
    obtaining key code values of the classified to-be-processed data, calculating term frequency-inverse document frequency (TF-IDF) values of data items of the classified to-be-processed data, and obtaining target key code values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, wherein the to-be-processed data comprises the data items;
    taking the remainder obtained by dividing each target key code value by a number not greater than the length of the linear hash table as an address of the linear hash table, taking the target key code values as headers of the linear hash table, and taking the addresses of the linear hash table as the number of linear hash tables, so as to obtain the linear hash table;
    randomly generating a preset number of character strings of the same length, and performing statistics and analysis on the linear hash table through a preset string function to obtain hash distribution information and average bucket length information, wherein the hash distribution information comprises a usage rate of the buckets, and the average bucket length information comprises an average length of all used buckets;
    determining whether the hash distribution information satisfies a first preset condition, and determining whether the average bucket length information satisfies a second preset condition, wherein the first preset condition comprises that the ratio of the number of used buckets to the total number of buckets falls within a first preset range, and the second preset condition comprises that the average length of all used buckets falls within a second preset range;
    if both determination results are yes, taking the linear hash table for which both determination results are yes as a final linear hash table;
    filling the target key code values into the final linear hash table, and outputting the final linear hash table in the form of a linked list, so as to obtain the target data.
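One possible reading of the hash-table construction above: score data items by TF-IDF, keep the key code values of high-scoring items, bucket them by remainder, and accept the table only when its bucket usage rate and average bucket length fall within preset ranges. The sketch below assumes numeric key code values and tokenized documents; the function names and acceptance ranges are illustrative assumptions:

    # Hypothetical sketch: TF-IDF filtering plus a chained (linear) hash table.
    import math

    def tf_idf(term, doc, corpus):
        # doc and corpus entries are lists of tokens; a standard TF-IDF form.
        tf = doc.count(term) / max(len(doc), 1)
        df = sum(1 for d in corpus if term in d)
        return tf * math.log(len(corpus) / (1 + df))

    def build_linear_hash_table(target_keys, table_length):
        # Bucket address = key mod a number not greater than the table length.
        table = [[] for _ in range(table_length)]
        for key in target_keys:
            table[key % table_length].append(key)
        used = [bucket for bucket in table if bucket]
        usage_rate = len(used) / table_length        # hash distribution info
        avg_bucket_len = sum(len(b) for b in used) / max(len(used), 1)
        return table, usage_rate, avg_bucket_len

    def acceptable(usage_rate, avg_bucket_len,
                   usage_range=(0.6, 1.0), length_range=(1.0, 3.0)):
        # Accept the table only when both statistics fall in their preset ranges;
        # the range values here are placeholders.
        return (usage_range[0] <= usage_rate <= usage_range[1]
                and length_range[0] <= avg_bucket_len <= length_range[1])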
  18. The device according to claim 15, wherein when the processor executes the computer program to implement the inputting of the to-be-processed data into the Kafka, the following steps are comprised:
    performing data compression on the to-be-processed data;
    determining whether a transmission state of the transmission channel is normal;
    if the determination result is yes, inputting the compressed to-be-processed data into the Kafka, and marking the to-be-processed data input into the Kafka as sent;
    if the determination result is no, inputting the compressed to-be-processed data into a first MySQL database, and marking the to-be-processed data input into the first MySQL database as unsent;
    invoking a created polling script, and performing polling detection on the first MySQL database at a preset interval through the polling script;
    when the polling detects that to-be-processed data marked as unsent exists in the first MySQL database and that the transmission state of the transmission channel is normal, inputting the to-be-processed data marked as unsent into the first MySQL database;
    detecting by polling whether the first MySQL database has received the to-be-processed data marked as unsent;
    if the detection result is yes, replacing the unsent mark in the to-be-processed data marked as unsent with a sent mark;
    if the detection result is no, not updating the unsent mark in the to-be-processed data marked as unsent.
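A minimal sketch of this compress-then-send flow with a MySQL holding table and a polling retry follows. It assumes kafka-python's KafkaProducer and a DB-API connection to the first MySQL database; channel_ok(), the pending table, and all identifiers are placeholders, and the retry here re-sends the recovered rows to Kafka, which is one possible reading of the recovered-channel step:

    # Hypothetical sketch; table names, ids and channel_ok() are assumed.
    import gzip, json, time
    from kafka import KafkaProducer   # kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def channel_ok():
        # Placeholder for the transmission-state check described in the claim.
        return True

    def submit(record, topic, db):
        payload = gzip.compress(json.dumps(record).encode())  # data compression
        cur = db.cursor()
        if channel_ok():
            producer.send(topic, payload)
            cur.execute(
                "INSERT INTO pending (id, payload, flag) VALUES (%s, %s, 'sent')",
                (record["id"], payload))
        else:
            cur.execute(
                "INSERT INTO pending (id, payload, flag) VALUES (%s, %s, 'unsent')",
                (record["id"], payload))
        db.commit()

    def polling_script(db, topic, preset_interval=60):
        # Poll the first MySQL database at a preset interval for unsent rows.
        while True:
            if channel_ok():
                cur = db.cursor()
                cur.execute("SELECT id, payload FROM pending WHERE flag='unsent'")
                for row_id, payload in cur.fetchall():
                    producer.send(topic, payload)
                    cur.execute("UPDATE pending SET flag='sent' WHERE id=%s",
                                (row_id,))
                db.commit()
            time.sleep(preset_interval)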
  19. The device according to claim 15, wherein when the processor executes the computer program to implement the outputting of the target data in the repository through the data storage layer and the inputting of the target data into the cache area of the MySQL database when the unified management website system receives the read instruction, the following steps are comprised:
    invoking, by the unified management website system, a listener script, and detecting through the listener script whether the application layer in the blockchain system has received the read instruction;
    when the detection result is no, re-detecting the application layer in the blockchain system;
    when the detection result is yes, fetching the target data from the repository through the consumers according to a preset fetch quantity, and adding a consumed label to the fetched target data, so as to obtain labeled target data;
    converting the labeled target data into a JSON object, and parsing the JSON object into first data objects;
    identifying whether a data object having the same content as the first data objects exists among second data objects of the MySQL database;
    if the identification result is yes, deleting from the first data objects the data objects having the same content as the second data objects, so as to obtain first target data objects;
    obtaining the topic and producer information marked in the labels of the first target data objects;
    filling the first target data objects into the cache area of the MySQL database according to the topic and producer information;
    if the identification result is no, obtaining the topic and producer information marked in the labels of the first data objects;
    filling the first data objects into the cache area of the MySQL database according to the topic and producer information.
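The read path above amounts to: fetch at most a preset quantity of records through the consumer, label them consumed, parse the JSON, drop objects whose content already exists in the MySQL cache area, and file the rest under their topic and producer. The sketch below uses kafka-python's KafkaConsumer and a plain dictionary standing in for the MySQL cache area; the topic name and record fields are assumptions:

    # Hypothetical sketch of the fetch-parse-deduplicate-fill steps.
    import json
    from kafka import KafkaConsumer   # kafka-python

    consumer = KafkaConsumer("job-records", bootstrap_servers="localhost:9092")
    mysql_cache = {}   # (topic, producer) -> records already in the cache area

    def drain(preset_fetch_quantity=100):
        batch = consumer.poll(timeout_ms=1000, max_records=preset_fetch_quantity)
        for records in batch.values():
            for rec in records:
                obj = json.loads(rec.value)   # JSON object -> first data object
                obj["label"] = "consumed"     # add the consumed label
                key = (obj["topic"], obj["producer"])
                cached = mysql_cache.setdefault(key, [])
                if obj not in cached:         # same-content objects are dropped
                    cached.append(obj)        # fill into the cache area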
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
    obtaining job record data generated by tasks run by multiple clusters, and detecting a running state of the tasks; when the running state is detected to be a preset trigger point, sending a trigger instruction to a created trigger, the trigger receiving the trigger instruction and converting the data format of the job record data into the JSON format, so as to obtain to-be-processed data, wherein the preset trigger points comprise the start, pause, or end running states of the tasks run by the multiple clusters;
    invoking an interface of the distributed message system Kafka in a message queue service system; when the interface of the Kafka receives a topic creation command, invoking a topic creation script, and creating topics through the topic creation script;
    creating producers through the Kafka according to the clusters corresponding to the to-be-processed data, and creating consumers through the Kafka according to a unified management website system;
    inputting the to-be-processed data into the Kafka, and classifying the to-be-processed data through the Kafka according to the topics and the producers, so as to obtain target data;
    dividing the target data into blocks according to the producers and the topics to obtain multiple blocks, linking the multiple blocks according to a created zoning protocol, and taking the linked multiple blocks and the consumers as a data storage layer, wherein the zoning protocol is used to link the blocks in order from back to front through a chain such that each block points to its previous block, and to link the created blockchain system into the Kafka so that the Kafka is applied in the blockchain system;
    constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into a repository through the blockchain system in the manner of an HTTP request, and triggering a read instruction, wherein the Kafka comprises repositories, and the number of the repositories is multiple;
    when the unified management website system receives the read instruction, outputting the target data in the repository through the data storage layer, and inputting the target data into a cache area of a MySQL database;
    converting the target data in the cache area into hypertext markup language (HTML) data, and writing the HTML data into a constructed static HTML page file.
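The final step above renders the cached target data as a static HTML page. A minimal sketch, assuming the cached data arrives as a list of dictionaries and using an output file name chosen for the example:

    # Hypothetical sketch of the HTML step; field names and path are assumed.
    import html

    def write_static_page(rows, path="job_records.html"):
        body = "\n".join(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                html.escape(str(row["topic"])),
                html.escape(str(row["producer"])),
                html.escape(str(row["payload"])))
            for row in rows)
        page = ("<html><body><table>\n"
                "<tr><th>Topic</th><th>Producer</th><th>Record</th></tr>\n"
                + body + "\n</table></body></html>")
        with open(path, "w", encoding="utf-8") as f:
            f.write(page)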
PCT/CN2019/117086 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job record, and device and storage medium WO2021051531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884887.8A CN110795257B (en) 2019-09-19 2019-09-19 Method, device, equipment and storage medium for processing multi-cluster job record
CN201910884887.8 2019-09-19

Publications (1)

Publication Number Publication Date
WO2021051531A1 true WO2021051531A1 (en) 2021-03-25

Family

ID=69427342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117086 WO2021051531A1 (en) 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job record, and device and storage medium

Country Status (2)

Country Link
CN (1) CN110795257B (en)
WO (1) WO2021051531A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111555957B (en) * 2020-03-26 2022-08-19 孩子王儿童用品股份有限公司 Kafka-based synchronous message service system and implementation method
CN112000515A (en) * 2020-08-07 2020-11-27 北京浪潮数据技术有限公司 Method and assembly for recovering instance data in redis cluster
CN112100265A (en) * 2020-09-17 2020-12-18 博雅正链(北京)科技有限公司 Multi-source data processing method and device for big data architecture and block chain
CN112131854A (en) * 2020-09-24 2020-12-25 北京开科唯识技术股份有限公司 Data processing method and device, electronic equipment and storage medium
CN112272220B (en) * 2020-10-16 2022-05-13 苏州浪潮智能科技有限公司 Cluster software start control method, system, terminal and storage medium
CN112751709B (en) * 2020-12-29 2023-01-10 北京浪潮数据技术有限公司 Management method, device and system of storage cluster
CN113269590B (en) * 2021-05-31 2023-06-06 五八到家有限公司 Data processing method, device and system for resource subsidy
CN115473858B (en) * 2022-09-05 2024-03-01 上海哔哩哔哩科技有限公司 Data transmission method, stream data transmission system, computer device and storage medium
CN117033449B (en) * 2023-10-09 2023-12-15 北京中科闻歌科技股份有限公司 Data processing method based on kafka stream, electronic equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582470B (en) * 2017-09-28 2022-11-22 北京国双科技有限公司 Data processing method and data processing device
CN109451072A (en) * 2018-12-29 2019-03-08 广东电网有限责任公司 A kind of message caching system and method based on Kafka
CN110209507A (en) * 2019-05-16 2019-09-06 厦门市美亚柏科信息股份有限公司 Data processing method, device, system and storage medium based on message queue

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685173A (en) * 2011-04-14 2012-09-19 天脉聚源(北京)传媒科技有限公司 Asynchronous task distribution system and scheduling distribution computing unit
CN106034160A (en) * 2015-03-19 2016-10-19 阿里巴巴集团控股有限公司 Distributed computing system and method
US20180181377A1 (en) * 2016-10-26 2018-06-28 Yoongu Kim Systems and methods for discovering automatable tasks
CN109800080A (en) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 A kind of method for scheduling task based on Quartz frame, system and terminal device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113194070A (en) * 2021-03-31 2021-07-30 新华三大数据技术有限公司 Kafka cluster multi-type authority management method and device and storage medium
CN113194070B (en) * 2021-03-31 2022-05-27 新华三大数据技术有限公司 Kafka cluster multi-type authority management method and device and storage medium
CN113315750A (en) * 2021-04-15 2021-08-27 新华三大数据技术有限公司 Kafka message issuing method, device and storage medium
CN113315750B (en) * 2021-04-15 2022-05-27 新华三大数据技术有限公司 Kafka message issuing method, device and storage medium
CN113722198A (en) * 2021-09-02 2021-11-30 中国建设银行股份有限公司 Script job submission control method and device, storage medium and electronic equipment
CN113742087A (en) * 2021-09-22 2021-12-03 深圳市玄羽科技有限公司 Protection method and system for industrial internet big data server
CN113742087B (en) * 2021-09-22 2023-12-12 深圳市玄羽科技有限公司 Protection method and system for industrial Internet big data server
CN114401239A (en) * 2021-12-20 2022-04-26 中国平安财产保险股份有限公司 Metadata transmission method and device, computer equipment and storage medium
CN114401239B (en) * 2021-12-20 2023-11-14 中国平安财产保险股份有限公司 Metadata transmission method, apparatus, computer device and storage medium
WO2024037629A1 (en) * 2022-08-19 2024-02-22 顺丰科技有限公司 Data integration method and apparatus for blockchain, and computer device and storage medium
CN116049190A (en) * 2023-01-18 2023-05-02 中电金信软件有限公司 Kafka-based data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110795257B (en) 2023-06-16
CN110795257A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021051531A1 (en) Method and apparatus for processing multi-cluster job record, and device and storage medium
US11573965B2 (en) Data partitioning and parallelism in a distributed event processing system
US9787706B1 (en) Modular architecture for analysis database
US20190266195A1 (en) Filtering queried data on data stores
CN107145489B (en) Information statistics method and device for client application based on cloud platform
KR101219856B1 (en) Automated data organization
US8166350B2 (en) Apparatus and method for persistent report serving
WO2021051627A1 (en) Database-based batch importing method, apparatus and device, and storage medium
US10860604B1 (en) 2020-12-08 Scalable tracking for database updates according to a secondary index
US11429566B2 (en) Approach for a controllable trade-off between cost and availability of indexed data in a cloud log aggregation solution such as splunk or sumo
CN108228322B (en) Distributed link tracking and analyzing method, server and global scheduler
US10262024B1 (en) Providing consistent access to data objects transcending storage limitations in a non-relational data store
US11892976B2 (en) Enhanced search performance using data model summaries stored in a remote data store
US20230007014A1 (en) Detection of replacement/copy-paste attacks through monitoring and classifying api function invocations
US20130152102A1 (en) Runtime-agnostic management of applications
US20210224102A1 (en) Characterizing operation of software applications having large number of components
CN112612832A (en) Node analysis method, device, equipment and storage medium
US10248508B1 (en) Distributed data validation service
US11841827B2 (en) Facilitating generation of data model summaries
CN114461762A (en) Archive change identification method, device, equipment and storage medium
US11379268B1 (en) Affinity-based routing and execution for workflow service
US8214846B1 (en) Method and system for threshold management
US20240061494A1 (en) Monitoring energy consumption associated with users of a distributed computing system using tracing
US10896115B2 (en) Investigation of performance bottlenecks occurring during execution of software applications
TWI606350B (en) Cloud file search system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1