WO2021051531A1 - Method and apparatus for processing multi-cluster job records, and device and storage medium - Google Patents

Method and apparatus for processing multi-cluster job records, and device and storage medium

Info

Publication number
WO2021051531A1
WO2021051531A1 (PCT/CN2019/117086)
Authority
WO
WIPO (PCT)
Prior art keywords
data
processed
preset
kafka
target data
Prior art date
Application number
PCT/CN2019/117086
Other languages
English (en)
Chinese (zh)
Inventor
林琪琛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021051531A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/004 - Error avoidance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3003 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3065 - Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G06F16/24552 - Database cache management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 - Relational databases
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing, and in particular to methods, devices, equipment and storage media for processing multi-cluster job records.
  • Typically, the task job data generated by multiple clusters is obtained, the task job data is input to a unified management website, and the task type of the task job data is detected by the system of the unified management website.
  • The task job data is classified according to the task type to obtain classified data, and the classified data is input into a plurality of storage repositories according to the task type.
  • The inventor realizes that because multiple clusters send task job data directly and centrally to a unified management website, parallel processing easily produces too many portal requests and channel congestion, which can lead to concurrency-induced crashes of the multi-cluster job management system.
  • the present application provides a method, device, equipment, and storage medium for processing multi-cluster job records, which can solve the problem of concurrent crashes of the multi-cluster job management system.
  • In a first aspect, this application provides a method for processing multi-cluster job records, including: obtaining job record data generated by multiple clusters running tasks, and detecting the running status of the tasks; when the running status is detected to be a preset trigger point, sending a trigger instruction to the created trigger;
  • the trigger receives the trigger instruction and converts the data format of the job record data into JSON format to obtain the data to be processed, where the preset trigger point includes the start, pause, or end running status of the tasks run by the multiple clusters; calling the distributed messaging system Kafka in the message queue service system, and when Kafka receives a topic creation command, calling the topic creation script and creating a topic through the topic creation script; creating a producer through Kafka according to the cluster corresponding to the data to be processed, and creating a consumer through Kafka according to the unified management website system;
  • inputting the data to be processed into Kafka, where Kafka classifies the data to be processed according to the topic and the producer to obtain target data; dividing the target data into blocks according to the producer and the topic to obtain multiple blocks, linking the multiple blocks according to the created zoning protocol, and using the linked blocks and the consumer as the data storage layer, where the zoning protocol is used to link the blocks in an orderly manner; constructing a blockchain system according to the zoning protocol and the data storage layer, inputting the target data into the repository through the blockchain system according to the HTTP request mode, and triggering a read instruction, where Kafka includes one or more repositories; when the unified management website system receives the read instruction, outputting the target data in the repository through the data storage layer, and inputting the target data into the cache area of the MySQL database; and converting the target data in the cache area into hypertext markup language data, and inputting the hypertext markup language data into a static hypertext markup language page file.
  • In a second aspect, the present application provides an apparatus for processing multi-cluster job records, including: a transceiver module for receiving job record data generated by multiple clusters running tasks; a detection module for detecting the running status of the tasks and, when the running status is detected to be a preset trigger point, sending a trigger instruction to the created trigger, where the trigger receives the trigger instruction and converts the data format of the job record data received by the transceiver module into JSON format to obtain the data to be processed, and the preset trigger point includes the start, pause, or end running status of the tasks run by the multiple clusters; a calling module for calling the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, the topic creation script is invoked and the topic is created through the topic creation script, a producer is created through Kafka according to the cluster corresponding to the data to be processed, and a consumer is created through Kafka according to the unified management website system; a classification module for inputting the data to be processed into Kafka, where Kafka classifies the data to be processed according to the topic and the producer to obtain target data; a division module for dividing the target data into blocks according to the producer and the topic to obtain multiple blocks, linking the multiple blocks according to the created zoning protocol, and using the linked blocks and the consumer as the data storage layer, where the zoning protocol is used to link the blocks in an orderly manner and
  • the created blockchain system is linked to Kafka so that Kafka can be used in the blockchain system; a construction module for constructing the blockchain system according to the zoning protocol and the data storage layer obtained by the division module, inputting the target data into the repository through the blockchain system according to the HTTP request mode, and triggering a read instruction, where Kafka includes one or more repositories; and a receiving module for, when the unified management website system receives the read instruction triggered by the construction module, outputting the target data input into the repository by the construction module through the data storage layer, inputting the target data into the cache area of the MySQL database, converting the target data in the cache area into hypertext markup language data, controlling the cache area through an output control function to obtain the hypertext markup language data, and inputting the hypertext markup language data into the constructed static hypertext markup language page file through the created read-write function.
  • In a third aspect, the present application provides a computer device, which includes at least one connected processor, a memory, a display, and an input and output unit, where the memory is used to store program code, and the processor is used to call the program code in the memory to execute the method described in the first aspect above.
  • In a fourth aspect, the present application provides a computer-readable storage medium that stores computer instructions; when the computer instructions are run on a computer, the computer executes the method described in the first aspect above.
  • In the technical solution provided by this application, the job record data generated by multiple clusters running tasks is processed to obtain the data to be processed; the distributed messaging system Kafka in the message queue service system is used to create topics, producers, and consumers; Kafka classifies the data to be processed to obtain target data, and a blockchain system is constructed according to the producer, the topic, and the target data; the target data is input into the repository through the blockchain system; the target data in the repository is input into the cache area of the MySQL database through the unified management website system; and the target data in the cache area is converted into hypertext markup language data, which is then input into a static hypertext markup language page file.
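  • As a rough, non-limiting illustration of the final step of this pipeline, the following Python sketch reads target data from the MySQL cache and renders it into a static hypertext markup language page file. The table name job_record_cache, the column list, and the connection parameters are assumptions for illustration only and are not names fixed by this application.

```python
import mysql.connector  # assumed MySQL client; any driver would do

def render_static_page(out_path="job_records.html"):
    """Read cached target data from MySQL and write a static HTML page."""
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="jobs"
    )  # hypothetical connection parameters
    cur = conn.cursor()
    # 'job_record_cache' and its columns are illustrative names only.
    cur.execute("SELECT account, job_content, submit_time FROM job_record_cache")
    rows = cur.fetchall()
    cells = "".join(
        "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>" for row in rows
    )
    html = f"<html><body><table>{cells}</table></body></html>"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)  # the static page can then be served without hitting MySQL
    cur.close()
    conn.close()
```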
  • Compared with the prior art, on the one hand, the Kafka system in the message queue server is used as a message queue and combined with the blockchain system for distributed data storage, and data caching is processed concurrently by multiple nodes; this decouples the system, reduces the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when the job record data of multiple big data clusters arrives simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput. On the other hand, the target data input into the cache area of the MySQL database is statically rendered as hypertext markup language, which increases access and operation speed and reduces the load on the server. In summary, this application handles the concurrent-crash problem of the system at low cost, with high efficiency and high accuracy, and from multiple dimensions; therefore, the present application can effectively prevent and deal with concurrent crashes of the multi-cluster job management system.
  • FIG. 1 is a schematic flowchart of a method for processing multi-cluster job records in an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of an apparatus for processing multi-cluster job records in an embodiment of this application;
  • FIG. 3 is a schematic structural diagram of a computer device in an embodiment of this application.
  • This application provides a method, device, equipment, and storage medium for processing multi-cluster job records, which can be used in an enterprise multi-cluster job management platform to manage and query job operation records generated by multiple big data clusters.
  • this application mainly provides the following technical solutions:
  • The architecture on which the method runs includes a big data cluster layer, a message queue server, and a unified management website system.
  • The method is executed by a computer device, which can be a server or a terminal.
  • The terminal is a terminal on which the apparatus 20 shown in FIG. 2 is installed.
  • This application does not limit the type of the execution subject. The method includes the following steps:
  • Obtain the job record data generated by multiple clusters running tasks, and detect the running status of the tasks.
  • When the running status is detected to be a preset trigger point, a trigger instruction is sent to the created trigger, and the trigger receives the trigger instruction.
  • The data format of the job record data is converted to JSON format to obtain the data to be processed.
  • The preset trigger point includes the start, pause, or end running status of tasks run by the multiple clusters.
  • When the start of a task is detected, the data to be processed includes the running account, job content, submission time, start time, project, and task operation initiator; when the suspension of a task is detected, the data to be processed includes the running account, job content, submission time, start time, project, task operation initiator, operation suspension time, and task operation suspension data; when the end of a task is detected, the data to be processed includes the running account, job content, submission time, start time, project, task operation initiator, operation end time, and running result data.
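  • To make the JSON conversion concrete, a record captured at the "end" trigger point might look like the sketch below. The key names are illustrative assumptions chosen to mirror the fields listed above; this application does not prescribe exact key names.

```python
import json

# Illustrative job record at the "end" trigger point; all key names
# are assumptions mirroring the fields listed above.
job_record = {
    "running_account": "hadoop_user01",
    "job_content": "daily ETL aggregation",
    "submission_time": "2019-09-19T08:00:00",
    "start_time": "2019-09-19T08:01:12",
    "project": "billing",
    "task_initiator": "scheduler",
    "end_time": "2019-09-19T08:15:47",
    "running_result": "SUCCESS",
}

pending_data = json.dumps(job_record)  # the data to be processed, in JSON format
```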
  • In this embodiment, the job record data generated by running tasks in multiple clusters is stored in the MySQL databases connected to the clusters. After the job record data is read from these MySQL databases, its data format is converted to JSON to facilitate the processing of structured data.
  • Optionally, before converting the data format of the job record data into JSON, the method of the present application further includes: performing data compression on the job record data; performing state detection on the compressed job record data to obtain state information, and analyzing the state information through the cache coherency protocol to obtain first data and second data, where the state information includes the modified, exclusive, shared, and invalid states, the first data includes job operation data with weak cache-consistency requirements, and the second data includes job operation data with strong cache-consistency requirements; calling the Cache local cache interface to generate a cache builder CacheBuilder object for the first data, assembling the automatic loading function of the first data, and obtaining the first key-value pair data of the first data; automatically loading the first key-value pair data into the physical memory cache through the CacheBuilder object and the automatic loading function; creating a CacheLoader subclass object, and, when a get-data operation is detected to fail, automatically loading the first key-value pair data into the physical memory cache through the CacheLoader subclass object; building a cache architecture component composed of the high-speed cache system Memcached and the data structure server Redis, where the cache architecture component includes cache servers; obtaining the first hash values of the nodes of the cache architecture component, obtaining the second key-value pair data of the second data, and obtaining the second hash values of the second key-value pair data; and, according to the first and second hash values, storing the second key-value pair data on the matching node of the cache architecture component.
  • Call the distributed messaging system Kafka in the message queue server.
  • When Kafka receives the topic creation command, it calls the topic creation script and creates the topic through the topic creation script.
  • For example, the content of the topic creation command includes: the topic is running_result, there are M partitions, and each partition is allocated N replicas; the topic creation script, which includes a command-line part and a background (controller) logic part, is then called.
  • The background (controller) logic part monitors the corresponding directory node under the distributed application coordination service ZooKeeper.
  • The command-line part creates a new data node when it receives the topic creation command, which triggers the background (controller) logic part, and the topic is then created.
  • Creating topics facilitates aggregating the input data to be processed.
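  • As a hedged sketch of what such a topic creation command can look like programmatically, the kafka-python admin client can create the running_result topic as shown below; the broker address and the concrete partition and replica counts (here 3 and 2) are assumptions for illustration, since the application only specifies M partitions and N replicas.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Broker address and partition/replica counts are illustrative assumptions.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="running_result", num_partitions=3, replication_factor=2)
])
admin.close()
```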
  • In this embodiment, each cluster is the provider of the data to be processed, that is, a producer; the unified management website system is the consumer of the data to be processed, that is, the consumer.
  • The consumer end (the unified management website system) runs automatically and monitors topic updates in Kafka. Through Kafka's producer-consumer mode, the job record data generated by the clusters is processed in parallel and the system load is balanced.
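  • A minimal sketch of this producer-consumer pairing with the kafka-python client follows; the topic name running_result comes from the example above, while the consumer group id and the broker address are assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Cluster side: each cluster acts as a producer of pending job records.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("running_result", {"running_account": "hadoop_user01"})
producer.flush()

# Unified management website side: a consumer that watches topic updates.
consumer = KafkaConsumer(
    "running_result",
    bootstrap_servers="localhost:9092",
    group_id="unified-management-website",  # assumed group id
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:  # runs automatically, monitoring the topic
    print(message.value)
```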
  • In this embodiment, the data to be processed is first classified by producer, and the data classified by producer is then reclassified according to topic to obtain the target data.
  • The data to be processed can be assigned to topics by the following rules: events that must be kept in a fixed order are classified into the same topic and use the same partition key; when one entity depends on the events of another entity, those events are classified into the same topic; events whose throughput is higher than a first preset throughput threshold are classified into different topics, and events whose throughput is lower than a second preset throughput threshold are classified into the same topic.
  • Optionally, the above-mentioned tasks include events, and the above-mentioned classification of the data to be processed by Kafka according to topics and producers to obtain target data includes: obtaining the order correlation degree of the events, obtaining the throughput of the events, identifying the entity types of the events, and obtaining the correlation degree between entity types, where an entity type corresponds, for example, to an address associated with a user; classifying the data to be processed into topics according to the order correlation degree, the throughput, and the correlation degree under a preset classification strategy to obtain first classification data, where the preset classification strategy includes classifying into the same topic the data to be processed that satisfies at least one of the following: the order correlation degree is greater than a first preset threshold, the throughput is less than a second preset threshold, and the correlation degree is greater than a third preset threshold; marking the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classifying the marked first classification data according to producer type and marking the producer type of the marked first classification data to obtain the target data.
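  • The following Python sketch shows one way such a preset classification strategy could be expressed; the threshold values, metric names, and topic names are illustrative assumptions, not values fixed by this application.

```python
# Illustrative thresholds; the application only requires that they are preset.
FIRST_THRESHOLD = 0.8    # order correlation degree
SECOND_THRESHOLD = 1000  # events per second (throughput)
THIRD_THRESHOLD = 0.8    # correlation degree between entity types

def assign_topic(event):
    """Classify an event into a shared or dedicated topic per the strategy."""
    same_topic = (
        event["order_correlation"] > FIRST_THRESHOLD
        or event["throughput"] < SECOND_THRESHOLD
        or event["entity_correlation"] > THIRD_THRESHOLD
    )
    # Events meeting at least one condition share a topic; high-throughput,
    # weakly correlated events get a topic of their own instead.
    return "shared_topic" if same_topic else f"topic_{event['entity_type']}"

topic = assign_topic({
    "order_correlation": 0.9, "throughput": 200,
    "entity_correlation": 0.5, "entity_type": "user_address",
})
```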
  • Optionally, the above-mentioned tasks include events, and after Kafka classifies the data to be processed, the method of the present application further includes: initializing the classified data to be processed, and setting the length of a linear hash table according to the classification types of the classified data; obtaining the key values of the classified data, calculating the term frequency-inverse document frequency (TF-IDF) values of the data items of the classified data, and obtaining the target key values corresponding to the data items whose TF-IDF values are greater than a fourth preset threshold, where the data to be processed includes the data items; and using the remainder obtained by dividing each target key value by a value not greater than the length of the linear hash table as the address in the linear hash table, using the target key value as the header of the linear hash table, and using the address as the number of the linear hash table to obtain the linear hash table.
  • Because the access speed of a hash table is not affected by the total number of stored elements, it is suitable for databases with large data volumes and offers high efficiency; this improves the query speed of the job record data, so that query speed is not degraded while the concurrent-crash problem of the multi-cluster job management system is being solved.
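  • A compact sketch of this indexing step is given below: it computes TF-IDF scores without external libraries, keeps only keys whose score passes the threshold, and places them in a chained linear hash table by remainder addressing. The sample corpus, the threshold, and the table length are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """Term frequency times inverse document frequency for one data item."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / (1 + df))

docs = [["job", "start"], ["job", "end"], ["pause"]]  # illustrative data items
THRESHOLD = 0.05   # stands in for the "fourth preset threshold"
TABLE_LEN = 7      # table length, set per the number of classification types

table = [[] for _ in range(TABLE_LEN)]  # chained buckets, output as linked lists
for doc in docs:
    for term in set(doc):
        if tf_idf(term, doc, docs) > THRESHOLD:
            key = hash(term)                     # target key value
            table[key % TABLE_LEN].append(term)  # remainder addressing
```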
  • In this way, the performance and scalability of the system can be improved at a low cost.
  • Optionally, the method of the present application involves a transmission channel, and the above-mentioned inputting of the data to be processed into Kafka includes: performing data compression on the data to be processed; judging whether the transmission status of the transmission channel is normal; if the judgment result is yes, inputting the compressed data to be processed into Kafka and marking it as sent; if the judgment result is no, inputting the compressed data to be processed into a first MySQL database and marking it as unsent; calling the created polling script, and polling the first MySQL database at a preset interval through the polling script; when polling detects that the first MySQL database holds unsent data to be processed and that the transmission status of the transmission channel is normal, inputting the data marked as unsent into Kafka; polling to detect whether the data marked as unsent has been received; if the detection result is yes, replacing the unsent mark with a sent mark; if the detection result is no, continuing to poll.
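  • A sketch of this send-or-buffer fallback follows. The helper names, the in-memory stand-in for the first MySQL database, and the 5-second polling interval are assumptions; kafka-python's send() future is used to decide whether the transmission channel is usable.

```python
import json, time
from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
unsent_buffer = []  # stands in for the "first MySQL database" buffer table

def send_or_buffer(record):
    try:
        # Block briefly on the send future; an exception means the
        # transmission channel is currently not normal.
        producer.send("running_result", record).get(timeout=10)
        record["status"] = "sent"
    except KafkaError:
        record["status"] = "unsent"
        unsent_buffer.append(record)  # buffered for the polling script

def polling_script(interval=5):
    """Re-send buffered records once the channel recovers."""
    while unsent_buffer:
        record = unsent_buffer.pop(0)
        send_or_buffer(record)        # re-marks as sent on success
        time.sleep(interval)          # preset polling interval (assumed 5 s)
```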
  • Optionally, the above-mentioned classification of the data to be processed by Kafka according to topic and producer to obtain the target data includes: obtaining feature information of the running status of the tasks corresponding to the data to be processed; sorting and classifying the data to be processed according to the feature information to obtain classification data, and marking the classification types of the classification data, where the classification types include task start data, task running data, and task end data; and establishing, according to the classification type, the correspondence between the classification data and the topics, and marking the correspondence on the classification data to obtain the target data.
  • In this embodiment, the zoning protocol is used to link the blocks in an orderly manner from back to front, with each block pointing to the previous block through the chain, and to link the created blockchain system to Kafka so that Kafka can be used in the blockchain system.
  • Kafka includes repositories, and there may be multiple repositories.
  • the blockchain system includes the application layer, and the application layer includes the unified management website system.
  • HTTP request methods specify different operations on resources, including the GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, and PATCH request methods.
  • This application uses the PUT request method, which facilitates transmitting the latest specified target data to the repository in the message queue server.
  • Multiple repositories are set up in Kafka to store target data by category; target data is stored in the corresponding repository according to its producer and topic, which facilitates the management and acquisition of the target data.
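  • To illustrate the PUT-based hand-off, the sketch below sends one target-data record to a repository endpoint with Python's requests library; the URL and the payload shape are assumptions, since the application does not pin down a concrete endpoint.

```python
import requests

# Hypothetical repository endpoint in front of the message queue server.
url = "http://mq-server.example.com/repositories/running_result"

target_record = {"producer": "cluster-01", "topic": "running_result",
                 "running_account": "hadoop_user01", "running_result": "SUCCESS"}

# PUT replaces the resource with the latest state of the target data.
resp = requests.put(url, json=target_record, timeout=10)
resp.raise_for_status()
```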
  • The Kafka system in the message queue server is used as a message queue and combined with the blockchain system for distributed data storage, and the data cache is processed concurrently by multiple nodes; this decouples the system, relieves the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when job record data from multiple big data clusters is collected simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput.
  • When the unified management website system receives the read instruction, it outputs the target data in the repository through the data storage layer and inputs the target data into the cache area of the MySQL database.
  • In this embodiment, the Kafka system is monitored through the unified management website system, and the target data is captured and stored in a timely manner; the captured target data is input into the cache area of the MySQL database, which facilitates subsequent reading of the target data and relieves the storage pressure of the MySQL database.
  • Optionally, a preset data consumption frequency is set, and the target data is input into the cache area of the MySQL database according to the preset data consumption frequency.
  • In this way, the input of the target data is buffered to a certain extent, thereby reducing the storage pressure on the MySQL database.
  • Optionally, when the unified management website system receives a read instruction, outputting the target data in the repository through the data storage layer and inputting the target data into the cache area of the MySQL database includes the following steps.
  • The unified management website system calls the listener script and detects through the listener script whether the application layer in the blockchain system has received a read instruction; when the detection result is no, the application layer in the blockchain system is re-detected; when the detection result is yes, the target data is captured from the repository by the consumer according to a preset capture quantity, and a consumed label is added to the captured target data to obtain the marked target data; the marked target data is converted into a JSON object, and the JSON object is parsed into a first data object; whether the second data objects of the MySQL database contain a data object with the same content as the first data object is identified; if the identification result is yes, the data object with the same content as the second data object is deleted from the first data object to obtain the first target data object; the topic and producer information marked in the label of the first target data object is obtained; and, according to the topic and producer information, the first target data object is filled into the cache area of the MySQL database.
  • By monitoring whether Kafka has received updated target data, the risk of repeatedly capturing and storing data is reduced; object conversion of the target data allows the target data to be stored in the MySQL database; and, according to the topic and producer information, the sub-target data are filled into the multiple cache areas set up in the MySQL database, which facilitates classified management and acquisition of the data.
  • In this way, the multi-cluster job management system can improve the management efficiency of the job record data.
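  • The sketch below mirrors that listener flow in Python: poll a batch from the consumer, parse the JSON, drop records already present in the MySQL cache, and file the rest by topic and producer. The batch size, the table, and the column names are assumptions for illustration.

```python
import json
from kafka import KafkaConsumer
import mysql.connector

consumer = KafkaConsumer("running_result",
                         bootstrap_servers="localhost:9092",
                         group_id="unified-management-website")
conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="jobs")
cur = conn.cursor()

batch = consumer.poll(timeout_ms=1000, max_records=50)  # preset capture quantity
for records in batch.values():
    for msg in records:
        first_obj = json.loads(msg.value)   # JSON object -> first data object
        cur.execute(
            "SELECT 1 FROM job_record_cache WHERE record_id = %s",
            (first_obj["record_id"],),      # assumed unique id column
        )
        if cur.fetchone():
            continue  # same content already cached: skip to avoid duplicates
        cur.execute(
            "INSERT INTO job_record_cache (record_id, topic, producer, body) "
            "VALUES (%s, %s, %s, %s)",
            (first_obj["record_id"], msg.topic,
             first_obj.get("producer"), json.dumps(first_obj)),
        )
conn.commit()
```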
  • Optionally, the method of the present application further includes: sending a startup instruction to the hidden system that has been set up, where the hidden system receives the startup instruction and starts the hidden protocol; the hidden system includes the hidden protocol, and the hidden protocol includes protocols concerning faults, destruction and deletion, and ethical constraints; when the hidden system detects input information that violates the hidden protocol, the data in the MySQL database is copied and backed up, and
  • the hidden system enters an authentication state, where such information includes fault instructions, destruction and deletion instructions, and files carrying Trojan horse programs; when the hidden system in the authentication state detects that an input access request has management authority, it outputs a password input request;
  • when the hidden system in the authentication state detects that the entered password information is correct and the number of attempts has not reached the limit, it accepts the access request; when the hidden system in the authentication state detects that the number of attempts has reached the limit, it rejects the access request and permanently archives the copied and backed-up data.
  • Optionally, before the target data in the cache area is converted into hypertext markup language data, the method of the present application further includes: detecting whether a database transaction in the MySQL database is in an executing state; if so, obtaining the initial data of the target data in the cache area, locking the MySQL database through the LOCK TABLES statement, and appending the updated data of the target data subsequently input into the cache area of the MySQL database to the initial data, where the LOCK TABLES statement includes a LOCK TABLES statement with the WRITE keyword; and obtaining the data with preset fields in the target data of the cache area, and obtaining the field sizes of the data with preset fields.
  • In this way, the MySQL database is optimized to improve database performance in terms of maintaining the integrity of the target data and ensuring its relevance; this releases the system's storage database, relieves the storage pressure of the database, and provides space and speed support for the concurrent processing of the system, so as to effectively prevent and deal with concurrent crashes of the multi-cluster job management system.
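  • A minimal sketch of that write lock follows, using MySQL's LOCK TABLES ... WRITE statement from Python; the table name job_record_cache is an assumed illustration.

```python
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="jobs")
cur = conn.cursor()

# WRITE lock: this session may read and write; other sessions must wait,
# which keeps the cached target data consistent while updates are appended.
cur.execute("LOCK TABLES job_record_cache WRITE")
try:
    cur.execute("INSERT INTO job_record_cache (record_id, body) "
                "VALUES (%s, %s)", ("job-42", "{}"))
    conn.commit()
finally:
    cur.execute("UNLOCK TABLES")  # always release the lock
```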
  • Optionally, the method of the present application further includes: when the unified management website system recognizes that the login request entered by the user is correct, accepting the login request; when the server in the unified management website receives a query request entered by the user, obtaining the feature information of the query request; converting the feature information into a search statement, and filtering the data in the MySQL database through the search statement to obtain the data corresponding to the query request; and performing statistics and analysis on the data corresponding to the query request, and generating and outputting visual charts. By outputting visual charts according to user needs, users can conveniently read the job record data, which improves the usability of the multi-cluster job management system.
  • In summary, the embodiments of this application decouple the system, alleviate the pressure of collecting job record data from multiple big data clusters at the same time, and avoid congestion when the job record data of multiple big data clusters arrives simultaneously; this application achieves low-cost, high-efficiency, high-accuracy, and multi-dimensional handling of the system's concurrent-crash problem, and can therefore effectively prevent and deal with concurrent crashes of the multi-cluster job management system.
  • the foregoing describes a method for processing multi-cluster job records in the present application, and the following describes a device that executes the foregoing method for processing multi-cluster job records.
  • FIG. 2 shows a schematic structural diagram of a device 20 for processing multi-cluster job records, which can be applied to an enterprise multi-cluster job management platform to manage and query job operation records generated by multiple big data clusters.
  • The apparatus 20 in the embodiment of the present application can implement the steps of the method for processing multi-cluster job records executed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation of the embodiment corresponding to FIG. 1.
  • the functions implemented by the device 20 can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions, and the modules may be software and/or hardware.
  • The device 20 may include a transceiver module 201, a detection module 202, a calling module 203, a classification module 204, a division module 205, a construction module 206, and a receiving module 207.
  • For the function implementation of the transceiver module 201, the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206, and the receiving module 207, reference may be made to the operations performed in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation of that embodiment, which will not be repeated here.
  • the detection module 202 can be used to control the transceiving operation of the transceiving module 201
  • the classification module 204 can be used to control the acquisition operation of the detection module 202 and the creation operation of the calling module 203
  • the division module 205 can be used to control the creation operation of the calling module 203 and the creation operation of the classification module 204.
  • the construction module 206 can be used to control the obtaining operation of the division module 205
  • the receiving module 207 can be used to control the trigger operation and input operation of the construction module 206.
  • Among them, the transceiver module 201 is used to receive job record data generated by multiple clusters running tasks; the detection module 202 is used to detect the running status of the tasks and, when the running status is detected to be a preset trigger point, send a trigger instruction to the created trigger, where the trigger receives the trigger instruction and converts the data format of the job record data received by the transceiver module 201 into JSON format to obtain the data to be processed; the calling module 203 is used to call the distributed messaging system Kafka in the message queue service system, where, when Kafka receives a topic creation command, it calls the topic creation script and creates the topic through the topic creation script, creates a producer based on the cluster corresponding to the data to be processed, and creates a consumer based on the unified management website system.
  • The classification module 204 is used to input the data to be processed obtained by the detection module 202 into Kafka, where Kafka classifies the data to be processed according to the topic created by the calling module 203 and the producer to obtain the target data; the division module 205 is used to divide the target data into blocks according to the producer and the topic, link the blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer; the construction module 206 is used to construct the blockchain system according to the zoning protocol and the data storage layer, input the target data into the repository through the blockchain system, and trigger the read instruction; the receiving module 207 is used to, when the unified management website system receives the read instruction triggered by the construction module 206, output the target data input into the repository by the construction module 206 through the data storage layer, input the target data into the cache area of the MySQL database, convert the target data in the cache area into hypertext markup language data, control the cache area through the output control function to obtain the hypertext markup language data, and input the hypertext markup language data into the constructed static hypertext markup language page file through the created read-write function.
  • Among them, the preset trigger point includes the start, pause, or end running status of the tasks run by the multiple clusters;
  • the zoning protocol is used to link the blocks in an orderly manner from back to front, with each block pointing to the previous block through the chain, and to link the created blockchain system to Kafka so that Kafka can be used in the blockchain system;
  • Kafka includes repositories, and there may be multiple repositories.
  • In some embodiments, the above-mentioned classification module 204 is also used to: obtain the order correlation degree of the events, obtain the throughput of the events, identify the entity types of the events, and obtain the correlation degree between entity types, where an entity type corresponds, for example, to an address associated with a user; classify the data to be processed into topics according to the order correlation degree, throughput, and correlation degree under the preset classification strategy to obtain the first classification data, where the preset classification strategy includes classifying into the same topic the data to be processed that satisfies at least one of the following: the order correlation degree is greater than the first preset threshold, the throughput is less than the second preset threshold, and the correlation degree is greater than the third preset threshold; mark the first classification data, where the marked content includes the order correlation degree, throughput, entity type, correlation degree between entity types, and topic name corresponding to the data to be processed; and classify the marked first classification data according to producer type, and mark the producer type of the marked first classification data to obtain the target data.
  • In some embodiments, the above-mentioned classification module 204 is further configured to: initialize the classified data to be processed, and set the length of the linear hash table according to the classification types of the classified data; obtain the key values of the classified data, calculate the term frequency-inverse document frequency (TF-IDF) values of the data items of the classified data, and obtain the target key values corresponding to the data items whose TF-IDF values are greater than the fourth preset threshold, where the data to be processed includes the data items; use the remainder obtained by dividing each target key value by a value not greater than the length of the linear hash table as the address in the linear hash table, use the target key value as the header of the linear hash table, and use the address as the number of the linear hash table to obtain the linear hash table; randomly generate a preset number of strings of the same length, and count and analyze the linear hash table through the preset string function to obtain hash distribution information and average bucket length information, where the hash distribution information includes the usage rate of the buckets and the average bucket length information includes the average length of all used buckets; judge whether the hash distribution information satisfies a first preset condition and whether the average bucket length information satisfies a second preset condition, where the first preset condition includes the ratio of the number of used buckets to the total number of buckets falling within a first preset range, and the second preset condition includes the average length of all used buckets falling within a second preset range; if the judgment results are all yes, take the linear hash table corresponding to the judgment results as the final linear hash table; and fill the target key values into the final linear hash table, and output the final linear hash table in the form of a linked list.
  • In some embodiments, the above-mentioned classification module 204 is further used to: perform data compression on the data to be processed; judge whether the transmission status of the transmission channel is normal; if the judgment result is yes, input the compressed data to be processed into Kafka and mark it as sent; if the judgment result is no, input the compressed data to be processed into the first MySQL database and mark it as unsent; call the created polling script, and poll the first MySQL database at the preset interval through the polling script; when polling detects that the first MySQL database holds unsent data to be processed and that the transmission status of the transmission channel is normal, input the data marked as unsent into Kafka; poll to detect whether the data marked as unsent has been received; if the detection result is yes, replace the unsent mark with a sent mark; if the detection result is no, continue polling.
  • In some embodiments, the above-mentioned receiving module 207 is also used to: call the listener script through the unified management website system, and use the listener script to detect whether the application layer in the blockchain system has received the read instruction; when the detection result is no, re-detect the application layer in the blockchain system; when the detection result is yes, capture the target data from the repository through the consumer according to the preset capture quantity, and add a consumed label to the captured target data to obtain the marked target data; convert the marked target data into a JSON object, and parse the JSON object into a first data object; identify whether the second data objects of the MySQL database contain a data object with the same content as the first data object; if the identification result is yes, delete the data object with the same content as the second data object from the first data object to obtain the first target data object; obtain the topic and producer information marked in the label of the first target data object; and, according to the topic and producer information, fill the first target data object into the cache area of the MySQL database.
  • In some embodiments, the above-mentioned receiving module 207 is also used to: detect whether a database transaction in the MySQL database is in an executing state; if so, obtain the initial data of the target data in the cache area, lock the MySQL database through the LOCK TABLES statement, and append the updated data of the target data subsequently input into the cache area of the MySQL database to the initial data, where the LOCK TABLES statement includes a LOCK TABLES statement with the WRITE keyword; and obtain the data with preset fields in the target data of the cache area, and obtain the field sizes of the data with preset fields.
  • In some embodiments, the above-mentioned classification module 204 is further configured to: obtain feature information of the running status of the tasks corresponding to the data to be processed; sort and classify the data to be processed according to the feature information to obtain the classification data, and mark the classification types of the classification data, where the classification types include task start data, task running data, and task end data; and establish, according to the classification type, the correspondence between the classification data and the topics, and mark the correspondence on the classification data to obtain the target data.
  • On the one hand, the system is decoupled, which reduces the pressure of collecting job record data from multiple big data clusters at the same time, avoids congestion when job record data from multiple big data clusters is collected simultaneously, and achieves high fault tolerance, high-speed caching, high efficiency, and high throughput; on the other hand, access and operation speed are increased and the load on the server is reduced. Combining the above, this application achieves low-cost, high-efficiency, high-accuracy, and multi-directional handling of the system's concurrent-crash problem; therefore, the present application can effectively prevent and deal with concurrent crashes of the multi-cluster job management system.
  • It should be noted that the technical features mentioned in any embodiment or implementation of the method for processing multi-cluster job records are also applicable to the above-mentioned device 20 that executes the method for processing multi-cluster job records in this application; the similarities will not be repeated here.
  • the device 20 in the embodiment of the present application is described above from the perspective of modular functional entities.
  • The following describes a computer device from the perspective of hardware. As shown in FIG. 3, it includes a processor, a memory, a transceiver (or an input and output unit, not identified in FIG. 3), and a computer program stored in the memory and runnable on the processor.
  • The computer program may be a program corresponding to the method for processing multi-cluster job records in the embodiment corresponding to FIG. 1 or in any optional embodiment or implementation of that embodiment.
  • The processor executes the computer program to implement the method for processing multi-cluster job records executed by the device 20 in the embodiment corresponding to FIG. 2.
  • The so-called processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc.
  • the processor is the control center of the computer device, and various interfaces and lines are used to connect various parts of the entire computer device.
  • The memory may be used to store the computer program and/or module, and the processor implements various functions of the computer device by running or executing the computer program and/or module stored in the memory and calling the data stored in the memory.
  • The memory may mainly include a storage program area and a storage data area, where the storage program area can store an operating system, an application program required by at least one function (such as obtaining job record data generated by multiple clusters running tasks), and the like; the storage data area can store data created according to use (for example, the multiple blocks obtained by dividing the target data according to producer and topic), and the like.
  • The memory can include high-speed random access memory, and can also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
  • the transceiver can also be replaced by a receiver and a transmitter, and can be the same or different physical entities. When they are the same physical entity, they can be collectively referred to as transceivers.
  • the transceiver can be an input and output unit.
  • The entity device corresponding to the transceiver module 201 in FIG. 2 may be the transceiver in FIG. 3, and the entity devices corresponding to the detection module 202, the calling module 203, the classification module 204, the division module 205, the construction module 206, and the receiving module 207 in FIG. 2 may be the processor in FIG. 3.
  • the memory may be integrated in the processor, or may be provided separately from the processor.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
  • Obtain job record data generated by multiple clusters running tasks, and detect the running status of the tasks; when the running status is detected to be a preset trigger point, send a trigger instruction to the created trigger; the trigger receives the trigger instruction and converts the data format of the job record data into JSON format to obtain the data to be processed, where the preset trigger point includes the start, pause, or end running status of the tasks run by the multiple clusters; call the distributed messaging system Kafka in the message queue service system, and when Kafka receives the topic creation command, call the topic creation script and create the topic through the topic creation script; create a producer through Kafka according to the cluster corresponding to the data to be processed, and create a consumer through Kafka according to the unified management website system; input the data to be processed into Kafka, and classify the data to be processed through Kafka according to the topic and the producer to obtain the target data; divide the target data into blocks according to the producer and the topic to obtain multiple blocks, link the multiple blocks according to the created zoning protocol, and use the linked blocks and the consumer as the data storage layer, where the zoning protocol is used to link the blocks in an orderly manner from back to front, with each block pointing to the previous block through the chain, and the created blockchain system is linked to Kafka so that Kafka can be used in the blockchain system; construct the blockchain system according to the zoning protocol and the data storage layer, input the target data into the repository through the blockchain system according to the HTTP request mode, and trigger the read instruction, where Kafka includes one or more repositories; when the unified management website system receives the read instruction, output the target data in the repository through the data storage layer and input the target data into the cache area of the MySQL database; and convert the target data in the cache area into hypertext markup language data, and write the hypertext markup language data into the constructed static hypertext markup language page file.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of big data. Disclosed are a method and apparatus for processing multi-cluster job records, and a device and storage medium. The method comprises: processing job record data produced by a plurality of clusters to acquire data to be processed; creating a topic, a producer, and a consumer by means of a distributed messaging system, Kafka, in a message queue service system; classifying the data to be processed by means of Kafka so as to acquire target data, and constructing a blockchain system according to the producer, the topic, and the target data; inputting the target data into a repository by means of the blockchain system; inputting, by means of a unified management website system, the target data in the repository into a cache region of a MySQL database; and converting the target data in the cache region into hypertext markup language data, and inputting the hypertext markup language data into a static hypertext markup language page file. Use of the present solution makes it possible to solve the problem of concurrent crashing of a multi-cluster job management system.
PCT/CN2019/117086 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job records, and device and storage medium WO2021051531A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884887.8A CN110795257B (zh) 2019-09-19 2019-09-19 Method, apparatus, device and storage medium for processing multi-cluster job records
CN201910884887.8 2019-09-19

Publications (1)

Publication Number Publication Date
WO2021051531A1 true WO2021051531A1 (fr) 2021-03-25

Family

ID=69427342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117086 WO2021051531A1 (fr) 2019-09-19 2019-11-11 Method and apparatus for processing multi-cluster job records, and device and storage medium

Country Status (2)

Country Link
CN (1) CN110795257B (fr)
WO (1) WO2021051531A1 (fr)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111555957B (zh) * 2020-03-26 2022-08-19 孩子王儿童用品股份有限公司 Kafka-based synchronous message service system and implementation method
CN112000515A (zh) * 2020-08-07 2020-11-27 北京浪潮数据技术有限公司 Instance data recovery method and component in a Redis cluster
CN112100265A (zh) * 2020-09-17 2020-12-18 博雅正链(北京)科技有限公司 Multi-source data processing method and apparatus for big data architecture and blockchain
CN112131854A (zh) * 2020-09-24 2020-12-25 北京开科唯识技术股份有限公司 Data processing method and apparatus, electronic device, and storage medium
CN112272220B (zh) * 2020-10-16 2022-05-13 苏州浪潮智能科技有限公司 Cluster software startup control method, system, terminal, and storage medium
CN112751709B (zh) * 2020-12-29 2023-01-10 北京浪潮数据技术有限公司 Storage cluster management method, apparatus, and system
CN113269590B (zh) * 2021-05-31 2023-06-06 五八到家有限公司 Data processing method, apparatus, and system for resource subsidies
CN115473858B (zh) * 2022-09-05 2024-03-01 上海哔哩哔哩科技有限公司 Data transmission method, streaming data transmission system, computer device, and storage medium
CN117033449B (zh) * 2023-10-09 2023-12-15 北京中科闻歌科技股份有限公司 Kafka-stream-based data processing method, electronic device, and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582470B (zh) * 2017-09-28 2022-11-22 北京国双科技有限公司 Data processing method and data processing apparatus
CN109451072A (zh) * 2018-12-29 2019-03-08 广东电网有限责任公司 Kafka-based message caching system and method
CN110209507A (zh) * 2019-05-16 2019-09-06 厦门市美亚柏科信息股份有限公司 Message-queue-based data processing method, apparatus, system, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685173A (zh) * 2011-04-14 2012-09-19 天脉聚源(北京)传媒科技有限公司 Asynchronous task distribution system and scheduling and distribution computing unit
CN106034160A (zh) * 2015-03-19 2016-10-19 阿里巴巴集团控股有限公司 Distributed computing system and method
US20180181377A1 (en) * 2016-10-26 2018-06-28 Yoongu Kim Systems and methods for discovering automatable tasks
CN109800080A (zh) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 Task scheduling method, system, and terminal device based on the Quartz framework

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113194070A (zh) * 2021-03-31 2021-07-30 新华三大数据技术有限公司 Kafka cluster multi-type permission management method, apparatus, and storage medium
CN113194070B (zh) * 2021-03-31 2022-05-27 新华三大数据技术有限公司 Kafka cluster multi-type permission management method, apparatus, and storage medium
CN113065848A (zh) * 2021-04-02 2021-07-02 东云睿连(武汉)计算技术有限公司 Deep learning scheduling system and scheduling method supporting multiple types of cluster backends
CN113315750A (zh) * 2021-04-15 2021-08-27 新华三大数据技术有限公司 Kafka message publishing method, apparatus, and storage medium
CN113315750B (zh) * 2021-04-15 2022-05-27 新华三大数据技术有限公司 Kafka message publishing method, apparatus, and storage medium
CN113722198A (zh) * 2021-09-02 2021-11-30 中国建设银行股份有限公司 Script job submission control method and apparatus, storage medium, and electronic device
CN113742087A (zh) * 2021-09-22 2021-12-03 深圳市玄羽科技有限公司 Protection method and system for an industrial internet big data server
CN113742087B (zh) * 2021-09-22 2023-12-12 深圳市玄羽科技有限公司 Protection method and system for an industrial internet big data server
CN114401239A (zh) * 2021-12-20 2022-04-26 中国平安财产保险股份有限公司 Metadata transmission method and apparatus, computer device, and storage medium
CN114401239B (zh) * 2021-12-20 2023-11-14 中国平安财产保险股份有限公司 Metadata transmission method and apparatus, computer device, and storage medium
WO2024037629A1 (fr) * 2022-08-19 2024-02-22 顺丰科技有限公司 Data integration method and apparatus for blockchain, and computer device and storage medium
CN116049190A (zh) * 2023-01-18 2023-05-02 中电金信软件有限公司 Kafka-based data processing method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN110795257B (zh) 2023-06-16
CN110795257A (zh) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021051531A1 (fr) 2019-09-19 2021-03-25 Method and apparatus for processing multi-cluster job records, and device and storage medium
US20230252028A1 (en) Data serialization in a distributed event processing system
US9787706B1 (en) Modular architecture for analysis database
US20190266195A1 (en) Filtering queried data on data stores
CN107145489B (zh) Information statistics method and apparatus for a client application based on a cloud platform
KR101219856B1 (ko) 데이터 프로세싱을 자동화하기 위한 방법 및 시스템
US7681087B2 (en) Apparatus and method for persistent report serving
US10860604B1 (en) Scalable tracking for database udpates according to a secondary index
US11429566B2 (en) Approach for a controllable trade-off between cost and availability of indexed data in a cloud log aggregation solution such as splunk or sumo
WO2021051627A1 (fr) Database-based batch importing method, apparatus and device, and storage medium
KR20090035545A (ko) 초대형 데이터베이스 상의 데이터 처리
US10262024B1 (en) Providing consistent access to data objects transcending storage limitations in a non-relational data store
US11892976B2 (en) Enhanced search performance using data model summaries stored in a remote data store
US20230007014A1 (en) Detection of replacement/copy-paste attacks through monitoring and classifying api function invocations
US20210224102A1 (en) Characterizing operation of software applications having large number of components
CN113297057A (zh) 内存分析方法、装置及系统
US10248508B1 (en) Distributed data validation service
CN112612832A (zh) 节点分析方法、装置、设备及存储介质
US11841827B2 (en) Facilitating generation of data model summaries
CN114461762A (zh) 档案变更识别方法、装置、设备及存储介质
US11379268B1 (en) Affinity-based routing and execution for workflow service
US8214846B1 (en) Method and system for threshold management
US20240061494A1 (en) Monitoring energy consumption associated with users of a distributed computing system using tracing
US10896115B2 (en) Investigation of performance bottlenecks occurring during execution of software applications
TWM540309U Cloud file search system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946071

Country of ref document: EP

Kind code of ref document: A1