CN112532450B - Dynamic updating method and system for data stream distribution process configuration - Google Patents

Dynamic updating method and system for data stream distribution process configuration

Info

Publication number
CN112532450B
CN112532450B (application CN202011374412.3A)
Authority
CN
China
Prior art keywords
data stream
distribution process
stream distribution
data
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011374412.3A
Other languages
Chinese (zh)
Other versions
CN112532450A (en)
Inventor
张洋
吴同仁
肖伟
董俊庆
杨元山
付永庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Meiluo Technology Co Ltd
Original Assignee
Zhongke Meiluo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Meiluo Technology Co Ltd filed Critical Zhongke Meiluo Technology Co Ltd
Priority to CN202011374412.3A priority Critical patent/CN112532450B/en
Publication of CN112532450A publication Critical patent/CN112532450A/en
Application granted granted Critical
Publication of CN112532450B publication Critical patent/CN112532450B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 - Configuration management of networks or network elements
    • H04L 41/0803 - Configuration setting
    • H04L 41/0813 - Configuration setting characterised by the conditions triggering a change of settings
    • H04L 41/082 - Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1458 - Management of the backup or restore process
    • G06F 11/1464 - Management of the backup or restore process for networked environments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dynamic updating method and system for data stream distribution process configuration, applied to a scheduling process, wherein the method comprises the following steps. A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream. B: comparing whether the first configuration data is the same as the second configuration data. C: if not, generating, in a cache, a backup data stream distribution process corresponding to the target data stream distribution process, and taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes data streams, including log data streams, to a data processing process. By applying the embodiment of the invention, the efficiency of dynamically updating the configuration data can be improved.

Description

Dynamic updating method and system for data stream distribution process configuration
Technical Field
The invention relates to the technical field of data processing, in particular to a dynamic updating method and a dynamic updating system for data stream distribution process configuration.
Background
In data stream processing, the configuration of the processes involved in handling a data stream often needs to be updated. For example, prior-art invention patent application No. 201410314207.6 discloses a dynamic loading method for a logic file of a broadband access network: the FPGA logic file is dynamically loaded over the TCP protocol, the system does not need to be restarted after loading is completed, and while the single board is running, a new version of the FPGA logic file can be loaded at any time to replace the currently running one, which ensures transmission efficiency and reliability and improves the flexibility of the system. In actual operation, only the client's connection IP address needs to be set to the IP address of the single board, which broadens the application range; the method is simple, easy to implement, and meets the needs of current embedded system development. However, because that solution loads a logic file rather than a configuration file of a data stream processing process, the data can simply be updated by direct overwriting.
To address the above problem, prior-art invention patent application No. 201810541847.9 discloses a method and device for testing server memory performance. The method comprises: step one, exporting the default BIOS configuration file from the server under test in a pre-deployed stream test environment; step two, selecting a configuration change parameter from a preset configuration change parameter list and modifying the default configuration file; step three, importing the modified configuration file back into the server under test and restarting it automatically; step four, running the stream test and recording the test result; and repeating steps two to four until every configuration change parameter in the list has been tested. This method and device for testing server memory performance save testing time and improve testing efficiency and accuracy.
However, the inventor found that although the prior art does update the configuration file, doing so requires first loading the configuration file and then restarting the corresponding server. Restarting the server takes a long time, so the dynamic update of the configuration information is inefficient.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and a system for dynamically updating the configuration of a data stream distribution process, so as to overcome the low efficiency of dynamically updating such configuration in the prior art.
The invention solves the technical problems through the following technical means:
The embodiment of the invention provides a dynamic updating method for data stream distribution process configuration, which is applied to a scheduling process and comprises the following steps:
A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream;
B: comparing whether the first configuration data is the same as the second configuration data;
C: if not, generating, in a cache, a backup data stream distribution process corresponding to the target data stream distribution process, and taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream, including a log data stream, to the data processing process.
Optionally, the step of obtaining second configuration data for the target data stream distribution process at the current time in step A includes:
intercepting, using a dynamic proxy method, the second configuration data input by the user.
Optionally, the step B includes:
generating a first key-value pair by taking the first configuration data as the value and the identification information of the corresponding data processing process as the key; generating a second key-value pair by taking the second configuration data as the value and the identification information of the corresponding second data processing process as the key;
comparing whether the first key-value pair and the second key-value pair are completely consistent;
the generating a backup data stream distribution process corresponding to the target data stream distribution process in the cache, and distributing the data stream to the data processing process by using the target data stream distribution process, includes:
generating a third key-value pair by taking the backup data stream distribution process as the value and the identification information of the target data stream distribution process as the key, and generating a fourth key-value pair by taking the target data stream distribution process as the value and the identification information of the target data stream distribution process as the key; and replacing the fourth key-value pair with the third key-value pair according to the identification information of the target data stream distribution process, so as to distribute the data stream to the data processing process.
Optionally, the step of generating a backup data stream distribution process corresponding to the target data stream distribution process in the cache in step C includes:
each target data stream distribution process deployed on the same server competing for a zookeeper lock from the first zookeeper, and, if the competition succeeds, generating in the cache a backup data stream distribution process corresponding to that target data stream distribution process.
Optionally, the step of generating a backup data stream distribution process corresponding to the target data stream distribution process in the cache in step C includes:
all target data stream distribution processes deployed on each server sending requests to the second zookeeper, so that the second zookeeper generates a coordination instruction according to the remaining computing power of each server, wherein the coordination instruction allocates to each server the number of tasks for generating backup data stream distribution processes;
after each server receives the coordination instruction sent by the second zookeeper, generating, in the cache and according to the coordination instruction, the backup data stream distribution processes corresponding to the target data stream distribution processes.
Optionally, while executing the step of taking the backup data stream distribution process as the target data stream distribution process in step C, the method further includes:
taking the target data stream distribution process as a backup data stream distribution process;
acquiring third configuration data of the target data stream distribution process at the current moment, and comparing whether the third configuration data are the same as the first configuration data of the backup data stream distribution process or not;
and if so, using the backup data stream distribution process as a target data stream distribution process.
Optionally, the method further includes:
acquiring the moment at which the target data stream distribution process starts to distribute data streams, and taking a first data set to be distributed within a first duration before that moment and a second data set to be distributed within a second duration after that moment as recurrence data;
establishing a temporary data stream distribution process, so that for each piece of data in the recurrence data the temporary data stream distribution process sends the data processing process an inquiry instruction asking whether that piece of data has been processed; upon receiving the inquiry instruction, the data processing process returns an inquiry result to the temporary data stream distribution process according to the data streams it has already processed; and when the inquiry result is no, the temporary data stream distribution process redistributes (replays) that piece of data.
Optionally, when receiving a piece of data replayed by the temporary data stream distribution process, the data processing process determines whether the timestamp corresponding to that piece of data is before or after the moment at which the target data stream distribution process started to distribute data streams;
when the timestamp corresponding to the piece of data is before the moment at which the target data stream distribution process started to distribute data streams, the piece of data is processed using the backup data stream distribution process corresponding to the first configuration data;
and when the timestamp corresponding to the piece of data is after the moment at which the target data stream distribution process started to distribute data streams, the piece of data is sent to the target data stream distribution process.
Optionally, the method for determining the first duration and/or the second duration includes:
acquiring the proportion p of abnormal data streams contained in the data streams within a first preset duration range;
acquiring the weight w of the data stream;
calculating the first duration and/or the second duration using the formula T = t1 + (w + p) × b, wherein
T is the first duration and/or the second duration; t1 is the time consumed for switching between the target data stream distribution process and the backup data stream distribution process, and is related to t2 and a by a formula that appears only as an image in the original publication;
t2 is the maximum value of the historical switching time between the target data stream distribution process and the backup data stream distribution process; a is a second preset duration; w is the data stream weight, and is related to c1 and c2 by a formula that appears only as an image in the original publication;
c1 is the total duration for which the operator has viewed the processing results of the data stream distribution process corresponding to this data stream; c2 is the accumulated total duration for which the operator has viewed the processing results of the data stream distribution processes corresponding to all data streams; p is the proportion of abnormal data streams; b is a third preset duration.
The embodiment of the invention also provides a system for dynamically updating the configuration of a data stream distribution process, which comprises: a scheduling process and a data processing process, wherein,
the scheduling process is to:
A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream;
B: comparing whether the first configuration data is the same as the second configuration data;
C: if not, generating, in the cache, a backup data stream distribution process corresponding to the target data stream distribution process, and taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream to the data processing process;
and the data processing process is used for receiving the data stream sent by the target data stream distribution process.
The invention has the following advantages:
By applying the embodiment of the invention, when the configuration changes, a backup data stream distribution process is generated according to the second configuration data at the current moment and is used as the new target data stream distribution process. This avoids restarting the data stream distribution process, so the configuration data of the data stream distribution process is updated dynamically while the restart time is saved, and the efficiency of dynamically updating that configuration data is improved.
Drawings
Fig. 1 is a schematic flowchart of a dynamic update method for data stream distribution process configuration according to an embodiment of the present invention;
fig. 2 is a schematic view of an application architecture of a dynamic update method for data stream distribution process configuration according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
It should be noted that the embodiment of fig. 1 of the present invention is preferably applied to a scheduling process.
Fig. 1 is a schematic flowchart of a method for dynamically updating a data stream distribution process configuration according to an embodiment of the present invention, where as shown in fig. 1, the method includes:
s101: the method comprises the steps of obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data of the target data stream distribution process at the current moment, wherein the target data stream distribution process comprises the following steps: kafka stream.
The data stream distribution process includes, but is not limited to, a Kafka stream, and may also be a Storm stream, a Spark stream, a Flink stream, or the like, all of which normally require a restart when their first configuration data is updated. The embodiment of the present invention is explained with a Kafka stream as the target data stream distribution process.
For example, fig. 2 is a schematic view of the application architecture of the dynamic updating method for data stream distribution process configuration according to an embodiment of the present invention. As shown in fig. 2, in the field of bus management, a user uses an APP to apply for, schedule, approve, return, monitor, and maintain vehicles. After a vehicle-use application is approved, the user uses the corresponding vehicle, on which a vehicle-mounted intelligent terminal is pre-installed; the terminal monitors information such as the remaining battery charge, battery health state, vehicle speed, and vehicle coordinates in real time. After collecting this information, the vehicle-mounted intelligent terminal uploads it as a vehicle log to the log database through the gateway 21 over 4G, 5G, WiFi, or similar networks. A large number of logs continuously flow into the log database 23, forming a log stream. The log monitoring function 25 of the bus management service platform uses the target data stream distribution process to screen abnormal logs out of the log stream and then collects statistics on them and summarizes them.
In order to screen out different types of abnormal logs, or different dimensions of the same abnormal log, an operation and maintenance person 29 inputs the corresponding operation type data 27; the scheduling process generates a corresponding kafka stream-1 according to operation type data-1 and the identification information of the APP used by the user, and kafka stream-1 serves as the target data stream distribution process that processes the log data stream. In practice, the operation and maintenance personnel 29 may send operation type data for different APPs or different types of abnormal logs to the scheduling process in the server 25. The operation type data entered by the operation and maintenance personnel form a message queue 27. The scheduling process monitors the message queue 27 to determine whether new operation type data for kafka stream-1 has been received. The operation type data includes, but is not limited to, updating an alarm rule, modifying an alarm rule, deleting an alarm rule, updating a clustering rule, modifying a clustering rule, deleting a clustering rule, and the like.
The scheduling process checks in real time whether new first configuration data has been received in the message queue 27, and acquires operation type data-1 of kafka stream-1 at time 00:00:00. The scheduling process then queries whether there is new second configuration data for kafka stream-1, i.e., operation type data-2, in the message queue 27 at time 00:00:00; if so, step S102 is executed; if not, the scheduling process continues to acquire operation type data-1 of kafka stream-1 at the next time, e.g., 00:00:01.
Further, in order to reduce the load on the scheduling process, the scheduling process uses a dynamic proxy method to monitor whether the message queue 27 has received operation type data-2; if so, it intercepts and reads the operation type data to identify the corresponding kafka stream-1, then acquires operation type data-1 of kafka stream-1 and executes step S102. If not, the scheduling process continues to check whether there is new second configuration data for kafka stream-1 in the message queue 27 at the next time, e.g., 00:00:01. In practice, an existing proxy process can be used to intercept operation type data-2 in real time, thereby achieving dynamic monitoring.
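The patent itself contains no source code; purely as an illustrative sketch of the interception described above, a Java dynamic proxy (java.lang.reflect.Proxy) could wrap the scheduling process's reader of the message queue 27, where the ConfigQueue interface, its method name, and the returned value are hypothetical names introduced only for this example:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ConfigInterceptDemo {

    // Hypothetical reader of the operation-type-data message queue (27 in fig. 2).
    interface ConfigQueue {
        String pollOperationTypeData(String streamId); // returns null if no new data
    }

    // Plain implementation that would normally talk to the real queue.
    static class SimpleConfigQueue implements ConfigQueue {
        @Override
        public String pollOperationTypeData(String streamId) {
            // Placeholder: in a real system this would read from the message queue.
            return "operation-type-data-2";
        }
    }

    public static void main(String[] args) {
        ConfigQueue real = new SimpleConfigQueue();

        // Dynamic proxy: observe every poll so the scheduling process can react to
        // newly submitted second configuration data without changing the polling logic.
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            Object result = method.invoke(real, methodArgs);
            if ("pollOperationTypeData".equals(method.getName()) && result != null) {
                System.out.println("Intercepted new config for " + methodArgs[0] + ": " + result);
                // Here the scheduling process would fetch the first configuration
                // data and go on to compare the two (step S102).
            }
            return result;
        };

        ConfigQueue proxied = (ConfigQueue) Proxy.newProxyInstance(
                ConfigQueue.class.getClassLoader(),
                new Class<?>[]{ConfigQueue.class},
                handler);

        proxied.pollOperationTypeData("kafka-stream-1");
    }
}
```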
S102: comparing whether the first configuration data is the same as the second configuration data.
The scheduling process compares whether operation type data-1 of kafka stream-1 is the same as operation type data-2 for kafka stream-1. If not, the operation type data of kafka stream-1 has been updated and operation type data-1 needs to be updated to operation type data-2, so step S103 is performed.
If the two are the same, the user has repeatedly submitted the same operation type data-2 for kafka stream-1, and the process may return to step S101.
S103: generating, in the cache, a backup data stream distribution process corresponding to the target data stream distribution process; taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream, including a log data stream, to the data processing process.
The scheduling process acquires a cache of a certain capacity, generates a corresponding backup data stream distribution process kafka stream-2 in the cache according to operation type data-2 and the identification information of the APP used by the user, and takes kafka stream-2 as the new target data stream distribution process. The data stream is then sorted by kafka stream-2 and distributed to the data processing processes.
It should also be noted that if there are multiple data stream distribution processes, for example kafka stream-1 handling a log data stream and kafka stream-10 handling an interface data stream, the identification information of the type of object to be handled should also be taken into consideration when generating kafka stream-1, so as to distinguish kafka stream-1 from kafka stream-10.
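As a minimal sketch of step S103, assuming the target data stream distribution process is implemented with the Kafka Streams Java API (the patent names "kafka stream" but gives no code), the scheduling process could build a new KafkaStreams instance from operation type data-2 and swap it in atomically. The topic names, the filter standing in for the alarm/clustering rules, and the application identifiers are assumptions made only for this illustration:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class BackupStreamSwap {

    // Handle to the currently active (target) distribution process; swapped atomically.
    private final AtomicReference<KafkaStreams> target = new AtomicReference<>();

    // Build a topology from the new operation type data. The patent's alarm/clustering
    // rules are not specified, so a simple filter on a hypothetical "error" marker
    // stands in for them; topic names are likewise assumptions.
    private Topology buildTopology(String operationTypeData) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("vehicle-logs")
               .filter((key, value) -> value != null && value.contains("error"))
               .to("abnormal-logs");
        return builder.build();
    }

    private Properties propsFor(String appId) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    /** Build the backup process from the second configuration data and make it the target. */
    public void applyNewConfig(String secondConfigData) {
        KafkaStreams backup = new KafkaStreams(buildTopology(secondConfigData),
                                               propsFor("kafka-stream-1-backup"));
        backup.start();                               // the backup starts distributing
        KafkaStreams old = target.getAndSet(backup);  // the backup becomes the new target
        if (old != null) {
            // Embodiment 1 could simply retire the old process; embodiment 4 below
            // instead keeps it on standby as a backup.
            old.close(Duration.ofSeconds(10));
        }
    }
}
```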
By applying the embodiment of the invention, when the configuration changes, the backup data stream distribution process is generated according to the second configuration data at the current moment and is used as the new target data stream distribution process. This avoids restarting the data stream distribution process, so the configuration data of the data stream distribution process is updated dynamically while the restart time is saved, and the efficiency of dynamically updating the configuration data is improved.
In practical application, the first configuration data can be used as the value and the identification information of the corresponding data processing process as the key to generate a first key-value pair; the second configuration data is used as the value and the identification information of the corresponding second data processing process as the key to generate a second key-value pair; the first and second key-value pairs are stored in a cache. The key-value pairs with the same key, namely the first key-value pair and the second key-value pair, are then found according to the key and compared in the cache to check whether they are completely consistent.
Correspondingly, a third key-value pair can be generated by taking the backup data stream distribution process kafka stream-2 as the value and the identification information of the target data stream distribution process as the key, and a fourth key-value pair by taking the target data stream distribution process as the value and the same identification information as the key; the fourth key-value pair is then replaced with the third key-value pair according to the identification information of the target data stream distribution process, so that the data stream is distributed to the data processing process using kafka stream-2.
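A minimal sketch of this key-value bookkeeping, assuming an in-memory ConcurrentHashMap stands in for the cache and plain strings stand in for the configuration data and process handles (class, field, and method names are hypothetical):

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class ConfigKeyValueCache {

    // Key: identification information of the data processing / distribution process.
    // Value: its configuration data (first or second) or the process handle itself.
    private final ConcurrentHashMap<String, String> configCache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> processCache = new ConcurrentHashMap<>();

    /** Step B: store the first key-value pair, then check whether the second differs. */
    public boolean configChanged(String processId, String firstConfig, String secondConfig) {
        configCache.put(processId, firstConfig);               // first key-value pair
        return !Objects.equals(configCache.get(processId), secondConfig);
    }

    /** Step C: the third key-value pair (backup process) replaces the fourth (old target). */
    public void swapInBackup(String targetProcessId, Object backupProcess) {
        processCache.put(targetProcessId, backupProcess);      // replace under the same key
    }

    public Object currentTarget(String targetProcessId) {
        return processCache.get(targetProcessId);
    }
}
```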
Moreover, if the data stream has strict real-time processing requirements, it cannot be processed during the period in which the server is being restarted in the prior art, so the prior art suffers from interrupted data stream processing. By applying the embodiment of the invention, the two data stream distribution processes hand over seamlessly, which avoids this discontinuity. That is, embodiment 1 of the present invention achieves uninterrupted processing while dynamically updating the configuration data without restarting the server or the Kafka stream.
Example 2
In order to avoid the server overload caused by the simultaneous dynamic update of multiple kafka streams, embodiment 2 of the present invention implements, on the basis of embodiment 1, the generation of the backup data stream distribution process corresponding to the target data stream distribution process in the cache by using the following method:
the target data stream distribution processes kafka stream-1, kafka stream-2, kafka stream-3, and kafka stream-4 are all disposed on the same server a. And each kafka stream corresponds to a data processing process. At this time, kafka stream-1, kafka stream-2, kafka stream-3, and kafka stream-4 each need to perform a dynamic update operation, the kafka stream-1, kafka stream-2, kafka stream-3, and kafka stream-4 compete for the zookeeper lock first, for example, if the kafka stream-1 competes successfully first, the kafka stream-1 generates a backup data stream distribution process corresponding to the target data stream distribution process in the cache; after the kafka stream-1 update is completed, the kafka stream-2, kafka stream-3, and kafka stream-4 compete again for zookeeper locks with the first zookeeper, and so on until all the target data stream distribution processes complete the dynamic update.
It should be noted that the working principle of the zookeeper lock is prior art; the embodiment of the present invention merely registers with the zookeeper lock and obtains the registration result.
By applying embodiment 2 of the invention, zookeeper lock control ensures that only one target data stream distribution process is updated at a time, which avoids simultaneous updates of multiple target data stream distribution processes and thereby reduces the load on the server.
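A minimal sketch of the lock contention of embodiment 2, assuming the zookeeper lock is realized with Apache Curator's InterProcessMutex; the patent does not prescribe a client library, and the connection string and lock path below are assumptions:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class UpdateLockDemo {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper-1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All kafka streams on the same server contend for this one lock path,
        // so only one backup-generation (dynamic update) runs at a time.
        InterProcessMutex updateLock = new InterProcessMutex(client, "/stream-config-update-lock");

        updateLock.acquire();                 // blocks until this process wins the contention
        try {
            // Winner: generate the backup data stream distribution process in the cache
            // and swap it in as the new target (as in embodiment 1).
            System.out.println("Lock acquired, performing dynamic update for kafka-stream-1");
        } finally {
            updateLock.release();             // let the next kafka stream update
        }
        client.close();
    }
}
```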
Example 3
In order to balance the load among the servers, embodiment 3 of the present invention, on the basis of embodiment 1, generates the backup data stream distribution process corresponding to the target data stream distribution process in the cache by using the following method:
All target data stream distribution processes deployed on each server send requests to the second zookeeper, so that the second zookeeper generates a coordination instruction according to the remaining computing power of each server, the coordination instruction allocating to each server the number of tasks for generating backup data stream distribution processes. After each server receives the coordination instruction sent by the second zookeeper, it generates, in the cache and according to the coordination instruction, the backup data stream distribution processes corresponding to the target data stream distribution processes.
Illustratively, the target data stream distribution processes kafka stream-1 and kafka stream-2 are deployed on servers-1; kafka stream-3, kafka stream-4, and kafka stream-5 on servers-2; and kafka stream-6, kafka stream-7, and kafka stream-8 on servers-3. servers-1, servers-2, and servers-3 must register service nodes with the second zookeeper in advance. If kafka stream-1 and kafka stream-3 then need to be dynamically updated, the second zookeeper queries the remaining computing power of each server; when the remaining computing power of servers-1 and servers-2 meets the requirement, the update tasks of kafka stream-1 and kafka stream-3 are randomly distributed to these servers, which dynamically update the target data stream distribution processes.
If only one server has remaining computing power that satisfies the requirement, the generation tasks for both kafka stream-1 and kafka stream-3 are allocated to that server.
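Purely as an illustrative sketch of the coordination rule described in embodiment 3 (the patent does not specify how remaining computing power is measured, so an arbitrary numeric score and a per-task requirement are assumed here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class UpdateTaskCoordinator {

    /**
     * Decide which server runs a backup-generation task, given each server's remaining
     * computing power (an arbitrary score) and the power one task is assumed to need.
     * Returns null when no server currently qualifies.
     */
    public static String assign(Map<String, Double> remainingPower, double requiredPower) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Double> e : remainingPower.entrySet()) {
            if (e.getValue() >= requiredPower) {
                eligible.add(e.getKey());
            }
        }
        if (eligible.isEmpty()) {
            return null;                                 // defer the update
        }
        if (eligible.size() == 1) {
            return eligible.get(0);                      // single qualifying server gets all tasks
        }
        return eligible.get(new Random().nextInt(eligible.size()));  // random among qualifiers
    }

    public static void main(String[] args) {
        Map<String, Double> power = Map.of("servers-1", 0.6, "servers-2", 0.3, "servers-3", 0.1);
        System.out.println("kafka stream-1 update assigned to: " + assign(power, 0.25));
        System.out.println("kafka stream-3 update assigned to: " + assign(power, 0.25));
    }
}
```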
Example 4
After the operations of embodiment 1 have been performed, the target data stream distribution process is kafka stream-2 and the backup data stream distribution process is kafka stream-1. In general, the backup data stream distribution process could simply be shut down. However, in order to avoid the considerable effort of regenerating a backup data stream distribution process for the target data stream distribution process and to improve the response speed, embodiment 4 of the invention adds the following steps on the basis of embodiment 1:
The former target data stream distribution process, kafka stream-1, is kept on standby as a backup data stream distribution process.
Then, the scheduling process uses the dynamic proxy method to acquire the third configuration data for the current target data stream distribution process kafka stream-2 at the current time 00:00:05, and compares whether the third configuration data is identical to the first configuration data of kafka stream-1.
Then, in the case where the third configuration data is different from the first configuration data, dynamic update is performed by the method of embodiment 1.
In the case where the third configuration data is the same as the first configuration data, the kafka stream-1 is taken as a target data stream distribution process; kafka stream-2 is made to be the backup data stream distribution process.
In practical applications, the backup data stream distribution processes may include several kafka streams generated at different historical times. Correspondingly, the comparison is carried out one by one against the plurality of historically generated backup data stream distribution processes.
By applying this embodiment of the invention, the backup data stream distribution process is retained; the third configuration data acquired by the scheduling process for the target data stream distribution process is compared with the first configuration data of the backup data stream distribution process, and when they are the same the backup data stream distribution process is used directly. This avoids regenerating a data stream distribution process that was already generated in the past, reduces the server load, and improves the dynamic updating efficiency.
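A minimal sketch of embodiment 4's reuse of retained backups, assuming the retained processes are indexed by the configuration data they were built from; the map, the string-typed configuration, and the method names are hypothetical:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class BackupReuse {

    // Retained backup distribution processes, keyed by the configuration they were built from.
    private final Map<String, Object> backupsByConfig = new ConcurrentHashMap<>();
    private Object targetProcess;
    private String targetConfig;

    /** Retire the old target into the backup pool and promote the freshly built process. */
    public void promote(Object newProcess, String newConfig) {
        if (targetProcess != null) {
            backupsByConfig.put(targetConfig, targetProcess);   // keep, do not shut down
        }
        targetProcess = newProcess;
        targetConfig = newConfig;
    }

    /** When third configuration data arrives: no change if it matches the current target;
     *  reuse a retained backup if one was built from the same configuration;
     *  otherwise the caller falls back to embodiment 1 and builds a new process. */
    public boolean reuseIfMatching(String thirdConfig) {
        if (Objects.equals(thirdConfig, targetConfig)) {
            return true;                                        // already running with this config
        }
        Object match = backupsByConfig.remove(thirdConfig);
        if (match == null) {
            return false;                                       // fall back to embodiment 1
        }
        backupsByConfig.put(targetConfig, targetProcess);       // old target becomes a backup
        targetProcess = match;                                  // matching backup becomes the target
        targetConfig = thirdConfig;
        return true;
    }
}
```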
It should be noted that the principle by which zookeeper implements load balancing is as follows. First, a servers node is established in zookeeper, together with a monitor that watches the status of its child nodes; the monitor is also used to discover servers newly added to the system at any time. When each server starts, a worker child node (which can be named after the server's address) is created under the servers node, and the relevant information of that server is stored under the corresponding child node. In this way, the zookeeper server can obtain the server list and related information of the current cluster. Then, according to a self-defined random load-balancing algorithm, when the requests of kafka stream-1 and kafka stream-3 arrive, the list of servers in the current cluster whose remaining computing power meets the requirement is obtained from the zookeeper server, and one of them is selected at random to process the request.
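As an illustrative sketch of this registration and selection, again assuming Apache Curator as the ZooKeeper client and assuming that the remaining computing power is published as a plain numeric payload on each worker child node (the paths, addresses, and scores are made up for the example):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ServerRegistryDemo {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper-2:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each server registers an ephemeral worker child node under /servers at startup,
        // carrying its remaining computing power (an arbitrary score in this sketch).
        String self = "/servers/192.168.0.11";
        client.create().creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath(self, "0.6".getBytes(StandardCharsets.UTF_8));

        // The coordinator reads the current server list and remaining power, keeps the
        // servers that meet the requirement, and picks one at random.
        double required = 0.25;
        List<String> servers = client.getChildren().forPath("/servers");
        List<String> eligible = servers.stream()
                .filter(s -> {
                    try {
                        byte[] data = client.getData().forPath("/servers/" + s);
                        return Double.parseDouble(new String(data, StandardCharsets.UTF_8)) >= required;
                    } catch (Exception e) {
                        return false;
                    }
                })
                .toList();

        if (!eligible.isEmpty()) {
            String chosen = eligible.get(new Random().nextInt(eligible.size()));
            System.out.println("Backup generation task assigned to " + chosen);
        }
        client.close();
    }
}
```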
Example 5
The embodiment 5 of the invention adds the following steps on the basis of the embodiment 1:
s104 (not shown in the figure): and acquiring the corresponding time when the target data stream distribution process starts to distribute the data streams, and taking a first data set to be distributed in a first time length range before the time and a second data set to be distributed in a second time length range after the time as recurrence data.
After the completion of the dynamic update, the scheduling process acquires the time when the target data stream distribution process kafka stream-2 starts to receive and distribute the log data streams, for example, 00:00:30, and the scheduling process reads the log data having the time stamp between 00:00:25 and 00:00:37 from the log data storage database and sets the set of the log data as recurrence data.
S105 (not shown): establishing a temporary data flow distribution process so that the temporary data flow distribution process sends an inquiry instruction whether the data processing process is processed or not to each piece of data in the recurrence data, and the data processing process returns an inquiry result to the temporary data flow distribution process according to the self processing data flow under the condition of receiving the inquiry instruction; in the case that the inquiry result is no, the temporary data flow distribution process recurs the piece of data.
The scheduling process establishes a temporary data stream distribution process in the cache, the data stream distribution process sends each piece of data in the recurrence data to the data processing process according to the sequence, the data processing process judges whether the data is processed or not after receiving the recurrence log data, and if yes, the scheduling process sends ACK (acknowledgement character) to the temporary data stream distribution process; and if not, processing the log data, and continuously receiving the log data sent by the temporary data stream distribution process until all the log data in the recurrence data are sent.
Further, in order to improve the data processing efficiency of the data processing process, after receiving the log data sent by the temporary data stream distribution process, the data processing process obtains the timestamp of the log data and determines whether the timestamp is before or after the moment at which the target data stream distribution process started to distribute data streams;
when the timestamp of the piece of data is before the moment at which the target data stream distribution process started to distribute data streams, the data processing process does not accept the log, and the backup data stream distribution process corresponding to the first configuration data is used to process the piece of data;
and when the timestamp of the piece of data is after the moment at which the target data stream distribution process started to distribute data streams, the piece of data is sent to the target data stream distribution process, and the data processing process then receives and processes that piece of log data.
By applying embodiment 5, even if processing omissions occur under high concurrency, for example above 50 TPS, the log data around the moment at which the target data stream distribution process starts to distribute data streams, that is, within the first duration before and the second duration after the switching moment, is sent again, so that the problem of processing omission is completely avoided.
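A minimal sketch of the recurrence replay of embodiment 5, with the routing decision gathered in one place for brevity (in the patent the data processing process itself makes the timestamp comparison); the record shape, the alreadyProcessed query, and the Distributor interface are hypothetical:

```java
import java.util.List;
import java.util.function.Predicate;

public class RecurrenceReplayer {

    // A log record with the timestamp it carries (epoch millis); hypothetical shape.
    record LogRecord(String payload, long timestamp) {}

    interface Distributor { void distribute(LogRecord r); }

    /**
     * Replay every record in [switchTime - firstDuration, switchTime + secondDuration]
     * that the data processing process reports as not yet processed.
     */
    public static void replay(List<LogRecord> recurrenceData,
                              long switchTimeMillis,
                              Predicate<LogRecord> alreadyProcessed,  // query answered by the processing process
                              Distributor backupProcess,              // built from the first configuration data
                              Distributor targetProcess) {            // built from the second configuration data
        for (LogRecord r : recurrenceData) {
            if (alreadyProcessed.test(r)) {
                continue;                                  // ACK: nothing to do
            }
            if (r.timestamp() < switchTimeMillis) {
                backupProcess.distribute(r);               // record predates the switch
            } else {
                targetProcess.distribute(r);               // record is after the switch
            }
        }
    }
}
```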
Example 6
Further, in order to avoid processing omissions for part of the data in the log stream while load balancing is being carried out, the embodiment of the present invention determines the first duration and/or the second duration by the following method:
acquiring the proportion p of abnormal data streams contained in the data streams within a first preset duration range; the first preset duration may be equal to the first duration or a multiple of it, and generally has a value range of 100-.
acquiring the weight w of the data stream;
calculating the first duration and/or the second duration using the formula T = t1 + (w + p) × b, wherein
T is the first duration and/or the second duration; t1 is the time consumed for switching between the target data stream distribution process and the backup data stream distribution process, that is, the time consumed for executing the complete procedure of embodiment 1 of the present invention, and is related to t2 and a by a formula that appears only as an image in the original publication;
t2 is the maximum value of the historical switching time between the target data stream distribution process and the backup data stream distribution process; a is a second preset duration, which is greater than t1 and is generally 20-80 milliseconds; w is the data stream weight, and is related to c1 and c2 by a formula that appears only as an image in the original publication;
c1 is the total duration for which the operator has viewed the processing results of the data stream distribution process corresponding to this data stream; c2 is the accumulated total duration for which the operator has viewed the processing results of the data stream distribution processes corresponding to all data streams; p is the proportion of data screened out of the data stream by the data processing process, such as the proportion of abnormal logs; b is a third preset duration, generally 20-100 milliseconds.
It is emphasized that the first time period may be the same as the second time period, or may be different.
By applying the embodiment of the invention, the first duration and the second duration can be adjusted in real time according to the concurrency of the log data, the proportion of abnormal data streams in the log data stream, and the importance of the log data, so that the user does not have to set the related parameters manually and the user experience is improved.
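Purely as an illustrative numerical example, with values that are assumed rather than taken from the patent: suppose the measured switching time is t1 = 40 milliseconds, the operator has viewed the results of this data stream for c1 = 2 hours out of an accumulated c2 = 10 hours across all streams, so that the weight is taken as w = 0.2 (treating the weight formula, which appears only as an image in the original, as the ratio c1/c2 for this illustration only), the proportion of abnormal data streams is p = 0.1, and the third preset duration is b = 50 milliseconds. Then T = 40 + (0.2 + 0.1) × 50 = 55 milliseconds, that is, the recurrence window on that side of the switching moment would cover 55 milliseconds of data.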
Example 7
Based on the method in any one of embodiments 1 to 6, embodiment 7 of the present invention provides a system for dynamically updating a data stream distribution process configuration, where the system includes: a scheduling process and a data processing process, wherein,
the scheduling process is used for:
A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream;
B: comparing whether the first configuration data is the same as the second configuration data;
C: if not, generating, in the cache, a backup data stream distribution process corresponding to the target data stream distribution process, and taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream to the data processing process;
and the data processing process is used for receiving the data stream sent by the target data stream distribution process.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A dynamic update method for data stream distribution process configuration, applied to a scheduling process, the method comprising the following steps:
A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream, the first configuration data is first operation type data, and the second configuration data is second operation type data;
B: comparing whether the first configuration data is the same as the second configuration data;
C: if not, generating, in a cache, a backup data stream distribution process corresponding to the target data stream distribution process; taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream, including a log data stream, to the data processing process.
2. The method according to claim 1, wherein the step of obtaining the second configuration data for the target data stream distribution process at the current time in step A includes:
intercepting, using a dynamic proxy method, the second configuration data input by the user.
3. The method for dynamically updating the configuration of the data stream distribution process according to claim 1, wherein the step B comprises:
generating a first key-value pair by taking the first configuration data as the value and the identification information of the corresponding data processing process as the key; generating a second key-value pair by taking the second configuration data as the value and the identification information of the corresponding second data processing process as the key;
comparing whether the first key-value pair and the second key-value pair are completely consistent;
the generating a backup data stream distribution process corresponding to the target data stream distribution process in the cache, and distributing the data stream to the data processing process by using the target data stream distribution process, includes:
taking the backup data stream distribution process as the value and the identification information of the target data stream distribution process as the key, and generating a third key-value pair; generating a fourth key-value pair by taking the target data stream distribution process as the value and the identification information of the target data stream distribution process as the key; and replacing the fourth key-value pair with the third key-value pair according to the identification information of the target data stream distribution process, so as to distribute the data stream to the data processing process.
4. The method according to claim 1, wherein the step of generating a backup data stream distribution process in the cache corresponding to the target data stream distribution process in step C includes:
each target data stream distribution process deployed on the same server competing for a zookeeper lock from the first zookeeper, and, if the competition succeeds, generating in the cache a backup data stream distribution process corresponding to that target data stream distribution process.
5. The method according to claim 1, wherein the step of generating a backup data stream distribution process corresponding to the target data stream distribution process in the cache in step C includes:
all target data stream distribution processes deployed on each server sending requests to the second zookeeper, so that the second zookeeper generates a coordination instruction according to the remaining computing power of each server, wherein the coordination instruction allocates to each server the number of tasks for generating backup data stream distribution processes;
after each server receives the coordination instruction sent by the second zookeeper, generating, in the cache and according to the coordination instruction, the backup data stream distribution processes corresponding to the target data stream distribution processes.
6. The method according to claim 1, wherein while the step of taking the backup data stream distribution process as the target data stream distribution process in step C is executed, the method further comprises:
taking the target data stream distribution process as a backup data stream distribution process;
acquiring third configuration data of the target data stream distribution process at the current moment, and comparing whether the third configuration data are the same as the first configuration data of the backup data stream distribution process or not;
and if so, using the backup data stream distribution process as a target data stream distribution process.
7. The method of claim 1, wherein the method further comprises:
acquiring the moment at which the target data stream distribution process starts to distribute data streams, and taking a first data set to be distributed within a first duration before that moment and a second data set to be distributed within a second duration after that moment as recurrence data;
establishing a temporary data stream distribution process, so that for each piece of data in the recurrence data the temporary data stream distribution process sends the data processing process an inquiry instruction asking whether that piece of data has been processed; upon receiving the inquiry instruction, the data processing process returns an inquiry result to the temporary data stream distribution process according to the data streams it has already processed; and when the inquiry result is no, the temporary data stream distribution process redistributes (replays) that piece of data.
8. The method according to claim 7, wherein when the data processing process receives a piece of data replayed by the temporary data stream distribution process, the data processing process determines whether the timestamp corresponding to that piece of data is before or after the moment at which the target data stream distribution process started to distribute data streams;
when the timestamp corresponding to the piece of data is before the moment at which the target data stream distribution process started to distribute data streams, the piece of data is processed using the backup data stream distribution process corresponding to the first configuration data;
and when the timestamp corresponding to the piece of data is after the moment at which the target data stream distribution process started to distribute data streams, the piece of data is sent to the target data stream distribution process.
9. The method according to claim 7, wherein the determining of the first duration and/or the second duration comprises:
acquiring the proportion p of abnormal data streams contained in the data streams within a first preset duration range;
acquiring the weight w of the data stream;
calculating the first duration and/or the second duration using the formula T = t1 + (w + p) × b, wherein
T is the first duration and/or the second duration; t1 is the time consumed for switching between the target data stream distribution process and the backup data stream distribution process, and is related to t2 and a by a formula that appears only as an image in the original publication;
t2 is the maximum value of the historical switching time between the target data stream distribution process and the backup data stream distribution process; a is a second preset duration; w is the data stream weight, and is related to c1 and c2 by a formula that appears only as an image in the original publication;
c1 is the total duration for which the operator has viewed the processing results of the data stream distribution process corresponding to this data stream; c2 is the accumulated total duration for which the operator has viewed the processing results of the data stream distribution processes corresponding to all data streams; p is the proportion of abnormal data streams; b is a third preset duration.
10. A system for dynamically updating a configuration of a data stream distribution process, the system comprising: a scheduling process and a data processing process, wherein,
the scheduling process is used for:
A: obtaining first configuration data of a target data stream distribution process at the current moment, and obtaining second configuration data for the target data stream distribution process at the current moment, wherein the target data stream distribution process includes a Kafka stream, the first configuration data is first operation type data, and the second configuration data is second operation type data;
B: comparing whether the first configuration data is the same as the second configuration data;
C: if not, generating, in the cache, a backup data stream distribution process corresponding to the target data stream distribution process, and taking the backup data stream distribution process as the target data stream distribution process so that the target data stream distribution process distributes the data stream to the data processing process;
and the data processing process is used for receiving the data stream sent by the target data stream distribution process.
CN202011374412.3A 2020-11-30 2020-11-30 Dynamic updating method and system for data stream distribution process configuration Active CN112532450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011374412.3A CN112532450B (en) 2020-11-30 2020-11-30 Dynamic updating method and system for data stream distribution process configuration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011374412.3A CN112532450B (en) 2020-11-30 2020-11-30 Dynamic updating method and system for data stream distribution process configuration

Publications (2)

Publication Number Publication Date
CN112532450A CN112532450A (en) 2021-03-19
CN112532450B true CN112532450B (en) 2022-08-23

Family

ID=74995191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011374412.3A Active CN112532450B (en) 2020-11-30 2020-11-30 Dynamic updating method and system for data stream distribution process configuration

Country Status (1)

Country Link
CN (1) CN112532450B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943840A (en) * 2017-10-30 2018-04-20 深圳前海微众银行股份有限公司 Data processing method, system and computer-readable recording medium
CN111459954A (en) * 2020-03-04 2020-07-28 深圳壹账通智能科技有限公司 Distributed data synchronization method, device, equipment and medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7730364B2 (en) * 2007-04-05 2010-06-01 International Business Machines Corporation Systems and methods for predictive failure management
CN102629250A (en) * 2012-02-28 2012-08-08 杭州丰城信息技术有限公司 Recovery method of redo log files for main memory database
CN104092718B (en) * 2013-12-12 2017-10-24 腾讯数码(天津)有限公司 The update method of configuration information in distributed system and distributed system
CN104050261B (en) * 2014-06-16 2018-01-05 深圳先进技术研究院 The general data processing system and method for variable logic based on Storm
US10346272B2 (en) * 2016-11-01 2019-07-09 At&T Intellectual Property I, L.P. Failure management for data streaming processing system
CN106843930A (en) * 2016-12-23 2017-06-13 江苏途致信息科技有限公司 Streaming dynamic configuration more new architecture and method based on zookeeper
US10705868B2 (en) * 2017-08-07 2020-07-07 Modelop, Inc. Dynamically configurable microservice model for data analysis using sensors
CN110737670B (en) * 2019-10-21 2023-06-13 中国民航信息网络股份有限公司 Method, device and system for guaranteeing consistency of cluster data
CN111026400A (en) * 2019-11-20 2020-04-17 中国铁道科学研究院集团有限公司电子计算技术研究所 Method and device for analyzing service data stream

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943840A (en) * 2017-10-30 2018-04-20 深圳前海微众银行股份有限公司 Data processing method, system and computer-readable recording medium
CN111459954A (en) * 2020-03-04 2020-07-28 深圳壹账通智能科技有限公司 Distributed data synchronization method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A distributed data stream log service system built with Node.js; Zhang Yu; Computer Systems & Applications; 2013-02-15 (No. 02); full text *
Design and research of a Storm-based real-time stream query system for big data; Jiang Chenchen et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition); 2016-06-29 (No. 03); full text *

Also Published As

Publication number Publication date
CN112532450A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN110401696B (en) Decentralized processing method, communication agent, host and storage medium
CN112162865A (en) Server scheduling method and device and server
EP3264723B1 (en) Method, related apparatus and system for processing service request
WO2022105138A1 (en) Decentralized task scheduling method, apparatus, device, and medium
CN111818159A (en) Data processing node management method, device, equipment and storage medium
CN110933137A (en) Data synchronization method, system, equipment and readable storage medium
US20220070099A1 (en) Method, electronic device and computer program product of load balancing
CN112231108A (en) Task processing method and device, computer readable storage medium and server
CN115499447A (en) Cluster master node confirmation method and device, electronic equipment and storage medium
CN113885794B (en) Data access method and device based on multi-cloud storage, computer equipment and medium
Rotter et al. A queueing model for threshold-based scaling of UPF instances in 5G core
CN110868323A (en) Bandwidth control method, device, equipment and medium
CN112631756A (en) Distributed regulation and control method and device applied to space flight measurement and control software
Ali et al. Probabilistic normed load monitoring in large scale distributed systems using mobile agents
CN112231223A (en) Distributed automatic software testing method and system based on MQTT
CN112532450B (en) Dynamic updating method and system for data stream distribution process configuration
CN110995802A (en) Task processing method and device, storage medium and electronic device
CN114900449B (en) Resource information management method, system and device
US20230246911A1 (en) Control device, control method, control program and control system
CN113422696B (en) Monitoring data updating method, system, equipment and readable storage medium
CN110209475B (en) Data acquisition method and device
CN115242718A (en) Cluster current limiting method, device, equipment and medium
CN113485828A (en) Distributed task scheduling system and method based on quartz
CN112367386A (en) Ignite-based automatic operation and maintenance method, apparatus and computer equipment
Özcan et al. A hybrid load balancing model for multi-agent systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 10th floor, R & D building, Hefei Institute of technology innovation, Chinese Academy of Sciences, 2666 Xiyou Road, Hefei hi tech Zone, Anhui Province, 230000

Applicant after: Zhongke Meiluo Technology Co., Ltd.

Address before: 10th floor, R & D building, Hefei Institute of technology innovation, Chinese Academy of Sciences, 2666 Xiyou Road, Hefei hi tech Zone, Anhui Province, 230000

Applicant before: ANHUI ZHONGKE MEILUO INFORMATION TECHNOLOGY CO.,LTD.

GR01 Patent grant
GR01 Patent grant