CN110647407A - Data configuration method and system - Google Patents

Info

Publication number
CN110647407A
Authority
CN
China
Prior art keywords
sink
channel
source
data
setting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910816319.4A
Other languages
Chinese (zh)
Inventor
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Inspur Data Technology Co Ltd
Original Assignee
Beijing Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Inspur Data Technology Co Ltd filed Critical Beijing Inspur Data Technology Co Ltd
Priority to CN201910816319.4A priority Critical patent/CN110647407A/en
Publication of CN110647407A publication Critical patent/CN110647407A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/546 - Message passing systems or structures, e.g. queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 - Techniques for rebalancing the load in a distributed system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/50 - Indexing scheme relating to G06F9/50
    • G06F2209/5011 - Pool
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/50 - Indexing scheme relating to G06F9/50
    • G06F2209/5018 - Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a data configuration method and system. The Source is configured in the Flume configuration file, and its thread count, workerThreads, is set to more than one; the type of the Channel is set to a multithreading file channel, MultithreadingFileChannel, which comprises a plurality of file channels; and a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink are created for the Sink, so that the Sink is multithreaded. By configuring the Source, the Channel, and the Sink of the distributed Flume system in this way, the system can process data in multiple threads, which improves data processing efficiency.

Description

Data configuration method and system
Technical Field
The invention relates to the technical field of data transmission, in particular to a data configuration method and a data configuration system.
Background
Apache Flume is a distributed, highly available, and highly reliable system for aggregating massive volumes of logs. With it, large amounts of log data can be efficiently collected, aggregated, and moved from many different sources to a centralized data store. The Apache Flume distributed system consists of three main components: the Source, the Channel, and the Sink. The Source acquires data from a data source and produces it to the Channel, the Channel acts as a message queue, and the Sink finally consumes the data in the Channel.
However, in the Apache Flume distributed system only the Source supports multithreading. The data that the Source acquires from a data source can therefore only be produced to the Channel one item at a time, the Channel passes the data to the Sink, and the Sink consumes the data in the Channel sequentially, one item at a time, so data processing efficiency in the Apache Flume distributed system is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data configuration method and system to solve the problem of low data processing efficiency in the Apache Flume distributed system.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
A first aspect of the invention discloses a data configuration method, which comprises the following steps:
configuring the Source in the Flume configuration file, and setting the thread count workerThreads of the Source to more than one;
setting the type of the Channel to a multithreading file channel, MultithreadingFileChannel, wherein the MultithreadingFileChannel comprises a plurality of file channels;
and creating, for the Sink, a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink, so that the Sink is multithreaded.
Preferably, configuring the Source in the Flume configuration file includes:
configuring the type of the Source as Scribe Source in the Flume configuration file, wherein the Scribe Source is used for receiving data from a Scribe data source;
and configuring the port of the Source as the target port in the Flume configuration file.
Preferably, setting the type of the Channel to a multithreading file channel MultithreadingFileChannel includes:
defining a MultithreadingFileChannel based on the custom Channel mechanism;
setting the type of the Channel to the defined MultithreadingFileChannel;
setting the name of the Channel to a preset name;
and setting the number of channels to a preset number.
Preferably, defining a MultithreadingFileChannel based on the custom Channel mechanism includes:
implementing the custom MultithreadingFileChannel, based on the custom Channel mechanism, by inheriting from the basic channel semantics class BasicChannelSemantics;
creating a list of FileChannels, and storing a user-defined number of FileChannels in the list;
and creating a transaction and obtaining a FileChannel from the list.
Preferably, creating a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink for the Sink includes:
defining and implementing a multithreaded Sink, MultithreadingKafkaSink;
and, in the configuration file corresponding to the Sink, setting the name of the Sink to a preset name, setting the number of ChannelConsumers of the Sink to a preset number, and setting the type of the Sink to MultithreadingKafkaSink.
Preferably, defining and implementing the MultithreadingKafkaSink includes:
setting the Sink as a multithreaded data sink of the Kafka type;
and initializing a preset number of KafkaSink instances, storing them in a thread pool, and obtaining the KafkaSink instances from the thread pool using multithreading to implement the multithreaded KafkaSink.
A second aspect of the invention discloses a data configuration system, applicable to a distributed Flume system comprising at least three modules, namely a Source, a Channel, and a Sink, wherein the Source is used for receiving data and the Channel is used for transmitting the data received by the Source to the Sink for consumption, the data configuration system comprising:
a first configuration module, configured to configure the Source in the Flume configuration file and set the thread count workerThreads of the Source to more than one;
a second configuration module, configured to set the type of the Channel to a multithreading file channel, MultithreadingFileChannel, wherein the MultithreadingFileChannel comprises a plurality of file channels;
and a third configuration module, configured to create, for the Sink, a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink, so that the Sink is multithreaded.
Preferably, the first configuration module includes:
a first configuration unit, configured to configure the type of the Source as Scribe Source in the Flume configuration file, wherein the Scribe Source is used for receiving data from a Scribe data source;
and a second configuration unit, configured to configure the port of the Source as the target port in the Flume configuration file.
Preferably, the second configuration module includes:
a first defining unit, configured to define a MultithreadingFileChannel based on the custom Channel mechanism;
a second defining unit, configured to set the type of the Channel to the defined MultithreadingFileChannel;
a first setting unit, configured to set the name of the Channel to a preset name;
and a second setting unit, configured to set the number of channels to a preset number.
Preferably, the third configuration module includes:
a third defining unit, configured to define and implement the multithreaded Sink;
and a third setting unit, configured to set, in the configuration file corresponding to the Sink, the name of the Sink to a preset name, the number of ChannelConsumers of the Sink to a preset number, and the type of the Sink to MultithreadingKafkaSink.
From the above, the invention discloses a data configuration method and system in which the Source is configured in the Flume configuration file and its thread count workerThreads is set to more than one; the type of the Channel is set to a multithreading file channel, MultithreadingFileChannel, which comprises a plurality of file channels; and a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink are created for the Sink, so that the Sink is multithreaded. By configuring the Source, the Channel, and the Sink of the distributed Flume system in this way, the system can process data in multiple threads, which improves data processing efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a data configuration method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a Source configuration method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a Channel configuration method according to an embodiment of the present invention;
Fig. 4 is a flowchart of a Sink configuration method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of prior art Source, Channel, and Sink configuration connections according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of Source, Channel, and Sink configuration connections according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a data configuration system according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a first configuration module 701 according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a second configuration module 702 according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a third configuration module 703 according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
An embodiment of the present invention provides a data configuration method, and referring to fig. 1, the method at least includes step S101 to step S103.
Step S101: configure the Source in the Flume configuration file, and set the thread count workerThreads of the Source to more than one.
In step S101, the Flume configuration file is the configuration file of the Apache Flume distributed system, in which the Source, the Channel, and the Sink can be configured.
It should be noted that the Source already supports multithreading, so multiple threads can be set for it directly to make the Source multithreaded.
Step S101 is executed as shown in Fig. 2 and specifically includes steps S201 and S202.
Step S201: configure the type of the Source as Scribe Source in the Flume configuration file.
In step S201, data from a Scribe data source can be received through the Scribe Source.
It should be noted that the Scribe Source type of the Source can receive data from a Scribe data source. Scribe is a log collection system open-sourced by Facebook; it collects logs from various log sources and stores them on a central storage system for centralized statistical analysis.
Step S202: the port of Source is configured as the target port in the configuration file of Flume.
In step S202, the port of the Source is configured as a target port, so that the Source can receive data, where the target port is a preset port.
To facilitate understanding of how the Source is configured in step S101, a Source specific configuration statement is shown below.
# set the Source name to scribe_source
a1.sources=scribe_source
# set the type of scribe_source to ScribeSource
a1.sources.scribe_source.type=org.apache.flume.source.scribe.ScribeSource
# set the port of scribe_source to 1466
a1.sources.scribe_source.port=1466
# set the number of worker threads of scribe_source to 10
a1.sources.scribe_source.workerThreads=10
Step S102: set the type of the Channel to a multithreading file channel, MultithreadingFileChannel.
In step S102, the MultithreadingFileChannel comprises a plurality of file channels (FileChannels).
It should be noted that the Channel does not itself support multithreading; its type therefore needs to be set to the multithreading file channel MultithreadingFileChannel so that it does.
Step S102 is executed as shown in Fig. 3 and specifically includes steps S301 to S304.
Step S301: define a MultithreadingFileChannel based on the custom Channel mechanism.
In step S301, the Channel has a customization mechanism, so a MultithreadingFileChannel can be defined through that mechanism.
It should be noted that, because the Channel has no multithreading function, a MultithreadingFileChannel, that is, a multithreading file channel, first needs to be defined through the customization mechanism of the Channel.
It should be further noted that the MultithreadingFileChannel is defined as follows: first, based on the custom Channel mechanism, the custom MultithreadingFileChannel is implemented by inheriting from the basic channel semantics class BasicChannelSemantics; then a list of FileChannels is created and a user-defined number of FileChannels is stored in it; finally, a transaction is created and a FileChannel is obtained from the list.
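The three sub-steps above (a custom channel class, a list of file channels, and one file channel handed out per operation) can be illustrated without Flume on the classpath. The following is a minimal sketch under stated assumptions, not the implementation of the invention: `RoundRobinChannel` is a hypothetical stand-in for MultithreadingFileChannel, each `BlockingQueue` stands in for one FileChannel, and round-robin selection is an assumption, since the description does not state how a FileChannel is chosen from the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for MultithreadingFileChannel; this is not Flume's API.
class RoundRobinChannel {
    // Each queue plays the role of one underlying FileChannel.
    private final List<BlockingQueue<String>> channels = new ArrayList<>();
    private final AtomicInteger putCursor = new AtomicInteger();
    private final AtomicInteger takeCursor = new AtomicInteger();

    RoundRobinChannel(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        for (int i = 0; i < count; i++) {
            channels.add(new LinkedBlockingQueue<>());
        }
    }

    // Assumption: sub-channels are picked in round-robin order.
    private BlockingQueue<String> pick(AtomicInteger cursor) {
        int index = Math.floorMod(cursor.getAndIncrement(), channels.size());
        return channels.get(index);
    }

    // Stores one event in the next sub-channel.
    void put(String event) {
        pick(putCursor).add(event);
    }

    // Tries each sub-channel once; returns null when none of them holds an event.
    String take() {
        for (int i = 0; i < channels.size(); i++) {
            String event = pick(takeCursor).poll();
            if (event != null) {
                return event;
            }
        }
        return null;
    }

    // Total number of events currently held across all sub-channels.
    int size() {
        int total = 0;
        for (BlockingQueue<String> queue : channels) {
            total += queue.size();
        }
        return total;
    }

    int channelCount() {
        return channels.size();
    }
}
```

Because each sub-channel is an independent queue, several producer and consumer threads can work on different sub-channels at the same time, which is the effect the list of FileChannels is meant to achieve.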
Step S302: set the type of the Channel to the defined MultithreadingFileChannel.
In step S302, since the MultithreadingFileChannel contains a plurality of file channels (FileChannels), the type of the Channel needs to be set to the defined MultithreadingFileChannel, so that the Channel becomes a multithreading file channel and thus has a multithreading function.
Step S303: set the name of the Channel to a preset name.
In step S303, since the type of the Channel has been set, the Channel needs to be named.
Step S304: set the number of channels to a preset number.
In step S304, since the type of the Channel is MultithreadingFileChannel, the Channel has a multithreading function, and its number of threads therefore needs to be specified.
To facilitate understanding of how step S102 configures the channel, a channel specific configuration statement is shown below.
# name the Channel file_channel
a1.channels=file_channel
# set the type of file_channel to MultithreadingFileChannel
a1.channels.file_channel.type=org.apache.flume.extension.channel.MultithreadingFileChannel
# set the thread count of file_channel to 10
a1.channels.file_channel.channels=10
# set the checkpoint directory of file_channel
a1.channels.file_channel.checkpointDir=/data0/flume/checkpoint
# set the data directory of file_channel
a1.channels.file_channel.dataDir=/data0/flume/data
Step S103: create, for the Sink, a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink, so that the Sink is multithreaded.
In step S103, since the Sink does not support multithreading, a plurality of ChannelConsumer instances corresponding to the Sink need to be created for it so that the Sink is multithreaded.
Step S103 is executed as shown in Fig. 4 and specifically includes steps S401 and S402.
Step S401: define and implement the multithreaded Sink.
In step S401, to define and implement the multithreaded Sink, the Sink may first be set as a multithreaded data sink of the Kafka type; a preset number of KafkaSink instances are then initialized and stored in a thread pool, and the KafkaSink instances are obtained from the thread pool using multithreading to implement the multithreaded Sink.
It should be noted that, when the Sink is set to the Kafka type, the multithreaded Sink has to be implemented through customization.
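The idea of step S401, a preset number of sink instances running on a thread pool and consuming from the same channel, can be sketched with the standard Java concurrency utilities. This is an illustrative sketch only: `ConsumerPoolSink` and `Worker` are hypothetical names, the `BlockingQueue` stands in for the Channel, and each worker merely counts events where a real KafkaSink instance would send them to Kafka.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a multithreaded sink; not the patent's or Flume's code.
class ConsumerPoolSink {
    // Stand-in for one KafkaSink instance: it only counts the events it consumes.
    static final class Worker implements Runnable {
        private final BlockingQueue<String> channel;
        private final AtomicInteger delivered;

        Worker(BlockingQueue<String> channel, AtomicInteger delivered) {
            this.channel = channel;
            this.delivered = delivered;
        }

        @Override
        public void run() {
            // Drain until the channel is empty; a long-running sink would keep polling.
            while (channel.poll() != null) {
                delivered.incrementAndGet(); // a real worker would send to Kafka here
            }
        }
    }

    // Runs `consumers` workers on a fixed thread pool and returns the number of events consumed.
    static int drain(BlockingQueue<String> channel, int consumers) {
        AtomicInteger delivered = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.submit(new Worker(channel, delivered));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumers", e);
        }
        return delivered.get();
    }
}
```

Since `BlockingQueue.poll()` is thread-safe, each event is consumed by exactly one worker, so the workers share the load without coordinating with one another.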
Step S402: in the configuration file corresponding to the Sink, set the name of the Sink to a preset name, set the number of ChannelConsumers of the Sink to a preset number, and set the type of the Sink to MultithreadingKafkaSink.
In step S402, the number of ChannelConsumers refers to the number of threads of the Sink.
It should be noted that setting the type of the Sink to MultithreadingKafkaSink makes the Sink multithreaded.
It should be noted that, when configuring the Sink, the topicHeaderName, brokerList, and batchSize of the Sink may also be set, in addition to its name, its number of ChannelConsumers, and its type.
To facilitate understanding of how step S103 configures the sink, a sink specific configuration statement is shown below.
# set the name of the Sink to kafka_sink
a1.sinks=kafka_sink
# set the type of the Sink to the predefined MultithreadingKafkaSink
a1.sinks.kafka_sink.type=org.apache.flume.extension.sink.MultithreadingKafkaSink
# set the topic header name of the Sink to category
a1.sinks.kafka_sink.topicHeaderName=category
# set the number of threads of the Sink to 10
a1.sinks.kafka_sink.consumers=10
# set the Kafka broker list of the Sink to kafkaHost:9092
a1.sinks.kafka_sink.brokerList=kafkaHost:9092
# set the batch size to 1000
a1.sinks.kafka_sink.batchSize=1000
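The configuration statements in this description carry three thread-related keys: `workerThreads` for the Source, `channels` for the Channel, and `consumers` for the Sink. As a minimal sketch, they can be read with `java.util.Properties`, the format Flume configuration files use. `AgentThreadSettings` is a hypothetical helper, and falling back to 1 when a key is absent is an assumption made for this example rather than a documented default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for reading the thread-related settings of an agent.
class AgentThreadSettings {
    // Parses properties-format text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable configuration", e);
        }
        return props;
    }

    // Reads a positive integer setting; assumption: a missing key means 1 (single-threaded).
    static int count(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return 1;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(key + " must be >= 1");
        }
        return value;
    }
}
```

For example, `AgentThreadSettings.count(props, "a1.sinks.kafka_sink.consumers")` yields 10 for the Sink configuration above, so changing the degree of parallelism means editing a single value per component.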
It should be noted that steps S101, S102, and S103 of the present application configure the Source, the Channel, and the Sink of the distributed Flume system so that all three support multithreading, which allows the distributed Flume system to process data in multiple threads.
It should also be noted that steps S101, S102, and S103 need not be performed in any particular order and may be executed simultaneously.
In summary, the Source is configured in the Flume configuration file and its thread count workerThreads is set to more than one; the type of the Channel is set to a multithreading file channel, MultithreadingFileChannel, which comprises a plurality of file channels; and a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink are created for the Sink, so that the Sink is multithreaded. By configuring the Source, the Channel, and the Sink of the distributed Flume system in this way, the system can process data in multiple threads, which improves data processing efficiency.
It should be noted that, as shown in Fig. 5, the prior art sets the thread count of the multithreading-capable Source to N and then sets up N Channels and N Sinks in the distributed Flume system, so that the system as a whole processes data in multiple threads.
Compared with the prior art, configuring the Source, the Channel, and the Sink of the distributed Flume system in the present application requires only simple program code, and maintenance requires changing only the thread counts in that code, so configuration is simplified while the data processing efficiency of the distributed Flume system is improved.
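To make the maintenance argument concrete, the following sketch generates a prior-art style layout with N Channels and N Sinks and shows how the number of configuration lines grows with N. The component names (`c1`, `k1`, and so on) and the two-lines-per-pair layout are assumptions for illustration; the description does not list the prior-art configuration itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for a prior-art style layout with N channels and N sinks.
class PriorArtLayout {
    // Returns the configuration lines needed to declare and wire N channel/sink pairs.
    static List<String> lines(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> out = new ArrayList<>();
        StringBuilder channels = new StringBuilder();
        StringBuilder sinks = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            channels.append(i == 1 ? "" : " ").append("c").append(i);
            sinks.append(i == 1 ? "" : " ").append("k").append(i);
        }
        out.add("a1.channels=" + channels);
        out.add("a1.sinks=" + sinks);
        for (int i = 1; i <= n; i++) {
            // One declaration per channel and one binding per sink.
            out.add("a1.channels.c" + i + ".type=file");
            out.add("a1.sinks.k" + i + ".channel=c" + i);
        }
        return out;
    }
}
```

Under these assumptions the layout needs 2 + 2N lines, and changing N means adding or removing component blocks, whereas the configuration of the present application changes only the numeric values of `channels` and `consumers`.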
Corresponding to the data configuration method provided in the embodiment of the present application, an embodiment of the present application also provides a data configuration system. As shown in Fig. 7, the data configuration system includes:
a first configuration module 701, configured to configure the Source in the Flume configuration file and set the thread count workerThreads of the Source to more than one;
a second configuration module 702, configured to set the type of the Channel to a multithreading file channel, MultithreadingFileChannel, wherein the MultithreadingFileChannel comprises a plurality of file channels;
a third configuration module 703, configured to create, for the Sink, a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink, so that the Sink is multithreaded.
Preferably, as shown in Fig. 8, the first configuration module 701 includes:
a first configuration unit 801, configured to configure the type of the Source as Scribe Source in the Flume configuration file, wherein the Scribe Source is used for receiving data from a Scribe data source;
a second configuration unit 802, configured to configure the port of the Source as the target port in the Flume configuration file.
Preferably, as shown in Fig. 9, the second configuration module 702 includes:
a first defining unit 901, configured to define a MultithreadingFileChannel based on the custom Channel mechanism;
a second defining unit 902, configured to set the type of the Channel to the defined MultithreadingFileChannel;
a first setting unit 903, configured to set the name of the Channel to a preset name;
a second setting unit 904, configured to set the number of channels to a preset number.
Preferably, the first defining unit 901 includes:
a first acquisition subunit, configured to implement the custom MultithreadingFileChannel, based on the custom Channel mechanism, by inheriting from the basic channel semantics class BasicChannelSemantics;
a first creating subunit, configured to create a list of FileChannels and store a user-defined number of FileChannels in the list;
a second creating subunit, configured to create a transaction and obtain a FileChannel from the list.
Preferably, as shown in Fig. 10, the third configuration module 703 includes:
a third defining unit 1001, configured to define and implement the multithreaded Sink;
a third setting unit 1002, configured to set, in the configuration file corresponding to the Sink, the name of the Sink to a preset name, the number of ChannelConsumers of the Sink to a preset number, and the type of the Sink to MultithreadingKafkaSink.
Preferably, the third defining unit 1001 includes:
a setting subunit, configured to set the Sink as a multithreaded data sink of the Kafka type;
an initialization subunit, configured to initialize a preset number of KafkaSink instances, store them in a thread pool, and obtain the KafkaSink instances from the thread pool using multithreading to implement the multithreaded Sink.
It should be noted that, for the specific implementation process and implementation principle of each module and unit in the data configuration system disclosed in the foregoing embodiment of the present application, reference may be made to corresponding parts related to data configuration in the data configuration method disclosed in the foregoing embodiment of the present application, and details are not described here again.
The first configuration module configures the Source in the Flume configuration file and sets the thread count workerThreads of the Source to more than one; the second configuration module sets the type of the Channel to a multithreading file channel, MultithreadingFileChannel, which comprises a plurality of file channels; and the third configuration module creates, for the Sink, a plurality of channel consumer (ChannelConsumer) instances corresponding to the Sink, so that the Sink is multithreaded. Through this data configuration system, the Source, the Channel, and the Sink of the distributed Flume system are configured so that the system can process data in multiple threads, which improves data processing efficiency.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data configuration method is applicable to a distributed Flume system, and the system at least comprises three modules: the method comprises the following steps that a data Source, a Channel and a data pool sink are provided, wherein the data Source is used for receiving data, the Channel is used for transmitting the data received by the data Source to the data pool sink for consumption, and the method comprises the following steps:
configuring the Source in a configuration file of Flume, and setting the thread number workthreads of the Source to be multiple;
setting the type of the channel as a multithreading file channel MultireadingFileChannel, wherein the MultireadingFileChannel comprises a plurality of file channels;
and creating a plurality of channel consumer ChannelConsumer examples corresponding to the sink for the sink, so that the sink realizes multithreading.
2. The method according to claim 1, wherein the configuring the data Source in the Flume configuration file comprises:
configuring the type of the Source as a scribes Source in a configuration file of the Flume; wherein the Scribe source is used for receiving a data source of Scribe;
the port of Source is configured as the target port in the configuration file of Flume.
3. The method of claim 1, wherein the setting the type of the channel to a multithreaded filechannel comprises:
defining a multitreadingFileChannel based on a custom Channel mechanism;
setting the type of the channel to the defined MultithreadingFileChannel;
setting the name of the channel as a preset name;
and setting the number of the channels as a preset number.
4. The method of claim 3, wherein defining a MultithreadingFileChannel based on a custom Channel mechanism comprises:
based on the custom Channel mechanism, implementing the custom MultithreadingFileChannel by inheriting the basic channel semantics BasicChannelSemantics;
creating a list of FileChannels, and storing a user-defined number of FileChannels in the list;
and creating a transaction and obtaining a FileChannel from the list.
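The list-of-channels idea in claim 4 can be sketched without the Flume libraries. In the sketch below each in-memory queue stands in for one FileChannel, and the round-robin choice of sub-channel is an assumption, since the claim does not state how a FileChannel is picked from the list. The class and method names are hypothetical; an actual implementation would extend Flume's `BasicChannelSemantics` and hold real FileChannel objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiChannelSketch {
    // Each queue is a stand-in for one FileChannel held in the list.
    private final List<BlockingQueue<String>> subChannels = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public MultiChannelSketch(int channelCount) {
        if (channelCount <= 0) {
            throw new IllegalArgumentException("channelCount must be positive");
        }
        for (int i = 0; i < channelCount; i++) {
            subChannels.add(new LinkedBlockingQueue<>());
        }
    }

    // Assumed policy: successive operations rotate through the list.
    private int nextIndex() {
        return Math.floorMod(next.getAndIncrement(), subChannels.size());
    }

    // Stores the event in the sub-channel chosen for this operation.
    public void put(String event) {
        subChannels.get(nextIndex()).add(event);
    }

    // Scans every sub-channel once, starting at the rotating index;
    // returns null when all sub-channels are empty.
    public String take() {
        int start = nextIndex();
        for (int k = 0; k < subChannels.size(); k++) {
            String event = subChannels.get((start + k) % subChannels.size()).poll();
            if (event != null) {
                return event;
            }
        }
        return null;
    }

    // Number of events currently held by one sub-channel.
    public int size(int index) {
        return subChannels.get(index).size();
    }
}
```

Spreading writes across several sub-channels means concurrent Source threads do not all contend for the same file channel, which is the apparent motivation for the design.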
5. The method of claim 1, wherein creating, for the sink, a plurality of corresponding channel consumer ChannelConsumer instances comprises:
defining and implementing a multithreaded sink;
and, in a configuration file corresponding to the sink, setting the name of the sink to a preset name, setting the number of ChannelConsumers of the sink to a preset number, and setting the type of the sink to MultithreadingKafkaSink.
6. The method of claim 5, wherein defining and implementing a multithreaded sink comprises:
setting the sink as a multithreaded data pool of Kafka type;
and initializing a preset number of KafkaSink instances, storing the instances in a thread pool, and acquiring the KafkaSink instances from the thread pool by using multithreading, so as to implement a multithreaded KafkaSink.
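The multithreaded sink of claims 5 and 6 can be approximated with a fixed-size thread pool in which each worker plays the role of one consumer. This is a sketch under stated assumptions rather than the patented implementation: the class and method names are hypothetical, the queue stands in for the channel, and appending to a list stands in for sending a record to Kafka.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MultiConsumerSinkSketch {
    // Drains the channel with consumerCount worker threads and returns
    // everything that was "delivered".
    static List<String> drain(BlockingQueue<String> channel, int consumerCount) {
        List<String> delivered = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
        for (int i = 0; i < consumerCount; i++) {
            pool.submit(() -> {
                String event;
                // poll() returns null once the queue is empty, ending this consumer.
                while ((event = channel.poll()) != null) {
                    delivered.add(event); // stand-in for a Kafka producer send
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        for (int i = 0; i < 100; i++) {
            channel.add("event-" + i);
        }
        System.out.println("delivered: " + drain(channel, 4).size());
    }
}
```

Because the workers pull from a shared thread-safe queue, each event is delivered exactly once, although the delivery order across workers is not defined.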
7. A data configuration system, applicable to a distributed Flume system comprising at least three modules: a data source Source, a channel Channel, and a data pool sink, wherein the Source is used for receiving data and the Channel is used for transmitting the data received by the Source to the sink for consumption, the data configuration system comprising:
a first configuration module, configured to configure the Source in a configuration file of Flume and to set the thread number workthreads of the Source to a value greater than one;
a second configuration module, configured to set the type of the channel to a multithreading file channel MultithreadingFileChannel, wherein the MultithreadingFileChannel comprises a plurality of file channels FileChannel;
and a third configuration module, configured to create, for the sink, a plurality of corresponding channel consumer ChannelConsumer instances, so that the sink is multithreaded.
8. The system of claim 7, wherein the first configuration module comprises:
a first configuration unit, configured to configure the type of the Source as a Scribe source in the configuration file of Flume, wherein the Scribe source is used for receiving data from Scribe;
and a second configuration unit, configured to configure the port of the Source as a target port in the configuration file of Flume.
9. The system of claim 7, wherein the second configuration module comprises:
a first defining unit, configured to define a MultithreadingFileChannel based on a custom Channel mechanism;
a second defining unit, configured to set the type of the channel to the defined MultithreadingFileChannel;
a first setting unit, configured to set the name of the channel to a preset name;
and a second setting unit, configured to set the number of the channels to a preset number.
10. The system of claim 7, wherein the third configuration module comprises:
a third defining unit, configured to define and implement a multithreaded sink;
and a third setting unit, configured to set, in a configuration file corresponding to the sink, the name of the sink to a preset name, the number of ChannelConsumers of the sink to a preset number, and the type of the sink to MultithreadingKafkaSink.
CN201910816319.4A 2019-08-30 2019-08-30 Data configuration method and system Pending CN110647407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910816319.4A CN110647407A (en) 2019-08-30 2019-08-30 Data configuration method and system

Publications (1)

Publication Number Publication Date
CN110647407A true CN110647407A (en) 2020-01-03

Family

ID=69010023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910816319.4A Pending CN110647407A (en) 2019-08-30 2019-08-30 Data configuration method and system

Country Status (1)

Country Link
CN (1) CN110647407A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321308A1 (en) * 2015-05-01 2016-11-03 Ebay Inc. Constructing a data adaptor in an enterprise server data ingestion environment
CN106648722A (en) * 2016-05-10 2017-05-10 深圳前海信息技术有限公司 Flume receiving side data processing method and device based on big data
CN106777046A (en) * 2016-12-09 2017-05-31 武汉卓尔云市集团有限公司 A kind of data analysing method based on nginx daily records
CN109542733A (en) * 2018-12-05 2019-03-29 焦点科技股份有限公司 A kind of highly reliable real-time logs collection and visual modeling technique method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEMIGELEMIAO: "Flume FileChannel优化(扩展)指南", 《博客园CNBLOGS》 *
刘荣辉: "《大数据架构技术与实例分析》", 31 January 2018, 东北师范大学出版社 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475367A (en) * 2020-03-13 2020-07-31 苏州浪潮智能科技有限公司 Method and system for Flume multithreading test and computer storage medium
CN111475367B (en) * 2020-03-13 2023-01-06 苏州浪潮智能科技有限公司 Method and system for Flume multithreading test and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103