CN109117286A - A kind of method of data collection and adjusting - Google Patents


Publication number
CN109117286A
Authority
CN
China
Prior art keywords
multiple data
data records
formatting
queue
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810851405.4A
Other languages
Chinese (zh)
Inventor
刘聪玲
易卜拉欣·卡赛木
孙小艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Tianmu Chain Technology Co Ltd
Original Assignee
Foshan Tianmu Chain Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Tianmu Chain Technology Co Ltd
Priority to CN201810851405.4A
Publication of CN109117286A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/37: Compiler construction; Parser generation

Abstract

The invention discloses a method for data collection and adjustment, comprising: receiving multiple data records, processing them, and queuing them to await further processing; pulling the processed records from a first queue for processing; continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue; determining whether the size or the receive rate is outside an acceptable range; sending the formatted data records to a second queue for storage; continuously monitoring, during the method, at least one of the second queue size and the rate at which formatted data records are pulled from the second queue; determining whether the size or the receive rate is outside an acceptable range; according to the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, a specified number of data sink nodes during processing; and sending each formatted data record to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple application programs.

Description

A kind of method of data collection and adjusting
Technical field
The present invention relates to the technical field of big data, and in particular to a method for data collection and adjustment.
Background
The problem typically to be solved is how to manage and analyze big data, for example, data on the order of petabytes (PB). Big data is broadly defined as data sets whose size exceeds the ability of commonly used software tools to capture, manage, and process the data within a reasonable time. The world's information roughly doubles about every two years. This information (or data) contains critical insights, but mining it has become too costly and too time-consuming for many end users and application programs. Traditional data collection involved populating relational databases with a structured, narrow subset of static historical data. Big data poses an especially difficult problem for end users because it is unbounded, may be structured or unstructured, is often available in real time, and may be iterative. For current relational database management systems, handling such big data is very difficult without significant processing effort, which is time-consuming and ultimately leaves most of the data stale and of limited value.
Summary of the invention
The invention provides a method for data collection and adjustment, comprising:
receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into the same internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
according to the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
according to the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during processing; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple application programs.
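The monitoring and allocation steps above can be illustrated with a small decision function. This is only a sketch under stated assumptions: the patent text does not say what the acceptable range looks like or how many nodes are added at a time, so `AcceptableRange`, its thresholds, and the one-node-per-decision policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AcceptableRange:
    """Hypothetical acceptable range for one queue (thresholds are not given in the source)."""
    min_queue_size: int    # below this, the stage is treated as over-provisioned
    max_queue_size: int    # above this, the stage is treated as falling behind
    min_pull_rate: float   # records per second; below this, the stage is too slow


def scaling_decision(queue_size: int, pull_rate: float, rng: AcceptableRange) -> int:
    """Return +1 to allocate a node, -1 to deallocate one, 0 to leave the pool unchanged."""
    if queue_size < rng.min_queue_size:
        # A nearly empty queue means there is little work, whatever the pull rate is.
        return -1
    if queue_size > rng.max_queue_size or pull_rate < rng.min_pull_rate:
        return +1
    return 0


def adjust_pool(node_count: int, decision: int, min_nodes: int = 1, max_nodes: int = 16) -> int:
    """Apply a scaling decision while keeping the pool within assumed fixed bounds."""
    return max(min_nodes, min(max_nodes, node_count + decision))
```

The same function could be called once per monitoring interval for the first queue (intake nodes) and for the second queue (data sink nodes), each with its own range.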
The method further comprises: automatically stopping the allocation of intake nodes when the second queue size reaches a predetermined limit.
The method further comprises:
continuously monitoring the response to allocating one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and stopping the allocation if it is determined that processing throughput has not improved.
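The two stop conditions above (the second queue reaching its limit, and allocation no longer improving throughput) can be combined into a single guard. The sketch below is an assumption-laden illustration: the function name, the `min_gain` threshold of 5%, and the way throughput is sampled before and after an allocation are not taken from the source.

```python
def should_allocate(second_queue_size: int,
                    second_queue_limit: int,
                    throughput_before: float,
                    throughput_after: float,
                    min_gain: float = 0.05) -> bool:
    """Decide whether another node may be allocated.

    throughput_before / throughput_after are records per second measured before
    and after the most recent allocation (hypothetical measurement points).
    """
    if second_queue_size >= second_queue_limit:
        # Downstream is saturated: adding intake nodes would only grow the backlog.
        return False
    if throughput_before <= 0:
        # No baseline yet, so there is nothing to compare against.
        return True
    gain = (throughput_after - throughput_before) / throughput_before
    return gain >= min_gain
```

A relative gain threshold is used here so that measurement noise is not mistaken for an improvement; an absolute threshold would work equally well for the purpose of the illustration.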
In the method, the first and second queues are Java Message Service (JMS) queues, and the internal format is a JMS format.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, processing each of the data records from its respective native format into the same internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and translating each of the data records in near real time into the same internal format by at least one translator using its parsed component parts.
In the method, the same internal format comprises multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources comprise at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected from the group comprising a comma-separated value (CSV) parser, an email parser, an Exchangeable Image File Format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser, according to the one or more native formats of the data records.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing a data element of each formatted data record against secondary data in a data enrichment table; and modifying the data element according to the secondary data.
Brief description of the drawings
The invention can be further understood from the following description with reference to the accompanying drawings. The components in the figures are not necessarily drawn to scale; emphasis is instead placed on illustrating the principles of the embodiments. In the different views, the same reference numerals designate corresponding parts.
Fig. 1 is a schematic diagram of the data collection and adjustment method of the invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. Other systems, methods, and/or features of the present embodiments will become apparent to those skilled in the art upon review of the following detailed description. All such additional systems, methods, features, and advantages are intended to be included in this specification, to fall within the scope of the invention, and to be protected by the appended claims. Other features of the disclosed embodiments are described in, and will be apparent from, the following detailed description.
Embodiment one:
Fig. 1 is a schematic diagram of the data collection and adjustment method of the invention. As shown in Fig. 1, the method comprises: receiving multiple data records, processing them, and queuing them to await further processing; pulling the processed records from a first queue for processing; continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue; determining whether the size or the receive rate is outside an acceptable range; sending the formatted data records to a second queue for storage; continuously monitoring, during the method, at least one of the second queue size and the rate at which formatted data records are pulled from the second queue; determining whether the size or the receive rate is outside an acceptable range; according to the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, a specified number of data sink nodes during processing; and sending each formatted data record to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple application programs. In detail, the method comprises:
receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into the same internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
according to the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
according to the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during processing; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple application programs.
The method further comprises: automatically stopping the allocation of intake nodes when the second queue size reaches a predetermined limit.
The method further comprises:
continuously monitoring the response to allocating one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and stopping the allocation if it is determined that processing throughput has not improved.
In the method, the first and second queues are Java Message Service (JMS) queues, and the internal format is a JMS format.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, processing each of the data records from its respective native format into the same internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the same internal format by at least one translator using its parsed component parts.
In the method, the same internal format comprises multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources comprise at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected from the group comprising a comma-separated value (CSV) parser, an email parser, an Exchangeable Image File Format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser, according to the one or more native formats of the data records.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing a data element of each formatted data record against secondary data in a data enrichment table; and
modifying the data element according to the secondary data.
Embodiment two:
A method for data collection and adjustment, comprising:
receiving, at a processing engine, multiple data records from multiple data sources at varying receive rates;
processing each of the data records from its respective native format into the same internal format, where the processing comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the same internal format by at least one translator using its parsed component parts;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range; and
according to the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method, in accordance with a receive rate approximately equal to the peak of the varying receive rates.
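The last step above ties intake node allocation to the peak of the varying receive rates. One possible reading is sketched below: track recent receive rates in a sliding window and size the intake pool for the highest one. The window length, the per-node capacity figure, and the names `PeakRateTracker` and `nodes_for_peak` are assumptions, not details from the source.

```python
import math
from collections import deque


class PeakRateTracker:
    """Track the highest receive rate seen over a sliding window of samples."""

    def __init__(self, window: int = 60) -> None:
        self._samples = deque(maxlen=window)   # oldest samples fall out automatically

    def observe(self, rate: float) -> None:
        self._samples.append(rate)

    def peak(self) -> float:
        return max(self._samples, default=0.0)


def nodes_for_peak(peak_rate: float, per_node_rate: float, min_nodes: int = 1) -> int:
    """Number of intake nodes needed to keep up with the peak receive rate.

    per_node_rate is an assumed, measured capacity of one intake node in records per second.
    """
    return max(min_nodes, math.ceil(peak_rate / per_node_rate))
```

Using a bounded window means the pool can shrink again once an old burst ages out of the samples, which matches the text's mention of both allocating and deallocating nodes.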
In the method, processing each of the data records from its respective native format into the same internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the same internal format by at least one translator using its parsed component parts.
In the method, the same internal format comprises multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources comprise at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected from the group comprising a comma-separated value (CSV) parser, an email parser, an Exchangeable Image File Format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser, according to the one or more native formats of the data records.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing a data element of each formatted data record against secondary data in a data enrichment table; and modifying the data element according to the secondary data.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. That is, the methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various methods or components as appropriate. For example, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Features described with respect to certain configurations may also be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. In addition, technology evolves, so many of the elements are examples and do not limit the scope of the disclosure or the claims.
Specific details are given in the description to provide a thorough understanding of exemplary configurations, including implementations. However, configurations may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail to avoid obscuring the configurations. This description provides example configurations only and does not limit the scope, applicability, or configuration of the claims. Rather, the preceding description of the configurations provides those skilled in the art with an enabling description for implementing the described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
In addition, although operations may be described as a sequential process, many of the operations can be performed in parallel or concurrently. The order of the operations may also be rearranged. A method may have additional steps. Furthermore, examples of the methods may be implemented by hardware, software, firmware, middleware, code, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or code, the program code or code segments that perform the necessary tasks may be stored in a non-transitory computer-readable medium such as a storage medium, and the described tasks may be performed by a processor.
In summary, the foregoing detailed description is intended to be regarded as illustrative rather than restrictive, and it should be understood that the claims (including all equivalents) are intended to define the spirit and scope of the invention. The above embodiments should be understood as merely illustrating the invention and not as limiting its scope of protection. After reading the contents of the invention, a person skilled in the art can make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the invention.

Claims (10)

1. A method for data collection and adjustment, characterized by comprising: receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into the same internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
according to the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
according to the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during processing; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple application programs.
2. The method of claim 1, characterized by further comprising: automatically stopping the allocation of intake nodes when the second queue size reaches a predetermined limit.
3. The method of claim 1, characterized by further comprising:
continuously monitoring the response to allocating one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and stopping the allocation if it is determined that processing throughput has not improved.
4. The method of claim 1, characterized in that the first and second queues are Java Message Service (JMS) queues, and the internal format is a JMS format.
5. The method of claim 1, characterized by further comprising:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
6. The method of claim 1, characterized in that processing each of the data records from its respective native format into the same internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the same internal format by at least one translator using its parsed component parts.
7. The method of claim 1, characterized in that the same internal format comprises multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
8. The method of claim 1, characterized in that the multiple data sources comprise at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
9. The method of claim 6, characterized in that the at least one parser is selected from the group comprising a comma-separated value (CSV) parser, an email parser, an Exchangeable Image File Format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser, according to the one or more native formats of the data records.
10. The method of claim 5, characterized in that the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule; the algorithmic enrichment rule is the addition of a geographic location; and the dimensional enrichment comprises:
comparing a data element of each formatted data record against secondary data in a data enrichment table; and modifying the data element according to the secondary data.
CN201810851405.4A 2018-07-30 2018-07-30 A kind of method of data collection and adjusting Pending CN109117286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810851405.4A CN109117286A (en) 2018-07-30 2018-07-30 A kind of method of data collection and adjusting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810851405.4A CN109117286A (en) 2018-07-30 2018-07-30 A kind of method of data collection and adjusting

Publications (1)

Publication Number Publication Date
CN109117286A true CN109117286A (en) 2019-01-01

Family

ID=64863552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810851405.4A Pending CN109117286A (en) 2018-07-30 2018-07-30 A kind of method of data collection and adjusting

Country Status (1)

Country Link
CN (1) CN109117286A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824691A (en) * 2015-01-08 2016-08-03 平安科技(深圳)有限公司 Method and device for dynamically regulating threads
CN105978968A (en) * 2016-05-11 2016-09-28 山东合天智汇信息技术有限公司 Real-time transmission processing method, server and system of mass data
CN107818120A (en) * 2016-09-14 2018-03-20 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN108134814A (en) * 2017-11-27 2018-06-08 海尔优家智能科技(北京)有限公司 A kind of business data processing method and device


Similar Documents

Publication Publication Date Title
CN110908997B (en) Data blood relationship construction method and device, server and readable storage medium
CN110008045B (en) Method, device and equipment for aggregating microservices and storage medium
US8521871B2 (en) System and method for merging monitoring data streams from a server and a client of the server
US8392465B2 (en) Dependency graphs for multiple domains
US20090187534A1 (en) Transaction prediction modeling method
US8412721B2 (en) Efficient data extraction by a remote application
CN107506383B (en) Audit data processing method and computer equipment
WO2019085307A1 (en) Data sampling method, terminal, and device, and computer readable storage medium
CN111008020B (en) Method for analyzing logic expression into general query statement
CN108073625A (en) For the system and method for metadata information management
CN110147470B (en) Cross-machine-room data comparison system and method
CN102915344B (en) SQL (structured query language) statement processing method and device
CN112948492A (en) Data processing system, method and device, electronic equipment and storage medium
CN111382182A (en) Data processing method and device, electronic equipment and storage medium
CN109033312A (en) Method and apparatus for obtaining information
CN108182204A (en) The processing method and processing device of data query based on house prosperity transaction multi-dimensional data
CN105138676A (en) Sub-library and sub-table merge query method based on high-level language concurrent aggregation calculation
CN116842090A (en) Accounting system, method, equipment and storage medium
CN110874366A (en) Data processing and query method and device
US8229946B1 (en) Business rules application parallel processing system
CN109117286A (en) A kind of method of data collection and adjusting
US9092338B1 (en) Multi-level caching event lookup
CN109815118A (en) Data base management method and device, electronic equipment and computer readable storage medium
CN114116908A (en) Data management method and device and electronic equipment
CN109063201B (en) Impala online interactive query method based on mixed storage scheme

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190101
