CN109117286A - A kind of method of data collection and adjusting - Google Patents
- Publication number
- CN109117286A (application number CN201810851405.4A)
- Authority
- CN
- China
- Prior art keywords
- multiple data
- data records
- formatting
- queue
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/37—Compiler construction; Parser generation
Abstract
The invention discloses a method for data collection and adjustment, comprising: receiving multiple data records, processing them, and queuing them; pulling the processed data from a first queue for processing; continuously monitoring, during the method, at least one of the queue size and the rate of the formatted data records; determining whether the size or the receive rate is outside an acceptable range; sending the formatted data records to a second queue for storage; continuously monitoring, during the method, at least one of the queue size and the rate of the formatted data records; determining whether the size or the receive rate is outside an acceptable range; based on the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, a specified number of data sink nodes during processing; and sending each formatted data record to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple applications.
Description
Technical field
The present invention relates to the technical field of big data, and in particular to a method for data collection and adjustment.
Background
The problem to be solved is generally how to manage and analyze big data, for example data on the order of petabytes (PB). Big data is broadly defined as data sets whose size exceeds the ability of commonly used software tools to capture, manage, and process the data within a reasonable time. The world's information roughly doubles about every two years. This information (or data) contains key insights, but mining it is too costly and takes too long for many end users and applications. Traditional data collection has consisted of populating relational databases with structured, narrow subsets of static data. Big data poses a particularly difficult problem for end users because it is unbounded, may be structured or unstructured, is often available in real time, and may be iterative. For current relational database management systems, such big data is extremely troublesome to handle without significant processing effort, which is time-consuming and ultimately leaves most of the data outdated and of limited value.
Summary of the invention
The invention proposes a method for data collection and adjustment, comprising:
receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into a common internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
based on the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
based on the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during the method; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple applications.
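The monitoring and allocation steps above can be illustrated with a minimal sketch. It is not taken from the patent: the names `AcceptableRange`, `scaling_decision`, and `adjust_nodes`, the numeric bounds, and the policy of scaling up when either metric is above its range are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AcceptableRange:
    """Hypothetical lower and upper bounds for one monitored quantity."""
    low: float
    high: float

    def position(self, value: float) -> int:
        """Return -1 below the range, +1 above it, and 0 inside it."""
        if value < self.low:
            return -1
        if value > self.high:
            return 1
        return 0


def scaling_decision(queue_size: float, rate: float,
                     size_range: AcceptableRange,
                     rate_range: AcceptableRange) -> int:
    """Return +1 to allocate a node, -1 to deallocate one, 0 to hold.

    Assumed policy: scale up if either metric is above its range;
    scale down only if nothing is above and at least one is below.
    """
    positions = (size_range.position(queue_size), rate_range.position(rate))
    if 1 in positions:
        return 1
    if -1 in positions:
        return -1
    return 0


def adjust_nodes(current: int, decision: int,
                 minimum: int = 1, maximum: int = 16) -> int:
    """Apply the decision while keeping the pool within assumed limits."""
    return max(minimum, min(maximum, current + decision))
```

The same decision function could serve both queues, with the first acceptable range driving the intake nodes and the second driving the data sink nodes.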
The method further comprises: when the second queue size reaches a predetermined limit, automatically stopping the allocation of intake nodes.
The method further comprises:
continuously monitoring the response to the allocation of one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and
if it is determined that processing throughput has not improved, stopping the allocation.
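A small sketch may help make these two stopping conditions concrete. The class name `AllocationGovernor`, the relative-gain threshold `min_gain`, and the way throughput is sampled before and after an allocation are assumptions for illustration, not details given in the patent.

```python
class AllocationGovernor:
    """Decides whether further node allocation is still allowed (illustrative)."""

    def __init__(self, second_queue_limit: int, min_gain: float = 0.05):
        self.second_queue_limit = second_queue_limit  # assumed predetermined limit
        self.min_gain = min_gain  # assumed minimum relative throughput gain
        self.halted = False

    def may_allocate_intake(self, second_queue_size: int) -> bool:
        """Intake allocation stops once the second queue reaches its limit."""
        return not self.halted and second_queue_size < self.second_queue_limit

    def record_allocation_result(self, throughput_before: float,
                                 throughput_after: float) -> None:
        """Halt further allocation if the last added node did not help."""
        gain = throughput_after - throughput_before
        if gain <= self.min_gain * throughput_before:
            self.halted = True
```

In this sketch the halt is permanent; a real controller would presumably reset it after load conditions change, which the source does not specify.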
In the method, the first and second queues are Java Message Service (JMS) queues, and the internal format is the JMS format.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule, to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, processing each of the data records from its respective native format into the common internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the common internal format by at least one translator, using the parsed component parts.
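The parse-then-translate step can be sketched as follows, using two native formats handled by Python's standard library. The field names `source`, `record_class`, and `payload`, and the functions `parse_csv`, `parse_json`, `translate`, and `process`, are hypothetical; the patent does not define the layout of the internal format beyond stating that it is common to all records.

```python
import csv
import io
import json


def parse_csv(raw: str) -> dict:
    """Parser: split one CSV record (header line plus one row) into components."""
    return dict(next(csv.DictReader(io.StringIO(raw))))


def parse_json(raw: str) -> dict:
    """Parser: split one JSON object into components."""
    return json.loads(raw)


# Registry of parsers keyed by native format (illustrative subset).
PARSERS = {"csv": parse_csv, "json": parse_json}


def translate(components: dict, source: str, record_class: str) -> dict:
    """Translator: map parsed components into an assumed common internal format."""
    return {
        "source": source,              # assumed field shared by every record
        "record_class": record_class,  # assumed field shared by every record
        "payload": dict(components),   # components specific to this record
    }


def process(raw: str, native_format: str, source: str, record_class: str) -> dict:
    """Parse a raw record in its native format, then translate it."""
    components = PARSERS[native_format](raw)
    return translate(components, source, record_class)
```

Under these assumptions, a CSV record and a JSON record that carry the same data yield the same payload, so downstream nodes never need to know the native format.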
In the method, the common internal format includes multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources include at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected, according to the one or more native formats of the data records, from the group comprising a comma-separated values parser, an email parser, an exchangeable image file format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing the data element of each formatted data record with auxiliary data in a data enrichment table; and
modifying the data element according to the auxiliary data.
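The two kinds of enrichment rule described above can be sketched as plain functions. Everything concrete here is an assumption for illustration: the lookup tables `GEO_BY_PREFIX` and `DEPARTMENT_TABLE`, the element names `ip` and `user`, and the choice to attach the result as a new field rather than overwrite the element.

```python
from typing import Callable, List, Optional

# Hypothetical lookup data; a real deployment would use proper data sets.
GEO_BY_PREFIX = {"10.0.": "Site A", "192.168.": "Site B"}
DEPARTMENT_TABLE = {"alice": "Finance", "bob": "Engineering"}


def geo_rule(record: dict) -> Optional[dict]:
    """Algorithmic rule: derive a location from the 'ip' element, if present."""
    ip = record.get("ip")
    if ip is None:
        return None  # the rule does not apply to this record
    for prefix, location in GEO_BY_PREFIX.items():
        if ip.startswith(prefix):
            return {"geo": location}
    return None


def department_rule(record: dict) -> Optional[dict]:
    """Dimensional rule: compare the 'user' element with an enrichment table."""
    department = DEPARTMENT_TABLE.get(record.get("user"))
    return None if department is None else {"department": department}


def enrich(record: dict,
           rules: List[Callable[[dict], Optional[dict]]]) -> dict:
    """Apply every applicable rule and return an enriched copy of the record."""
    enriched = dict(record)
    for rule in rules:
        extra = rule(record)
        if extra:
            enriched.update(extra)
    return enriched
```

Each rule returns `None` when it does not apply, which mirrors the "determine whether the rule applies, then enrich" order of the steps.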
Brief description of the drawings
The invention will be further understood from the following description with reference to the accompanying drawings. Components in the figures are not necessarily drawn to scale; emphasis is instead placed on illustrating the principles of the embodiments. Across the different views, like reference numerals designate corresponding parts.
Fig. 1 is a schematic diagram of the data collection and adjustment method of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. Other systems, methods, and/or features of these embodiments will become apparent to those skilled in the art upon reviewing the following detailed description. All such additional systems, methods, features, and advantages are intended to be included within this description, to fall within the scope of the invention, and to be protected by the appended claims. Other features of the disclosed embodiments are described in, and will be apparent from, the following detailed description.
Embodiment one:
Fig. 1 shows a schematic diagram of the data collection and adjustment method of the invention, which comprises: receiving multiple data records, processing them, and queuing them; pulling the processed data from a first queue for processing; continuously monitoring, during the method, at least one of the queue size and the rate of the formatted data records; determining whether the size or the receive rate is outside an acceptable range; sending the formatted data records to a second queue for storage; continuously monitoring, during the method, at least one of the queue size and the rate of the formatted data records; determining whether the size or the receive rate is outside an acceptable range; based on the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, a specified number of data sink nodes during processing; and sending each formatted data record to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple applications.
In detail, the method comprises:
receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into a common internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
based on the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
based on the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during the method; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple applications.
The method further comprises: when the second queue size reaches a predetermined limit, automatically stopping the allocation of intake nodes.
The method further comprises:
continuously monitoring the response to the allocation of one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and
if it is determined that processing throughput has not improved, stopping the allocation.
In the method, the first and second queues are Java Message Service (JMS) queues, and the internal format is the JMS format.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule, to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, processing each of the data records from its respective native format into the common internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the common internal format by at least one translator, using the parsed component parts.
In the method, the common internal format includes multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources include at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected, according to the one or more native formats of the data records, from the group comprising a comma-separated values parser, an email parser, an exchangeable image file format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing the data element of each formatted data record with auxiliary data in a data enrichment table; and
modifying the data element according to the auxiliary data.
Embodiment two:
A data collection and adjustment method, comprising:
receiving, at a processing engine, multiple data records from multiple data sources at varying receive rates;
processing each of the data records from its respective native format into a common internal format, where the processing includes:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the common internal format by at least one translator, using the parsed component parts;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range; and
based on the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method, so as to approximately match the peak of the varying receive rates.
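One way to read the last step is that the intake pool is sized so that its capacity roughly matches the peak of the recently observed receive rates. The sketch below follows that reading; the class `PeakRateSizer`, the sliding window, and the per-node capacity `per_node_rate` are assumptions, since the source does not say how the peak is measured or how capacity is estimated.

```python
import math
from collections import deque


class PeakRateSizer:
    """Tracks recent receive rates and sizes the intake pool for their peak."""

    def __init__(self, per_node_rate: float, window: int = 60):
        self.per_node_rate = per_node_rate   # assumed records/s one node can pull
        self.samples = deque(maxlen=window)  # sliding window of rate samples

    def observe(self, receive_rate: float) -> None:
        """Record one receive-rate sample; the oldest sample is dropped when full."""
        self.samples.append(receive_rate)

    def target_nodes(self) -> int:
        """Nodes needed so pool capacity covers the observed peak rate."""
        if not self.samples:
            return 1
        return max(1, math.ceil(max(self.samples) / self.per_node_rate))
```

Comparing `target_nodes()` with the current pool size then indicates whether to allocate or deallocate intake nodes; because the window slides, the target falls again once a burst has passed.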
In the method, processing each of the data records from its respective native format into the common internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the common internal format by at least one translator, using the parsed component parts.
In the method, the common internal format includes multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
In the method, the multiple data sources include at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
In the method, the at least one parser is selected, according to the one or more native formats of the data records, from the group comprising a comma-separated values parser, an email parser, an exchangeable image file format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser.
The method further comprises:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule, to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
In the method, the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule.
In the method, the algorithmic enrichment rule is the addition of a geographic location.
In the method, the dimensional enrichment comprises:
comparing the data element of each formatted data record with auxiliary data in a data enrichment table; and
modifying the data element according to the auxiliary data.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. That is, the methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add methods or components as appropriate. For example, in alternative configurations the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Features described with respect to certain configurations may also be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. In addition, as technology evolves, many of the elements are examples only and do not limit the scope of the disclosure or the claims.
Specific details are given in the description to provide a thorough understanding of example configurations, including implementations. However, configurations may be practiced without these specific details. For example, well-known circuits, methods, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only and does not limit the scope, applicability, or configuration of the claims. Rather, the preceding description of the configurations provides those skilled in the art with an enabling description for implementing the described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
In addition, although operations may be described as a sequential process, many of the operations can be performed in parallel or concurrently, and the order of the operations may be rearranged. A method may have additional steps. Furthermore, examples of the methods may be implemented by hardware, software, firmware, middleware, code, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or code, the program code or code segments that perform the necessary tasks may be stored in a non-transitory computer-readable medium such as a storage medium, and the described tasks may be performed by a processor.
In summary, the foregoing detailed description is intended to be regarded as illustrative rather than restrictive, and it is to be understood that the following claims, including all equivalents, are intended to define the spirit and scope of the invention. The above embodiments should be understood as merely illustrating the invention and not limiting its scope of protection. After reading this disclosure, a person skilled in the art may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the invention.
Claims (10)
1. A method for data collection and adjustment, characterized by comprising:
receiving, at a processing engine, multiple data records from multiple data sources;
processing each of the data records from its respective native format into a common internal format;
storing the received and formatted data records in a first queue to await processing;
pulling the formatted data records from the first queue for processing by a specified number of intake nodes;
continuously monitoring, during the method, at least one of the first queue size and the rate at which formatted data records are pulled from the first queue;
determining that one or both of the first queue size and the receive rate are outside a first acceptable range;
based on the determination, automatically allocating intake nodes to, or deallocating intake nodes from, the specified number of intake nodes during the method;
sending the formatted data records in near real time from the specified number of intake nodes to a second queue;
storing the received formatted data records in the second queue;
pulling the received formatted data records from the second queue for storage by a specified number of data sink nodes;
continuously monitoring, during the method, at least one of the second queue size and the rate at which the received formatted data records are pulled from the second queue;
determining that one or both of the second queue size and the receive rate are outside a second acceptable range;
based on the determination, automatically allocating data sink nodes to, or deallocating data sink nodes from, the specified number of data sink nodes during the method; and
sending each formatted data record in near real time to at least one of multiple data sinks for storage therein, where the formatted data records are available for use by multiple applications.
2. The method of claim 1, characterized by further comprising: when the second queue size reaches a predetermined limit, automatically stopping the allocation of intake nodes.
3. The method of claim 1, characterized by further comprising:
continuously monitoring the response to the allocation of one of the intake nodes and data sink nodes to determine whether processing throughput has improved; and
if it is determined that processing throughput has not improved, stopping the allocation.
4. The method of claim 1, characterized in that the first and second queues are Java Message Service (JMS) queues, and the internal format is the JMS format.
5. The method of claim 1, characterized by further comprising:
comparing, by the processing engine in near real time, each formatted data record from the intake nodes against at least a first enrichment rule, to determine whether the at least first enrichment rule applies to at least one data element of one or more of the formatted data records; and
if applicable, enriching, by the processing engine in near real time, the at least one data element of the one or more formatted data records with additional data according to the at least first enrichment rule, to form one or more enriched formatted data records.
6. The method of claim 1, characterized in that processing each of the data records from its respective native format into the common internal format further comprises:
parsing each of the data records in near real time into multiple component parts by at least one parser; and
translating each of the data records in near real time into the common internal format by at least one translator, using the parsed component parts.
7. The method of claim 1, characterized in that the common internal format includes multiple fields, where at least a first of the fields is common to all data records from the multiple data sources, and at least a second of the fields is unique to a single class of data records.
8. The method of claim 1, characterized in that the multiple data sources include at least two sources selected from the group consisting of relational databases, websites, RSS feeds, SIEM files, and email archives.
9. The method of claim 6, characterized in that the at least one parser is selected, according to the one or more native formats of the data records, from the group comprising a comma-separated values parser, an email parser, an exchangeable image file format (EXIF) parser, a JavaScript Object Notation (JSON) parser, a Libcap parser, and an XML parser.
10. The method of claim 5, characterized in that the at least first enrichment rule is selected from an algorithmic enrichment rule and a dimensional enrichment rule; the algorithmic enrichment rule is the addition of a geographic location; and the dimensional enrichment comprises:
comparing the data element of each formatted data record with auxiliary data in a data enrichment table; and
modifying the data element according to the auxiliary data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810851405.4A CN109117286A (en) | 2018-07-30 | 2018-07-30 | A kind of method of data collection and adjusting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810851405.4A CN109117286A (en) | 2018-07-30 | 2018-07-30 | A kind of method of data collection and adjusting |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109117286A true CN109117286A (en) | 2019-01-01 |
Family
ID=64863552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810851405.4A Pending CN109117286A (en) | 2018-07-30 | 2018-07-30 | A kind of method of data collection and adjusting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109117286A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105824691A (en) * | 2015-01-08 | 2016-08-03 | 平安科技(深圳)有限公司 | Method and device for dynamically regulating threads |
CN105978968A (en) * | 2016-05-11 | 2016-09-28 | 山东合天智汇信息技术有限公司 | Real-time transmission processing method, server and system of mass data |
CN107818120A (en) * | 2016-09-14 | 2018-03-20 | 博雅网络游戏开发(深圳)有限公司 | Data processing method and device based on big data |
CN108134814A (en) * | 2017-11-27 | 2018-06-08 | 海尔优家智能科技(北京)有限公司 | A kind of business data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190101 |