CN108694187A - The storage method and device of real-time streaming data - Google Patents
- Publication number
- CN108694187A (application CN201710224721.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- time
- real
- streaming data
- time streaming
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and device for storing real-time streaming data, and relates to the technical field of information processing. Its main purpose is to solve the problem that storing each record transmitted by a real-time system separately produces a large number of parquet files, so that the query system has to access all of those parquet files when it performs a query, which degrades query performance. The technical solution includes: receiving real-time streaming data; parsing the real-time streaming data to obtain a parsing result; determining, according to the parsing result, the number of records in the real-time streaming data; judging whether the number of records reaches a preset record count; and, if so, writing the parsing result of the real-time streaming data into a distributed data query engine. The invention is mainly used for storing real-time streaming data.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a method and device for storing real-time streaming data.
Background
As information processing moves toward big data processing, Impala, a new distributed data query engine suited to big data queries, has come into view. Impala supports structure definitions in Structured Query Language (SQL), and once a data stream obtained in real time has been parsed, the real-time data can be stored in Impala.
At present, each time a real-time system transmits one record in its data stream, that record is stored into Impala and a parquet file is generated for it. In other words, every record that is transmitted and stored produces one parquet file. The large number of parquet files produced record by record forces the query system to access all of them when it performs a query, which degrades query performance and reduces query efficiency.
Summary of the invention
In view of the above problem, the present invention provides a method and device for storing real-time streaming data. Its main purpose is to solve the problem that storing each record transmitted by a real-time system separately produces a large number of parquet files, so that the query system has to access all of those parquet files when it performs a query, which degrades query performance.
By means of the above technical solution, the present invention provides a method for storing real-time streaming data, including:
receiving real-time streaming data;
parsing the real-time streaming data to obtain a parsing result;
determining, according to the parsing result, the number of records in the real-time streaming data;
judging whether the number of records in the real-time streaming data reaches a preset record count;
if so, writing the parsing result of the real-time streaming data into a distributed data query engine.
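The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the class name `CountBatcher`, the `write_batch` callback (a stand-in for the write to the distributed data query engine), and UTF-8 decoding as the "parsing" step are all assumptions.

```python
from typing import Callable, List


class CountBatcher:
    """Buffers parsed records and writes them once a preset count is reached."""

    def __init__(self, preset_count: int, write_batch: Callable[[List[str]], None]):
        self.preset_count = preset_count
        self.write_batch = write_batch  # hypothetical stand-in for the engine write
        self.parsed: List[str] = []

    def receive(self, raw: bytes) -> bool:
        """Handle one received record; return True if this call triggered a write."""
        self.parsed.append(raw.decode("utf-8"))      # parse (assumed: UTF-8 text)
        if len(self.parsed) >= self.preset_count:    # determine the count and compare
            self.write_batch(self.parsed)            # one write for the whole batch
            self.parsed = []
            return True
        return False


# Usage: with a preset count of 3, seven records trigger two writes.
batches: List[List[str]] = []
batcher = CountBatcher(preset_count=3, write_batch=batches.append)
flags = [batcher.receive(f"record-{i}".encode()) for i in range(7)]
```

In this sketch the seventh record stays buffered, which is the situation the time-interval condition described next is meant to handle.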
Further, after it is judged that the number of records in the real-time streaming data has not reached the preset record count, the method further includes:
judging whether the time elapsed from first receiving the real-time streaming data to the current time reaches a preset time interval, or judging whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval;
if so, writing the parsing result of the real-time streaming data into the distributed data query engine.
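The combined condition (record count first, elapsed time as a fallback) can be written as a single predicate. The following is a minimal sketch; the function name `should_write` and its parameters are hypothetical, and `reference_time` stands for whichever of the two alternatives is chosen: the time of first receipt or the time of the last write.

```python
def should_write(count: int, preset_count: int,
                 reference_time: float, now: float,
                 preset_interval: float) -> bool:
    """Return True when a write to the query engine should be triggered.

    reference_time is either the time the data was first received or the
    time of the last write, depending on which alternative is used.
    """
    if count >= preset_count:
        return True                                   # count condition met
    return (now - reference_time) >= preset_interval  # time-interval fallback


# Usage with times in seconds and a 60-second interval.
count_met = should_write(50, 50, reference_time=0.0, now=1.0, preset_interval=60.0)
interval_met = should_write(10, 50, reference_time=0.0, now=100.0, preset_interval=60.0)
neither_met = should_write(10, 50, reference_time=0.0, now=30.0, preset_interval=60.0)
```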
Further, the method further includes:
using a first thread to execute the step of judging whether the number of records in the real-time streaming data reaches the preset record count, and the step of judging whether the time elapsed from first receiving the real-time streaming data to the current time reaches the preset time interval or judging whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval;
using a second thread independent of the first thread to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Further, before judging whether the number of records in the real-time streaming data reaches the preset record count, the method further includes:
configuring the preset record count in a third thread independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread independent of the first thread, the second thread, and the third thread.
Further, after parsing the real-time streaming data to obtain the parsing result, the method further includes:
storing the parsing result in a preset cache;
and determining, according to the parsing result, the number of records in the real-time streaming data includes:
determining the number of records in the real-time streaming data according to the parsing result stored in the preset cache.
By means of the above technical solution, the present invention provides a device for storing real-time streaming data, including:
a receiving unit, configured to receive real-time streaming data;
a parsing unit, configured to parse the real-time streaming data to obtain a parsing result;
a determination unit, configured to determine, according to the parsing result, the number of records in the real-time streaming data;
a first judging unit, configured to judge whether the number of records in the real-time streaming data reaches a preset record count;
a writing unit, configured to write the parsing result of the real-time streaming data into a distributed data query engine if the preset record count is reached.
Further, the device further includes a second judgment unit.
The second judgment unit is configured to judge whether the time elapsed from first receiving the real-time streaming data to the current time reaches a preset time interval, or to judge whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval;
the writing unit is further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
Further, in the device:
the first judging unit is specifically configured to use a first thread to execute the step of judging whether the number of records in the real-time streaming data reaches the preset record count;
the second judgment unit is specifically configured to use the first thread to execute the step of judging whether the time elapsed from first receiving the real-time streaming data to the current time reaches the preset time interval, or judging whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval;
the writing unit is specifically configured to use a second thread independent of the first thread to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Further, the device further includes:
a configuration unit, configured to configure the preset record count in a third thread independent of the first thread and the second thread, and to configure the preset time interval in a fourth thread independent of the first thread, the second thread, and the third thread.
Further, the device further includes a storage unit.
The storage unit is configured to store the parsing result in a preset cache;
the determination unit is specifically configured to determine the number of records in the real-time streaming data according to the parsing result stored in the preset cache.
By means of the above technical solution, the technical solution provided by the embodiments of the present invention has at least the following advantages:
In the method and device for storing real-time streaming data provided by the embodiments of the present invention, real-time streaming data is first received; the real-time streaming data is parsed to obtain a parsing result; the number of records is determined according to the parsing result; it is judged whether the number of records reaches the preset record count; and, if so, the parsing result is written into the distributed data query engine. In the existing approach, each record transmitted by the real-time system is stored directly in Impala, which generates a large number of parquet files. By contrast, the embodiments of the present invention parse the received real-time streaming data, judge whether the number of parsed records reaches the preset record count, and, if it does, write all of the received parsing results into the distributed data query engine together, so that multiple records are stored in the distributed data query engine as a single parquet file. This reduces the number of parquet files and speeds up access to parquet files when the distributed data query engine performs a query, thereby improving query efficiency.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method for storing real-time streaming data provided by an embodiment of the invention;
Fig. 2 shows a flowchart of another method for storing real-time streaming data provided by an embodiment of the invention;
Fig. 3 shows a block diagram of a device for storing real-time streaming data provided by an embodiment of the invention;
Fig. 4 shows a block diagram of another device for storing real-time streaming data provided by an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method for storing real-time streaming data. As shown in Fig. 1, the method includes:
101. Receive real-time streaming data.
Here, the real-time streaming data is data collected by a real-time system in real time. The distributed data query engine Impala can receive the data that the real-time streaming system transmits in real time, so that the data can be queried, stored, and so on. Because a real-time system has relatively high requirements on data timeliness, the received real-time streaming data needs to be stored in the query engine promptly, so as to meet the timeliness requirement and ensure the accuracy of query results. However, storing every record directly in the query engine as it is received produces a large number of intermediate files and lowers query efficiency. A method is therefore proposed that both ensures data timeliness and improves query efficiency.
102. Parse the real-time streaming data to obtain a parsing result.
Because the format or type of the received real-time streaming data is not necessarily uniform, the data needs to be unified. For example, the real-time streaming data may be binary data, character strings, and so on. Parsing the received real-time streaming data ensures that the data has a uniform format or type after parsing. The embodiments of the present invention do not specifically limit the data format of the real-time streaming data before or after parsing.
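As one possible illustration of this unification step, the sketch below normalizes records that arrive either as bytes or as strings into plain strings. The function name `parse_record`, the UTF-8 encoding, and the whitespace stripping are assumptions made for the example; the text itself does not fix any particular format.

```python
from typing import Union


def parse_record(raw: Union[bytes, str]) -> str:
    """Normalize one incoming record to a string (assumed target format)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # assumption: binary payloads are UTF-8 text
    return raw.strip()             # assumption: surrounding whitespace is noise


# Usage: mixed input types end up in one uniform type.
parsed = [parse_record(r) for r in (b" a,1\n", "b,2", b"c,3")]
```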
103. Determine, according to the parsing result, the number of records in the real-time streaming data.
Here, a record is the basic unit in which the real-time streaming system counts data. The number of records received can be determined directly from the parsing result, so that it can subsequently be judged whether the number of records reaches the preset record count.
104. Judge whether the number of records in the real-time streaming data reaches the preset record count.
Here, the preset record count can be set by weighing the requirement on data timeliness against the requirements on data storage and query speed. Where the timeliness requirement is higher, the preset record count should be smaller; where higher storage and query efficiency is required, fewer separate stores are desired, which generally calls for a larger preset record count. In practice the value is obtained by weighing these requirements together.
105. If the preset record count is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
Here, writing into the distributed data query engine means writing the parsed real-time streaming data into the distributed data query engine Impala at a single point in time, so that one parquet file is generated in Impala.
It should be noted that Impala can store data as it is transmitted in real time, record by record, and a corresponding parquet file is generated for each write under the Impala partition. Impala thus stores data and generates parquet files per write time point. Therefore, when the parsing result of the real-time streaming data is written into Impala, only one parquet file is produced, which preserves data timeliness without affecting how the data is stored in the query system.
In the embodiment of the present invention, if the preset record count has not been reached after step 104, the method returns to step 101, and the write operation is executed once the number of records reaches the preset record count.
In the existing approach, each record transmitted by the real-time system is stored directly in Impala, which generates a large number of parquet files. By contrast, the method for storing real-time streaming data provided by the embodiment of the present invention parses the received real-time streaming data, judges whether the number of parsed records reaches the preset record count, and, if it does, writes all of the received parsing results into the distributed data query engine together, so that multiple records are stored in the distributed data query engine as a single parquet file. This reduces the number of parquet files and speeds up access to parquet files when the distributed data query engine performs a query, thereby improving query efficiency.
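The reduction in file count can be made concrete with a little arithmetic. The sketch below assumes, as the text states, that each write produces one parquet file, and further assumes that any final partial batch is also written once (for instance by a later flush). The function name `files_written` and the figures used are illustrative only.

```python
def files_written(total_records: int, batch_size: int) -> int:
    """Number of writes (one parquet file each) needed for total_records."""
    # Ceiling division in integer arithmetic; a final partial batch counts as one write.
    return (total_records + batch_size - 1) // batch_size


# Usage: 100,000 records written one at a time versus in batches of 50,000.
per_record = files_written(100_000, 1)
batched = files_written(100_000, 50_000)
```

Under these assumptions the per-record approach produces 100,000 files, while batching with a preset record count of 50,000 produces 2.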
An embodiment of the present invention also provides another method for storing real-time streaming data. As shown in Fig. 2, the method includes:
201. Receive real-time streaming data.
This step is the same as step 101 shown in Fig. 1 and is not described again here.
202. Parse the real-time streaming data to obtain a parsing result.
This step is the same as step 102 shown in Fig. 1 and is not described again here.
203. Store the parsing result in a preset cache.
Here, the preset cache is a cache space for storing all parsed real-time streaming data. It may be a local cache, a cloud cache, or the like, which the embodiments of the present invention do not specifically limit. Storing the parsing result in the preset cache means that the parsed data is buffered before its record count is judged, which avoids data loss and allows the record count to be determined directly from the preset cache.
It should be noted that the size of the preset cache can be set according to the capacity for receiving real-time streaming data or the density of the data received, which the embodiments of the present invention do not specifically limit. After the real-time streaming data in the preset cache has been written into the distributed data query engine, the data in the preset cache is deleted, so that parsing results can be stored in the preset cache the next time real-time streaming data is received.
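The store, count, and clear-after-write behavior of the preset cache can be sketched as a small in-memory class. This is an assumption-laden illustration: the class name `PresetCache` and its methods are hypothetical, and a local Python list stands in for whatever local or cloud cache is actually used.

```python
from typing import List


class PresetCache:
    """In-memory stand-in for the preset cache of parsed records."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def store(self, parsed: str) -> None:
        self._items.append(parsed)

    def count(self) -> int:
        """Record count, determined directly from the cache."""
        return len(self._items)

    def drain(self) -> List[str]:
        """Hand over everything for writing and empty the cache."""
        items, self._items = self._items, []
        return items


# Usage: after a write, the cache is empty and ready for the next batch.
cache = PresetCache()
for rec in ("a", "b", "c"):
    cache.store(rec)
count_before = cache.count()
to_write = cache.drain()
count_after = cache.count()
```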
204. Determine the number of records in the real-time streaming data according to the parsing result stored in the preset cache.
Determining the record count from the parsing result stored in the preset cache makes it possible to judge from that count whether a write to the distributed data query engine is needed, and avoids the received real-time streaming data being lost before the record count is determined.
205. Judge whether the number of records in the real-time streaming data reaches the preset record count.
This step is the same as step 104 shown in Fig. 1 and is not described again here.
In the embodiment of the present invention, step 205 may specifically be: using a first thread to execute the step of judging whether the number of records in the real-time streaming data reaches the preset record count.
Here, the first thread is a thread that the system configures separately from other running or suspended threads. It can be responsible for executing the receiving, parsing, storing, determining, and judging steps involved in the embodiments of the present invention, and in particular the step of judging whether the number of records in the real-time streaming data reaches the preset record count. Because the judging step runs on a relatively independent first thread, whichever step the first thread has reached does not affect the operation of other programs in the system, and the purpose of storing the real-time streaming data can still be achieved.
206a. If the preset record count is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
This step is the same as step 105 shown in Fig. 1 and is not described again here.
In the embodiment shown in Fig. 1 and in this embodiment, the density of the real-time streaming data received differs from moment to moment. When data is sparse, nothing might be written to the Impala database for a long time, which would affect data timeliness or the accuracy of query results. To prevent this, a time-interval judgment step can be added for the case where the received real-time streaming data has not reached the preset record count, so that data is still written to the Impala database at the preset time interval when data transmission is sparse. As shown in Fig. 2, step 206b is parallel to step 206a: if the preset record count has not been reached, judge whether the time elapsed from first receiving the real-time streaming data to the current time reaches the preset time interval, or judge whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval.
Here, the time of first receipt is the time at which the first record of real-time streaming data is received. A negative result, either from judging whether the record count reaches the preset record count or from judging whether the preset time interval has been reached, returns to the step of receiving real-time streaming data, so several cycles may pass before the step of writing to the distributed data query engine is executed. The "last" write is therefore the previous execution of the write step relative to the current one; in between there may be several cycles that only judge the record count or the time interval without writing any data to the distributed data query engine. For example, if the current time is the time of the third judgment of the record count, the time of the last write of data to Impala is the time of the second write of data to Impala.
It should be noted that when there are too few records, it may take a long time to reach the preset record count, and going a long time without writing data into Impala impairs Impala's query results. The preset time interval can therefore be set according to the density of the data or the internal configuration of Impala, so that data is still written into Impala at the preset time interval when data is sparse. The interval may be 1 hour, 10 minutes, and so on, which the embodiments of the present invention do not specifically limit.
In the embodiment of the present invention, step 206b may specifically be: using the first thread to execute the step of judging whether the time elapsed from first receiving the real-time streaming data to the current time reaches the preset time interval, or judging whether the time elapsed from the last write of data to the distributed data query engine to the current time reaches the preset time interval.
Here, the first thread is the same thread as in step 205 and is not described again here.
In the embodiment of the present invention, step 206b is followed by step 207b: if the preset time interval is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
Writing the parsing result into the distributed data query engine once the preset time interval is reached adds a further condition for triggering the write, which prevents data from going unwritten for too long and impairing data queries.
In the embodiment of the present invention, the following is also included after step 206b: if the preset time interval has not been reached, return to step 201.
In the embodiment of the present invention, in steps 206a and 207b, a second thread independent of the first thread may specifically be used to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Here, the second thread is a thread that the system configures separately from other running or suspended threads, and it is also independent of the first thread. The second thread is dedicated to executing the step of writing the parsing result of the real-time streaming data into the distributed data query engine. That is, whenever the parsing result needs to be written into the distributed data query engine, the write is executed by the second thread, so that the write step in the embodiments of the present invention can proceed on its own without affecting the execution of other steps.
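One way to picture the division of work between the first and second threads is a producer and consumer joined by a queue. The sketch below is an assumption, not the patent's design: the function name `run_pipeline`, the use of `queue.Queue` as the hand-off, the `None` sentinel, and appending to a list in place of a real engine write are all illustrative choices. Only the count condition is shown, so a trailing partial batch is left unwritten here.

```python
import queue
import threading
from typing import List


def run_pipeline(records: List[int], preset_count: int) -> List[List[int]]:
    to_write: "queue.Queue" = queue.Queue()
    written: List[List[int]] = []

    def judge() -> None:
        """First thread: buffer records and judge the count."""
        buffer: List[int] = []
        for rec in records:
            buffer.append(rec)
            if len(buffer) >= preset_count:
                to_write.put(buffer)   # hand the full batch to the writer
                buffer = []
        to_write.put(None)             # sentinel: no more batches

    def writer() -> None:
        """Second thread: perform the writes, independently of the judging."""
        while True:
            batch = to_write.get()
            if batch is None:
                break
            written.append(batch)      # stand-in for the engine write

    t1 = threading.Thread(target=judge)
    t2 = threading.Thread(target=writer)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return written


result = run_pipeline(list(range(7)), preset_count=3)
```

Because the judging thread only enqueues batches, a slow write does not block the receiving and counting of new records.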
Further, the embodiment of the present invention may also include: configuring the preset record count in a third thread independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread independent of the first thread, the second thread, and the third thread.
Here, the third thread and the fourth thread are independent of the first and second threads, independent of each other, and independent of any program that is running or suspended in the system. Because the preset record count is configured by the third thread and the preset time interval by the fourth thread, both values can be set from outside the main program. This allows hot swapping: the values can be changed at will without restarting the program.
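A hedged sketch of such runtime reconfiguration is a small settings object guarded by a lock, with one thread updating each value. The class name `Settings`, its methods, and the concrete values are assumptions for illustration; how the values actually reach the configuring threads is not specified in the text.

```python
import threading
from typing import Tuple


class Settings:
    """Thread-safe holder for the two preset values."""

    def __init__(self, preset_count: int, preset_interval: float) -> None:
        self._lock = threading.Lock()
        self._preset_count = preset_count
        self._preset_interval = preset_interval

    def set_preset_count(self, value: int) -> None:
        with self._lock:
            self._preset_count = value

    def set_preset_interval(self, value: float) -> None:
        with self._lock:
            self._preset_interval = value

    def snapshot(self) -> Tuple[int, float]:
        """Read both values consistently, e.g. from the judging thread."""
        with self._lock:
            return self._preset_count, self._preset_interval


# Usage: a "third" and a "fourth" thread change the values while the program runs.
settings = Settings(preset_count=50_000, preset_interval=1800.0)
t3 = threading.Thread(target=settings.set_preset_count, args=(20_000,))
t4 = threading.Thread(target=settings.set_preset_interval, args=(600.0,))
t3.start()
t4.start()
t3.join()
t4.join()
current = settings.snapshot()
```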
A specific application scenario of the embodiment of the present invention may be as follows, but is not limited to this. Real-time streaming data is received for the first time and parsed into character strings, and the parsed strings are stored in the preset cache. The number of records in the preset cache is determined to be 30,000; the preset record count is 50,000, so the preset record count has not been reached and real-time streaming data continues to be received. At the second judgment, the number of records is determined to be 40,000 and the elapsed time is 1 hour. The first thread judges that 1 hour has reached the preset time interval of half an hour, so the 40,000 parsed records are written into the distributed data query engine by the second thread.
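The scenario can be replayed as a short script. The preset record count (50,000), the preset time interval (half an hour), the two record counts, and the 1-hour elapsed time come from the scenario; the elapsed time at the first check is not stated, so the value 0.0 used here is an assumption, as are the function name `decide` and the returned labels.

```python
PRESET_COUNT = 50_000
PRESET_INTERVAL_H = 0.5  # half an hour


def decide(count: int, elapsed_h: float) -> str:
    """Apply the count condition first, then the time-interval condition."""
    if count >= PRESET_COUNT:
        return "write (count reached)"
    if elapsed_h >= PRESET_INTERVAL_H:
        return "write (interval reached)"
    return "keep receiving"


# (record count in cache, hours elapsed); 0.0 for the first check is assumed.
checks = [(30_000, 0.0), (40_000, 1.0)]
decisions = [decide(count, elapsed) for count, elapsed in checks]
```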
In the other method for storing real-time streaming data provided by the embodiment of the present invention, the first thread judges the number of parsed records stored in the preset cache. If the preset record count has not been reached, it judges whether the time elapsed from first receiving real-time streaming data to the current time reaches the preset time interval; if it does, the second thread writes the parsed data into the distributed data query engine Impala, so that Impala generates one corresponding parquet file. The third and fourth threads are used to change the preset record count and the preset time interval. This prevents data from sitting in the preset cache for a long time without reaching the preset threshold, which would affect query output; reduces the number of parquet files; allows the preset values to be hot-swapped, so that the system need not be restarted when the preset record count or preset time interval is changed; and speeds up access to parquet files when the query system performs a query, thereby improving query efficiency.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides a device for storing real-time streaming data. As shown in Fig. 3, the device may include: a receiving unit 31, a parsing unit 32, a determination unit 33, a first judging unit 34, and a writing unit 35.
The receiving unit 31 is configured to receive real-time streaming data. It is the functional module of the device that performs the receiving of real-time streaming data.
The parsing unit 32 is configured to parse the real-time streaming data to obtain a parsing result. It is the functional module of the device that performs the parsing of the real-time streaming data.
The determination unit 33 is configured to determine, according to the parsing result, the number of records in the real-time streaming data. It is the functional module of the device that performs this determination.
The first judging unit 34 is configured to judge whether the number of records in the real-time streaming data reaches the preset record count. It is the functional module of the device that performs this judgment.
The writing unit 35 is configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset record count is reached. It is the functional module of the device that performs this write.
This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one in this device embodiment, but it should be clear that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.
In the existing approach, each record transmitted by the real-time system is stored directly in Impala, which generates a large number of parquet files. By contrast, the device for storing real-time streaming data provided by the embodiment of the present invention parses the received real-time streaming data, judges whether the number of parsed records reaches the preset record count, and, if it does, writes all of the received parsing results into the distributed data query engine together, so that multiple records are stored in the distributed data query engine as a single parquet file. This reduces the number of parquet files and speeds up access to parquet files when the distributed data query engine performs a query, thereby improving query efficiency.
Further, as a specific implementation of the method shown in Fig. 2, an embodiment of the present invention provides another storage device for real-time streaming data. As shown in Fig. 4, the device may include: a receiving unit 41, a parsing unit 42, a determining unit 43, a first judging unit 44, a writing unit 45, a second judging unit 46, a configuration unit 47, and a storage unit 48.
The receiving unit 41 is configured to receive real-time streaming data;
The parsing unit 42 is configured to parse the real-time streaming data to obtain a parsing result;
The determining unit 43 is configured to determine, according to the parsing result, the number of data records in the real-time streaming data;
The first judging unit 44 is configured to judge whether the number of data records in the real-time streaming data has reached the preset number of data records;
The writing unit 45 is configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset number of data records is reached.
Further, to improve the efficiency with which real-time streaming data is written into the distributed data query engine, the device further includes a second judging unit 46.
The second judging unit 46 is configured to judge whether the time elapsed from the first receipt of the real-time streaming data to the current time has reached the preset time interval, or alternatively to judge whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
The writing unit 45 is further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
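The two timing alternatives for the second judging unit can be sketched as a small decision object. This is an illustrative sketch only: the name `FlushPolicy`, the `since_last_write` flag, the injectable `clock`, and the choice to skip the write when nothing is buffered are assumptions that the text does not specify.

```python
import time
from typing import Callable, Optional


class FlushPolicy:
    """Decides whether buffered records should be written now."""

    def __init__(self, preset_count: int, preset_interval_s: float,
                 since_last_write: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.preset_count = preset_count
        self.preset_interval_s = preset_interval_s
        self.since_last_write = since_last_write
        self.clock = clock
        # Start of the timing window: the last write (second alternative)
        # or the first receipt since the last write (first alternative).
        self.window_start: Optional[float] = clock() if since_last_write else None

    def on_receive(self) -> None:
        """Record the first receipt if the window has not started yet."""
        if self.window_start is None:
            self.window_start = self.clock()

    def should_flush(self, buffered_count: int) -> bool:
        if buffered_count <= 0:
            return False  # assumption: nothing buffered, nothing to write
        if buffered_count >= self.preset_count:
            return True   # count threshold reached
        if self.window_start is None:
            return False
        # Count not reached: fall back to the elapsed-time check.
        return self.clock() - self.window_start >= self.preset_interval_s

    def on_flush(self) -> None:
        """Restart the timing window after a write."""
        self.window_start = self.clock() if self.since_last_write else None
```

Passing a fake `clock` makes the time-based branch easy to exercise without sleeping, which is the main reason the clock is injectable in this sketch.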
The first judging unit 44 is specifically configured to use a first thread to execute the step of judging whether the number of data records in the real-time streaming data has reached the preset number of data records;
The second judging unit 46 is specifically configured to use the first thread to execute the step of judging whether the time elapsed from the time point of the first receipt of the real-time streaming data to the current time has reached the preset time interval, or alternatively judging whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
The writing unit 45 is specifically configured to use a second thread, independent of the first thread, to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
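The division of work between the first and second threads can be sketched as follows. The sketch assumes a polling checker thread, a `queue.Queue` hand-off to the writer thread, and a final flush on shutdown; none of these details come from the text, and the names `TwoThreadStore`, `write_batch`, and `poll_s` are hypothetical.

```python
import queue
import threading
import time
from typing import Callable, List, Optional


class TwoThreadStore:
    """First thread checks the count and time conditions; second thread writes."""

    def __init__(self, preset_count: int, preset_interval_s: float,
                 write_batch: Callable[[List[object]], None],
                 poll_s: float = 0.01) -> None:
        self.preset_count = preset_count
        self.preset_interval_s = preset_interval_s
        self.write_batch = write_batch  # hypothetical sink for one bulk write
        self.poll_s = poll_s
        self._buffer: List[object] = []
        self._first_received: Optional[float] = None
        self._lock = threading.Lock()
        self._batches: "queue.Queue[Optional[List[object]]]" = queue.Queue()
        self._stop = threading.Event()
        self._checker = threading.Thread(target=self._check_loop, daemon=True)
        self._writer = threading.Thread(target=self._write_loop, daemon=True)

    def start(self) -> None:
        self._checker.start()
        self._writer.start()

    def add(self, record: object) -> None:
        """Store one parsed record in the buffer (the preset cache)."""
        with self._lock:
            if not self._buffer:
                self._first_received = time.monotonic()
            self._buffer.append(record)

    def _check_once(self, force: bool = False) -> None:
        with self._lock:
            if not self._buffer:
                return
            count_hit = len(self._buffer) >= self.preset_count
            elapsed = time.monotonic() - (self._first_received or 0.0)
            time_hit = elapsed >= self.preset_interval_s
            if not (count_hit or time_hit or force):
                return
            batch, self._buffer = self._buffer, []
            self._first_received = None
        self._batches.put(batch)  # hand the batch to the writer thread

    def _check_loop(self) -> None:  # runs in the first thread
        while not self._stop.is_set():
            self._check_once()
            self._stop.wait(self.poll_s)
        self._check_once(force=True)  # assumption: flush leftovers on close
        self._batches.put(None)       # sentinel that stops the writer

    def _write_loop(self) -> None:  # runs in the second thread
        while True:
            batch = self._batches.get()
            if batch is None:
                break
            self.write_batch(batch)

    def close(self) -> None:
        self._stop.set()
        self._checker.join()
        self._writer.join()
```

Because the checker polls, a batch may contain more records than the preset count if records arrive faster than the polling period. Keeping the write in its own thread means a slow write does not delay the next condition check.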
Further, so that the preset number of data records and the preset time interval are configured in separate threads and can be hot-swapped, the device further includes:
The configuration unit 47, configured to set the preset number of data records in a third thread that is independent of the first thread and the second thread, and to set the preset time interval in a fourth thread that is independent of the first thread, the second thread, and the third thread.
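One way to picture the hot swap is a lock-protected settings holder that the configuration threads update and the judging thread reads on every check. This is a sketch under stated assumptions: the names `Thresholds` and `HotSwapConfig` are hypothetical, and the text does not say how the thresholds are shared between threads.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    preset_count: int
    preset_interval_s: float


class HotSwapConfig:
    """Holds the two thresholds so they can be changed while the system runs."""

    def __init__(self, preset_count: int, preset_interval_s: float) -> None:
        self._lock = threading.Lock()
        self._current = Thresholds(preset_count, preset_interval_s)

    def snapshot(self) -> Thresholds:
        """Called by the judging thread before each check."""
        with self._lock:
            return self._current

    def set_count(self, preset_count: int) -> None:
        """Intended to be called from the third thread."""
        if preset_count < 1:
            raise ValueError("preset_count must be at least 1")
        with self._lock:
            self._current = Thresholds(preset_count,
                                       self._current.preset_interval_s)

    def set_interval(self, preset_interval_s: float) -> None:
        """Intended to be called from the fourth thread."""
        if preset_interval_s <= 0:
            raise ValueError("preset_interval_s must be positive")
        with self._lock:
            self._current = Thresholds(self._current.preset_count,
                                       preset_interval_s)
```

Each setter replaces the immutable `Thresholds` value under the lock, so a concurrent update of the count and of the interval cannot overwrite each other, and the judging thread picks up new values on its next `snapshot()` call without a restart.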
Further, the device also includes a storage unit 48.
The storage unit 48 is configured to store the parsing result into a preset cache;
The determining unit 43 is specifically configured to determine the number of data records in the real-time streaming data according to the parsing results stored in the preset cache.
This device embodiment likewise corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can correspondingly implement the full content of the foregoing method embodiment.
In the other storage device for real-time streaming data provided in this embodiment of the present invention, the first thread judges the number of parsed data records held in the preset cache. If the preset number of data records has not been reached, the first thread judges whether the time elapsed from the first receipt of the real-time streaming data to the current time has reached the preset time interval. If it has, the second thread writes the parsed data into the distributed data query engine Impala, so that Impala generates one corresponding parquet file. The third and fourth threads are used to change the preset number of data records and the preset time interval. This prevents data from remaining in the preset cache for a long time without reaching the preset threshold, which would keep it out of query results, and it reduces the number of parquet files. It also enables hot swapping between the different steps, so that the system need not be restarted when the preset number of data records or the preset time interval is changed. As a result, the parquet files are accessed faster when the query system performs a query, which improves query efficiency.
The storage device for real-time streaming data includes a processor and a memory. The receiving unit, parsing unit, determining unit, first judging unit, writing unit, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program units from the memory. One or more kernels may be provided. By adjusting kernel parameters, the device addresses the problem that the large number of parquet files generated from the individual records transmitted by a real-time system forces the query system to access all of those parquet files during a query, which degrades query performance.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: receiving real-time streaming data; parsing the real-time streaming data to obtain a parsing result; determining, according to the parsing result, the number of data records in the real-time streaming data; judging whether the number of data records in the real-time streaming data has reached the preset number of data records; and, if so, writing the parsing result of the real-time streaming data into the distributed data query engine.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means. The instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit the present application. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A storage method for real-time streaming data, characterized by including:
receiving real-time streaming data;
parsing the real-time streaming data to obtain a parsing result;
determining, according to the parsing result, the number of data records in the real-time streaming data;
judging whether the number of data records in the real-time streaming data has reached a preset number of data records;
if so, writing the parsing result of the real-time streaming data into a distributed data query engine.
2. The method according to claim 1, characterized in that, after it is judged that the number of data records in the real-time streaming data has not reached the preset number of data records, the method further includes:
judging whether the time elapsed from the first receipt of the real-time streaming data to the current time has reached a preset time interval, or alternatively judging whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
if so, writing the parsing result of the real-time streaming data into the distributed data query engine.
3. The method according to claim 2, characterized in that:
a first thread is used to execute the step of judging whether the number of data records in the real-time streaming data has reached the preset number of data records, and the step of judging whether the time elapsed from the time point of the first receipt of the real-time streaming data to the current time has reached the preset time interval, or alternatively judging whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
a second thread, independent of the first thread, is used to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
4. The method according to claim 3, characterized in that, before judging whether the number of data records in the real-time streaming data has reached the preset number of data records, the method further includes:
configuring the preset number of data records in a third thread that is independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread that is independent of the first thread, the second thread, and the third thread.
5. The method according to any one of claims 1 to 4, characterized in that, after parsing the real-time streaming data to obtain the parsing result, the method further includes:
storing the parsing result into a preset cache;
wherein determining, according to the parsing result, the number of data records in the real-time streaming data includes:
determining the number of data records in the real-time streaming data according to the parsing results stored in the preset cache.
6. A storage device for real-time streaming data, characterized by including:
a receiving unit, configured to receive real-time streaming data;
a parsing unit, configured to parse the real-time streaming data to obtain a parsing result;
a determining unit, configured to determine, according to the parsing result, the number of data records in the real-time streaming data;
a first judging unit, configured to judge whether the number of data records in the real-time streaming data has reached a preset number of data records;
a writing unit, configured to write the parsing result of the real-time streaming data into a distributed data query engine if the preset number of data records is reached.
7. The device according to claim 6, characterized in that the device further includes a second judging unit,
the second judging unit being configured to judge whether the time elapsed from the first receipt of the real-time streaming data to the current time has reached a preset time interval, or alternatively to judge whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
the writing unit being further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
8. The device according to claim 7, characterized in that:
the first judging unit is specifically configured to use a first thread to execute the step of judging whether the number of data records in the real-time streaming data has reached the preset number of data records;
the second judging unit is specifically configured to use the first thread to execute the step of judging whether the time elapsed from the time point of the first receipt of the real-time streaming data to the current time has reached the preset time interval, or alternatively judging whether the time elapsed from the last write of data into the distributed data query engine to the current time has reached the preset time interval;
the writing unit is specifically configured to use a second thread, independent of the first thread, to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
9. The device according to claim 8, characterized in that the device further includes:
a configuration unit, configured to set the preset number of data records in a third thread that is independent of the first thread and the second thread, and to set the preset time interval in a fourth thread that is independent of the first thread, the second thread, and the third thread.
10. The device according to any one of claims 6 to 9, characterized in that the device further includes a storage unit,
the storage unit being configured to store the parsing result into a preset cache;
the determining unit being specifically configured to determine the number of data records in the real-time streaming data according to the parsing results stored in the preset cache.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710224721.4A CN108694187A (en) | 2017-04-07 | 2017-04-07 | The storage method and device of real-time streaming data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108694187A true CN108694187A (en) | 2018-10-23 |
Family
ID=63842854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710224721.4A Pending CN108694187A (en) | 2017-04-07 | 2017-04-07 | The storage method and device of real-time streaming data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108694187A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118268A (en) * | 2011-02-18 | 2011-07-06 | 中兴通讯股份有限公司 | Telephone traffic data storage method and system |
CN102646121A (en) * | 2012-02-23 | 2012-08-22 | 武汉大学 | Two-stage storage method combined with RDBMS (relational database management system) and Hadoop cloud storage |
CN103853671A (en) * | 2012-12-07 | 2014-06-11 | 北京百度网讯科技有限公司 | Data writing control method and device |
CN105446893A (en) * | 2014-07-14 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Data storage method and device |
CN104967807A (en) * | 2014-12-30 | 2015-10-07 | 浙江大华技术股份有限公司 | Caching method and apparatus |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977334A (en) * | 2019-03-26 | 2019-07-05 | 浙江度衍信息技术有限公司 | Retrieval rate optimization method |
CN109977334B (en) * | 2019-03-26 | 2023-10-20 | 浙江度衍信息技术有限公司 | Search speed optimization method |
CN113296962A (en) * | 2021-07-26 | 2021-08-24 | 阿里云计算有限公司 | Memory management method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181023 |