CN108694187A - The storage method and device of real-time streaming data - Google Patents

Publication number
CN108694187A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710224721.4A
Other languages
Chinese (zh)
Inventor
胡信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710224721.4A
Publication of CN108694187A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a storage method and device for real-time streaming data and relates to the technical field of information processing. Its main purpose is to solve the problem that the large number of parquet files generated from the individual data records transmitted by a real-time system forces the query system to access all of those parquet files when it performs a query, which degrades query performance. The technical solution includes: receiving real-time streaming data; parsing the real-time streaming data to obtain a parsing result; determining, according to the parsing result, the number of data records in the real-time streaming data; judging whether the number of data records in the real-time streaming data reaches a preset number of data records; and, if so, writing the parsing result of the real-time streaming data into a distributed data query engine. The invention is mainly used for storing real-time streaming data.

Description

Storage method and device for real-time streaming data
Technical Field
The present invention relates to the technical field of information processing, and in particular to a storage method and device for real-time streaming data.
Background
As information processing moves toward big data processing, Impala, a new distributed data query engine suited to big data queries, has come to attention. Impala supports definitions in Structured Query Language (SQL), and once a data stream obtained in real time has been parsed, the real-time data can be stored in Impala.
At present, each time an existing real-time system transmits one data record in its data stream, that record is stored in Impala and a parquet file is generated for it. Every record that is transmitted and stored therefore produces one parquet file. The large number of parquet files generated from the records transmitted by the real-time system forces the query system to access all of those parquet files when it performs a query, which degrades query performance and reduces query efficiency.
Summary of the Invention
In view of the above problems, the present invention is proposed to provide a storage method and device for real-time streaming data. Its main purpose is to solve the problem that the large number of parquet files generated from the individual data records transmitted by a real-time system forces the query system to access all of those parquet files when it performs a query, which degrades query performance.
By means of the above technical solution, the storage method for real-time streaming data provided by the present invention includes:
Receiving real-time streaming data;
Parsing the real-time streaming data to obtain a parsing result;
Determining, according to the parsing result, the number of data records in the real-time streaming data;
Judging whether the number of data records in the real-time streaming data reaches a preset number of data records;
If so, writing the parsing result of the real-time streaming data into a distributed data query engine.
Further, after it is judged that the number of data records in the real-time streaming data has not reached the preset number of data records, the method further includes:
Judging whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches a preset time interval, or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
If so, writing the parsing result of the real-time streaming data into the distributed data query engine.
Further, the method further includes:
Using a first thread to execute the step of judging whether the number of data records in the real-time streaming data reaches the preset number of data records, and the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval, or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
Using a second thread, independent of the first thread, to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Further, before judging whether the number of data records in the real-time streaming data reaches the preset number of data records, the method further includes:
Configuring the preset number of data records in a third thread that is independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
Further, after the real-time streaming data is parsed to obtain the parsing result, the method further includes:
Storing the parsing result in a preset cache;
Determining the number of data records in the real-time streaming data according to the parsing result includes:
Determining the number of data records in the real-time streaming data according to the parsing result stored in the preset cache.
By means of the above technical solution, the storage device for real-time streaming data provided by the present invention includes:
A receiving unit, configured to receive real-time streaming data;
A parsing unit, configured to parse the real-time streaming data to obtain a parsing result;
A determining unit, configured to determine, according to the parsing result, the number of data records in the real-time streaming data;
A first judging unit, configured to judge whether the number of data records in the real-time streaming data reaches a preset number of data records;
A writing unit, configured to write the parsing result of the real-time streaming data into a distributed data query engine if the preset number of data records is reached.
Further, the device further includes a second judging unit;
The second judging unit is configured to judge whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches a preset time interval, or to judge whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
The writing unit is further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
Further, in the device:
The first judging unit is specifically configured to use a first thread to execute the step of judging whether the number of data records in the real-time streaming data reaches the preset number of data records;
The second judging unit is specifically configured to use the first thread to execute the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval, or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
The writing unit is specifically configured to use a second thread, independent of the first thread, to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Further, the device further includes:
A configuration unit, configured to configure the preset number of data records in a third thread that is independent of the first thread and the second thread, and to configure the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
Further, the device further includes a storage unit;
The storage unit is configured to store the parsing result in a preset cache;
The determining unit is specifically configured to determine the number of data records in the real-time streaming data according to the parsing result stored in the preset cache.
By means of the above technical solution, the technical solution provided by the embodiments of the present invention has at least the following advantages:
The storage method and device for real-time streaming data provided by the embodiments of the present invention first receive real-time streaming data; parse the real-time streaming data to obtain a parsing result; determine, according to the parsing result, the number of data records in the real-time streaming data; judge whether the number of data records reaches a preset number of data records; and, if so, write the parsing result of the real-time streaming data into a distributed data query engine. Compared with the existing approach of storing each record transmitted by the real-time system directly in Impala, which generates a large number of parquet files, the embodiments parse the received real-time streaming data, judge whether the number of parsed data records reaches the preset number of data records and, if it does, write all of the received parsing results into the distributed data query engine. Multiple items of real-time streaming data are thus stored in the distributed data query engine as a single parquet file, which reduces the number of parquet files, speeds up access to parquet files when the distributed data query engine performs a query, and so improves query efficiency.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the detailed description of the preferred embodiments below. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 shows a flow chart of a storage method for real-time streaming data provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of another storage method for real-time streaming data provided by an embodiment of the present invention;
Fig. 3 shows a block diagram of a storage device for real-time streaming data provided by an embodiment of the present invention;
Fig. 4 shows a block diagram of another storage device for real-time streaming data provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and so that its scope can be conveyed fully to those skilled in the art.
An embodiment of the present invention provides a storage method for real-time streaming data. As shown in Fig. 1, the method includes:
101. Receive real-time streaming data.
Here, the real-time streaming data is data collected in real time by a real-time system. The distributed data query engine Impala can receive the data transmitted in real time by the real-time streaming system in order to perform operations such as querying and storing the data. Because a real-time system has relatively high real-time requirements for data, the received real-time streaming data needs to be stored in the query engine promptly to meet those requirements and to ensure the accuracy of query results. However, storing every record directly in the query engine as it is received produces a large number of intermediate files and lowers query efficiency. A method is therefore proposed that both ensures the timeliness of the data and improves query efficiency.
102. Parse the real-time streaming data to obtain a parsing result.
Because the data format or type of the received real-time streaming data is not necessarily uniform, the real-time streaming data needs to be unified. For example, the real-time streaming data may be binary, character strings, and so on. Parsing the received real-time streaming data ensures that the parsed data has a uniform data format or type. The embodiments of the present invention do not specifically limit the data format of the real-time streaming data before or after parsing.
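As an illustration of step 102 only, the following Python sketch normalizes binary and string inputs to one in-memory form. The choice of UTF-8 JSON payloads and the name `parse_record` are assumptions made for the example; the specification does not fix any input or output format.

```python
import json


def parse_record(raw):
    """Normalize one incoming message to a dict.

    Assumption for this sketch: every payload is a UTF-8 JSON object,
    delivered either as bytes or as str.
    """
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8")  # binary input becomes text first
    return json.loads(raw)         # text becomes the unified form


# One binary message and one string message end up in the same format.
records = [parse_record(b'{"id": 1}'), parse_record('{"id": 2}')]
```

After this step, counting records is simply a matter of counting parsed objects, whatever form they arrived in.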
103. Determine, according to the parsing result, the number of data records in the real-time streaming data.
Here, a data record is the basic unit in which the real-time streaming system counts data. The number of data records in the received real-time streaming data can be determined from the parsing result, so that it can subsequently be judged whether the number of data records reaches the preset number of data records.
104. Judge whether the number of data records in the real-time streaming data reaches the preset number of data records.
Here, the preset number of data records can be set by weighing how strict the real-time requirement on the data is against the requirements on data storage and query speed. Where the real-time requirement is higher, the preset number of data records is set smaller. Where higher storage and query efficiency is required, fewer separate storage operations are wanted, so the preset number of data records generally needs to be set larger. In practice the value is obtained by weighing all of these requirements together.
105. If the preset number of data records is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
Here, writing into the distributed data query engine means writing the parsed real-time streaming data into the distributed data query engine Impala at a single point in time, so that one parquet file is generated in Impala.
It should be noted that Impala can store data as it is transmitted in real time. If the data is stored record by record, a corresponding parquet file is generated under the Impala partition for every record. In other words, Impala stores data and generates a parquet file for each write at a point in time. Therefore, when the parsing result of the real-time streaming data is written into Impala as a whole, only one parquet file is produced, which preserves the timeliness of the data without affecting its storage in the query system.
In this embodiment, if the preset number of data records is not reached in step 104, the method returns to step 101 and performs the write operation once the number of data records reaches the preset number of data records.
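Steps 101 to 105 can be sketched as a small count-triggered batcher. This is a minimal sketch, not the patented implementation: `CountBatcher` and `write_batch` are hypothetical names, and `write_batch` is a stand-in for whatever call actually writes a batch into the query engine.

```python
class CountBatcher:
    """Collect parsed records and write them out once a preset count is reached."""

    def __init__(self, preset_count, write_batch):
        self.preset_count = preset_count
        self.write_batch = write_batch  # hypothetical sink standing in for the engine write
        self.pending = []

    def on_record(self, parsed):
        self.pending.append(parsed)                  # steps 101 to 103
        if len(self.pending) >= self.preset_count:   # step 104
            self.write_batch(self.pending)           # step 105: one write per batch
            self.pending = []
            return True
        return False                                 # otherwise go back to step 101


written = []
batcher = CountBatcher(3, written.append)
results = [batcher.on_record({"id": i}) for i in range(7)]
```

With a preset count of 3 and seven records, two batches are written and one record remains pending until more data arrives.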
Compared with the existing approach of storing each record transmitted by the real-time system directly in Impala, which generates a large number of parquet files, the storage method for real-time streaming data provided by this embodiment parses the received real-time streaming data, judges whether the number of parsed data records reaches the preset number of data records and, if it does, writes all of the received parsing results into the distributed data query engine. Multiple items of real-time streaming data are thus stored in the distributed data query engine as a single parquet file, which reduces the number of parquet files, speeds up access to parquet files when the distributed data query engine performs a query, and so improves query efficiency.
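The reduction in file count can be made concrete with a little arithmetic. The sketch below takes the text's premise that each write produces one parquet file and considers count-triggered writes only; the function names and the figures of 100,000 records and a preset count of 50,000 are illustrative assumptions.

```python
import math


def parquet_files_per_record(total_records):
    # Baseline from the background section: one write, hence one file, per record.
    return total_records


def parquet_files_batched(total_records, preset_count):
    # Count-triggered batching only; assumes a final partial batch is also written once.
    return math.ceil(total_records / preset_count)


baseline = parquet_files_per_record(100_000)
batched = parquet_files_batched(100_000, 50_000)
```

Under these assumptions a query that previously had to touch 100,000 files touches 2.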
An embodiment of the present invention further provides another storage method for real-time streaming data. As shown in Fig. 2, the method includes:
201. Receive real-time streaming data.
This step is the same as step 101 shown in Fig. 1 and is not described again here.
202. Parse the real-time streaming data to obtain a parsing result.
This step is the same as step 102 shown in Fig. 1 and is not described again here.
203. Store the parsing result in a preset cache.
Here, the preset cache is a cache space for storing all parsed real-time streaming data. It may be a local cache, a cloud cache, or the like, and the embodiments of the present invention do not specifically limit it. Storing the parsing result in the preset cache buffers the parsed real-time streaming data before the number of records is judged, which avoids data loss and allows the number of records to be determined directly from the preset cache.
It should be noted that the size of the preset cache can be set according to the capacity for receiving real-time streaming data or the density of the data received, and the embodiments of the present invention do not specifically limit it. After the real-time streaming data in the preset cache has been written into the distributed data query engine, the data in the preset cache is deleted so that parsing results can be stored in the preset cache when real-time streaming data is next received.
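A local, in-memory version of such a cache might look like the following sketch. `ParsedCache` and its methods are hypothetical names, and the lock is an assumption added because later steps read and empty the cache from different threads.

```python
import threading


class ParsedCache:
    """In-memory stand-in for the preset cache (step 203)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def put(self, parsed):
        with self._lock:
            self._items.append(parsed)

    def count(self):
        # Step 204: the record count is read directly from the cache.
        with self._lock:
            return len(self._items)

    def drain(self):
        # Hand over everything for writing and leave the cache empty for the next round.
        with self._lock:
            items, self._items = self._items, []
            return items


cache = ParsedCache()
for i in range(3):
    cache.put({"id": i})
count_before = cache.count()
batch = cache.drain()
count_after = cache.count()
```

Draining in one locked step means a record can never be both written and left behind in the cache.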
204. Determine the number of data records in the real-time streaming data according to the parsing result stored in the preset cache.
Determining the number of data records from the parsing result stored in the preset cache makes it possible to judge, according to that number, whether a write into the distributed data query engine is needed, and avoids the situation where received real-time streaming data is lost before its number of data records has been determined.
205. Judge whether the number of data records in the real-time streaming data reaches the preset number of data records.
This step is the same as step 104 shown in Fig. 1 and is not described again here.
In this embodiment, step 205 may specifically be: using a first thread to execute the step of judging whether the number of data records in the real-time streaming data reaches the preset number of data records.
Here, the first thread is a thread configured by the system separately from other running or suspended threads. It can be responsible for the receiving, parsing, storing, determining and judging steps of this embodiment, in particular the step of judging whether the number of data records in the real-time streaming data reaches the preset number of data records. Because a relatively independent first thread executes the step of judging the number of data records, whichever step the first thread has reached does not affect the running of other programs in the system, and the purpose of storing the real-time streaming data is still achieved.
206a. If the preset number of data records is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
This step is the same as step 105 shown in Fig. 1 and is not described again here.
In both the embodiment shown in Fig. 1 and the present embodiment, the density of the real-time streaming data received varies from moment to moment. To prevent the case where, because data is sparse, nothing is written into the Impala database for a long time and the timeliness of the data or the accuracy of query results suffers, a time-interval judgment can be added for the case where the received real-time streaming data has not reached the preset number of data records, so that data is still written into the Impala database at the preset time interval when data transmission is sparse. As shown in Fig. 2, step 206b, parallel to step 206a, is: if the preset number of data records is not reached, judge whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches the preset time interval, or judge whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval.
Here, the time of first receipt is the time at which the first item of real-time streaming data is received. Because a negative result in judging whether the number of data records reaches the preset number of data records, or whether the preset time interval has been reached, returns the flow to the step of receiving real-time streaming data, several cycles may pass before the step of writing into the distributed data query engine is executed. The "last" write therefore means the most recent previous write into the distributed data query engine relative to the write now being considered; in between there may be several rounds that only judge whether the number of data records reaches the preset number of data records or whether the preset time interval has been reached, without writing data into the distributed data query engine. For example, if the current time is the time of the 3rd judgment of the number of data records, the last data write into Impala is the 2nd time that data was written into Impala.
It should be noted that when there are too few data records, it may take a long time to reach the preset number of data records, and leaving data unwritten to Impala for a long time harms Impala's query results. The preset time interval can therefore be set according to the density of the data or Impala's internal configuration, so that data is still written into Impala at the preset time interval when data is sparse. The interval may be 1 hour, 10 minutes, and so on; the embodiments of the present invention do not specifically limit it.
In this embodiment, step 206b may specifically be: using the first thread to execute the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval, or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval.
Here, the first thread is the same thread as the one in step 205 and is not described again here.
In this embodiment, step 207b follows step 206b: if the preset time interval is reached, write the parsing result of the real-time streaming data into the distributed data query engine.
Writing the parsing result of the real-time streaming data into the distributed data query engine when the preset time interval is reached adds a further condition for writing into the distributed data query engine, and avoids the case where data goes unwritten for too long and degrades data queries.
In this embodiment, after step 206b the method further includes: if the preset time interval is not reached, return to step 201.
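The combined flow of steps 205 to 207b can be sketched as follows. This is an illustrative sketch under stated assumptions: `CountOrTimeBatcher` and `write_batch` are hypothetical names, the reference time is taken as the first receipt and afterwards the last write, and the clock is injectable so the example is deterministic.

```python
import time


class CountOrTimeBatcher:
    """Write when either the preset count or the preset interval is reached."""

    def __init__(self, preset_count, preset_interval, write_batch, clock=time.monotonic):
        self.preset_count = preset_count
        self.preset_interval = preset_interval  # seconds
        self.write_batch = write_batch          # hypothetical stand-in for the engine write
        self.clock = clock
        self.pending = []
        self.reference_time = None              # first receipt, then time of the last write

    def on_record(self, parsed):
        now = self.clock()
        if self.reference_time is None:
            self.reference_time = now
        self.pending.append(parsed)
        reached_count = len(self.pending) >= self.preset_count                # step 205
        reached_interval = now - self.reference_time >= self.preset_interval  # step 206b
        if reached_count or reached_interval:
            self.write_batch(self.pending)                                    # step 206a or 207b
            self.pending = []
            self.reference_time = now
            return True
        return False                                                          # back to step 201


ticks = iter([0, 10, 20, 100])  # fake clock readings, in seconds
written = []
batcher = CountOrTimeBatcher(3, 60, written.append, clock=lambda: next(ticks))
flags = [batcher.on_record(r) for r in "abcd"]
```

Here the third record triggers a write by count and the fourth triggers one by elapsed time. The sketch only checks the interval when a record arrives; a system that must flush while no data is arriving at all would also need a periodic timer, which is outside what the text describes.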
In this embodiment, step 206a and step 207b may specifically use a second thread, independent of the first thread, to execute the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
Here, the second thread is a thread configured by the system separately from other running or suspended threads, and it is also independent of the first thread. The second thread is dedicated to executing the step of writing the parsing result of the real-time streaming data into the distributed data query engine. That is, whenever the parsing result of the real-time streaming data needs to be written into the distributed data query engine, the write is executed by the second thread, so that the write step of this embodiment can be carried out on its own without affecting the execution of the other steps.
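One way to separate the judging thread from the writing thread is to hand completed batches over through a queue, as in the sketch below. The function names, the `None` sentinel and the use of `queue.Queue` are choices made for this example rather than details taken from the specification, and only the count condition is shown.

```python
import queue
import threading


def judge_loop(incoming, batches, preset_count):
    """First thread: count records and hand full batches to the writer."""
    pending = []
    for parsed in incoming:
        pending.append(parsed)
        if len(pending) >= preset_count:
            batches.put(pending)
            pending = []
    batches.put(None)  # sentinel: no more batches (leftover records stay unwritten here)


def write_loop(batches, write_batch):
    """Second thread: do nothing except write the batches it is handed."""
    while True:
        batch = batches.get()
        if batch is None:
            break
        write_batch(batch)  # hypothetical stand-in for the engine write


written = []
batches = queue.Queue()
first = threading.Thread(target=judge_loop, args=(range(10), batches, 4))
second = threading.Thread(target=write_loop, args=(batches, written.append))
first.start()
second.start()
first.join()
second.join()
```

Because the writer blocks on the queue, a slow write never stalls receiving and counting, which is the independence the text asks for.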
Further, this embodiment may also include: configuring the preset number of data records in a third thread that is independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
Here, the third thread and the fourth thread are independent of the first thread and the second thread, independent of each other, and independent of any program that the system is executing or has suspended. Because the third thread configures the preset number of data records and the fourth thread configures the preset time interval, both values can be set outside the main program. This allows hot switching: the values can be changed at will without restarting the program.
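A minimal sketch of such runtime-adjustable settings is shown below. The `Settings` class, its method names and the concrete values are assumptions for illustration; the point is only that separate threads can change each value while the judging thread keeps reading a consistent pair.

```python
import threading


class Settings:
    """Thresholds that can be changed while the pipeline keeps running."""

    def __init__(self, preset_count, preset_interval):
        self._lock = threading.Lock()
        self._preset_count = preset_count
        self._preset_interval = preset_interval  # seconds

    def set_preset_count(self, value):
        with self._lock:
            self._preset_count = value

    def set_preset_interval(self, value):
        with self._lock:
            self._preset_interval = value

    def snapshot(self):
        # The judging thread reads both values together before each check.
        with self._lock:
            return self._preset_count, self._preset_interval


settings = Settings(50_000, 1800.0)
third = threading.Thread(target=settings.set_preset_count, args=(20_000,))
fourth = threading.Thread(target=settings.set_preset_interval, args=(600.0,))
third.start()
fourth.start()
third.join()
fourth.join()
```

In a real deployment the third and fourth threads would presumably watch a configuration source rather than apply a single fixed value, but that source is not described in the text.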
In this embodiment, one specific application scenario, given as a non-limiting example, is as follows: real-time streaming data is received for the first time and parsed into character-string form; the parsed strings are stored in the preset cache; the number of records in the preset cache is determined to be 30,000, while the preset number of data records is 50,000, so the preset number is not reached and real-time streaming data continues to be received; at the second judgment the number of records is determined to be 40,000 and 1 hour has elapsed; the first thread judges that 1 hour has reached the preset time interval of half an hour, and the 40,000 parsed records are then written into the distributed data query engine using the second thread.
In the other storage method for real-time streaming data provided by this embodiment, a first thread judges the number of parsed real-time streaming data records stored in the preset cache. If the preset number of data records is not reached, it judges whether the time from the first receipt of real-time streaming data to the current time reaches the preset time interval; if so, a second thread writes the parsed data into the distributed data query engine Impala, and the third and fourth threads are used to change the preset number of data records and the preset time interval. Impala thus generates one corresponding parquet file per write. This avoids the case where data held in the preset cache fails to reach the preset threshold for a long time and affects query output, reduces the number of parquet files, allows hot switching of the settings so that the system need not be restarted when the preset number of data records or the preset time interval is changed, and speeds up access to parquet files when the query system performs a query, thereby improving query efficiency.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides a storage device for real-time streaming data. As shown in Fig. 3, the device may include: a receiving unit 31, a parsing unit 32, a determining unit 33, a first judging unit 34 and a writing unit 35.
The receiving unit 31 is configured to receive real-time streaming data. The receiving unit 31 is the functional module with which the storage device for real-time streaming data receives real-time streaming data.
The parsing unit 32 is configured to parse the real-time streaming data to obtain a parsing result. The parsing unit 32 is the functional module with which the storage device for real-time streaming data parses the real-time streaming data and obtains the parsing result.
The determining unit 33 is configured to determine, according to the parsing result, the number of data records in the real-time streaming data. The determining unit 33 is the functional module with which the storage device for real-time streaming data determines the number of data records according to the parsing result.
The first judging unit 34 is configured to judge whether the number of data records in the real-time streaming data reaches the preset number of data records. The first judging unit 34 is the functional module with which the storage device for real-time streaming data makes this judgment.
The writing unit 35 is configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset number of data records is reached. The writing unit 35 is the functional module with which the storage device for real-time streaming data performs this write.
This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiment.
Compared with the existing approach of storing each record transmitted by the real-time system directly in Impala, which generates a large number of parquet files, the storage device for real-time streaming data provided by this embodiment parses the received real-time streaming data, judges whether the number of parsed data records reaches the preset number of data records and, if it does, writes all of the received parsing results into the distributed data query engine. Multiple items of real-time streaming data are thus stored in the distributed data query engine as a single parquet file, which reduces the number of parquet files, speeds up access to parquet files when the distributed data query engine performs a query, and so improves query efficiency.
Further, as a specific implementation of the method shown in Fig. 2, an embodiment of the present invention provides another storage device of real-time streaming data. As shown in Fig. 4, the device may include: a receiving unit 41, a parsing unit 42, a determination unit 43, a first judging unit 44, a writing unit 45, a second judging unit 46, a configuration unit 47 and a storage unit 48.
Receiving unit 41, configured to receive real-time streaming data;
Parsing unit 42, configured to parse the real-time streaming data to obtain a parsing result;
Determination unit 43, configured to determine the number of data records of the real-time streaming data according to the parsing result;
First judging unit 44, configured to judge whether the number of data records of the real-time streaming data reaches the preset number of data records;
Writing unit 45, configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset number of data records is reached.
Further, to improve the efficiency with which real-time streaming data is written into the distributed data query engine, the device further includes a second judging unit 46.
The second judging unit 46 is configured to judge whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches a preset time interval, or to judge whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
The writing unit 45 is further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
The first judging unit 44 is specifically configured to use a first thread to perform the step of judging whether the number of data records of the real-time streaming data reaches the preset number of data records;
The second judging unit 46 is specifically configured to use the first thread to perform the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval, or of judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
The writing unit 45 is specifically configured to use a second thread, independent of the first thread, to perform the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
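One possible way to split the judging and writing steps across two threads is sketched below. It is a hedged illustration rather than the embodiment's code: the class `ThreadedStore`, the 10 ms polling period, the hand-over queue and the `write_batch` callback are assumptions made for the example. The checker thread plays the role of the first thread and only evaluates the two conditions; the writer thread plays the role of the second thread and only performs writes.

```python
import queue
import threading
import time


class ThreadedStore:
    def __init__(self, preset_count, preset_interval_s, write_batch):
        self.preset_count = preset_count
        self.preset_interval_s = preset_interval_s
        self.write_batch = write_batch      # hypothetical bulk-write callback
        self._lock = threading.Lock()
        self._buffer = []                   # plays the role of the preset cache
        self._first_at = None               # time the oldest buffered record arrived
        self._pending = queue.Queue()       # batches handed from checker to writer
        self._stop = threading.Event()
        self._checker = threading.Thread(target=self._check_loop, daemon=True)
        self._writer = threading.Thread(target=self._write_loop, daemon=True)

    def start(self):
        self._checker.start()
        self._writer.start()

    def add(self, parsed):
        with self._lock:
            if not self._buffer:
                self._first_at = time.monotonic()
            self._buffer.append(parsed)

    def _take_if_due(self, force=False):
        with self._lock:
            if not self._buffer:
                return None
            due = (force
                   or len(self._buffer) >= self.preset_count
                   or time.monotonic() - self._first_at >= self.preset_interval_s)
            if not due:
                return None
            # The whole buffer is taken, so a batch may exceed preset_count.
            batch, self._buffer = self._buffer, []
            return batch

    def _check_loop(self):
        # First thread: evaluates the count and time conditions only.
        while not self._stop.is_set():
            batch = self._take_if_due()
            if batch:
                self._pending.put(batch)
            self._stop.wait(0.01)           # assumed polling period
        batch = self._take_if_due(force=True)   # drain what is left on shutdown
        if batch:
            self._pending.put(batch)
        self._pending.put(None)             # sentinel: tells the writer to exit

    def _write_loop(self):
        # Second thread: performs the writes, independent of the checks.
        while True:
            batch = self._pending.get()
            if batch is None:
                break
            self.write_batch(batch)

    def stop(self):
        self._stop.set()
        self._checker.join()
        self._writer.join()
```

Because a single checker feeds a single writer through a FIFO queue, records reach the sink in arrival order, although the exact batch boundaries depend on timing.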
Further, so that the preset number of data records and the preset time interval are configured in separate threads and can be changed by hot switching, the device further includes:
Configuration unit 47, configured to configure the preset number of data records in a third thread that is independent of the first thread and the second thread, and to configure the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
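A minimal sketch of how the two thresholds might be changed at run time from separate threads is given below. The `Thresholds` class, its method names and the `should_flush` helper are hypothetical names introduced for illustration; the source does not specify how the configuration is shared between threads, so the lock-protected object here is an assumption.

```python
import threading


class Thresholds:
    """Holds the preset number of data records and the preset time interval
    so that they can be changed without restarting the system."""

    def __init__(self, preset_count: int, preset_interval_s: float):
        self._lock = threading.Lock()
        self._count = preset_count
        self._interval = preset_interval_s

    def set_count(self, value: int) -> None:
        # Intended to be called from the "third thread".
        if value <= 0:
            raise ValueError("preset count must be positive")
        with self._lock:
            self._count = value

    def set_interval(self, value: float) -> None:
        # Intended to be called from the "fourth thread".
        if value <= 0:
            raise ValueError("preset interval must be positive")
        with self._lock:
            self._interval = value

    def snapshot(self):
        # The judging thread reads both values consistently.
        with self._lock:
            return self._count, self._interval


def should_flush(buffered: int, elapsed_s: float, thresholds: Thresholds) -> bool:
    """Judging step: uses whatever thresholds are current at call time."""
    count, interval = thresholds.snapshot()
    return buffered >= count or elapsed_s >= interval
```

Because the judging step reads the thresholds on every check, a change made by either configuring thread takes effect at the next check.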
Further, the device further includes a storage unit 48.
The storage unit 48 is configured to store the parsing result into a preset cache;
The determination unit 43 is specifically configured to determine the number of data records of the real-time streaming data according to the parsing results stored in the preset cache.
This device embodiment corresponds to the foregoing method embodiment. For ease of reading, it does not repeat the details of the method embodiment one by one, but it should be understood that the device in this embodiment can implement everything described in the foregoing method embodiment.
In the other storage device of real-time streaming data provided by this embodiment of the present invention, a first thread judges the number of parsed real-time streaming data records stored in the preset cache. If the preset number of data records is not reached, the first thread judges whether the time elapsed from the first receipt of real-time streaming data to the current time reaches the preset time interval. If it does, a second thread writes the parsed data into the distributed data query engine Impala, so that Impala generates one corresponding parquet file. A third thread and a fourth thread are used to change the preset number of data records and the preset time interval. This prevents data from staying in the preset cache for a long time without reaching the preset threshold and thereby delaying query output, and it reduces the number of parquet files. It also allows hot switching between different steps, so that the system need not be restarted when the preset number of data records or the preset time interval is changed. As a result, parquet files are accessed faster when the query system performs a query operation, which improves query efficiency.
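The fallback from the count condition to the time condition summarized above can be sketched as follows. The sketch is an assumption-laden illustration in Python: `CountOrTimeBatcher`, `maybe_flush`, the injectable `clock` and the `write_batch` callback are hypothetical names, and the timer here starts at the first buffered record, which is only one of the two timing variants the embodiment describes (the other measures from the last write).

```python
import time
from typing import Callable, List, Optional


class CountOrTimeBatcher:
    def __init__(self, preset_count: int, preset_interval_s: float,
                 write_batch: Callable[[List[dict]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.preset_count = preset_count
        self.preset_interval_s = preset_interval_s
        self.write_batch = write_batch      # hypothetical bulk-write callback
        self.clock = clock                  # injectable so the logic can be tested
        self.buffer: List[dict] = []
        self.first_received_at: Optional[float] = None

    def add(self, parsed: dict) -> None:
        if not self.buffer:
            # Start timing when the first record of a new batch arrives.
            self.first_received_at = self.clock()
        self.buffer.append(parsed)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Returns True if a batch was written."""
        if not self.buffer:
            return False
        count_reached = len(self.buffer) >= self.preset_count
        elapsed = self.clock() - self.first_received_at
        # The time condition only matters when the count was not reached.
        if count_reached or elapsed >= self.preset_interval_s:
            batch, self.buffer = self.buffer, []
            self.first_received_at = None
            self.write_batch(batch)
            return True
        return False
```

A caller would invoke `maybe_flush` periodically, so that a small trickle of records is still written once the interval expires instead of waiting indefinitely for the count threshold.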
The storage device of real-time streaming data includes a processor and a memory. The receiving unit, parsing unit, determination unit, first judging unit, writing unit and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the device addresses the problem that the large number of parquet files produced from the individual data records transmitted by a real-time system forces the query system to access all of those parquet files during a query operation, which degrades query performance.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: receiving real-time streaming data; parsing the real-time streaming data to obtain a parsing result; determining the number of data records of the real-time streaming data according to the parsing result; judging whether the number of data records of the real-time streaming data reaches the preset number of data records; and, if so, writing the parsing result of the real-time streaming data into the distributed data query engine.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and a memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A storage method of real-time streaming data, characterized by comprising:
receiving real-time streaming data;
parsing the real-time streaming data to obtain a parsing result;
determining the number of data records of the real-time streaming data according to the parsing result;
judging whether the number of data records of the real-time streaming data reaches a preset number of data records;
if so, writing the parsing result of the real-time streaming data into a distributed data query engine.
2. The method according to claim 1, characterized in that, after it is judged that the number of data records of the real-time streaming data has not reached the preset number of data records, the method further comprises:
judging whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches a preset time interval, or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
if so, writing the parsing result of the real-time streaming data into the distributed data query engine.
3. The method according to claim 2, characterized in that:
a first thread is used to perform the step of judging whether the number of data records of the real-time streaming data reaches the preset number of data records, and the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval or judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
a second thread, independent of the first thread, is used to perform the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
4. The method according to claim 3, characterized in that, before judging whether the number of data records of the real-time streaming data reaches the preset number of data records, the method further comprises:
configuring the preset number of data records in a third thread that is independent of the first thread and the second thread, and configuring the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
5. The method according to any one of claims 1 to 4, characterized in that, after the real-time streaming data is parsed to obtain the parsing result, the method further comprises:
storing the parsing result into a preset cache;
wherein determining the number of data records of the real-time streaming data according to the parsing result comprises:
determining the number of data records of the real-time streaming data according to the parsing results stored in the preset cache.
6. A storage device of real-time streaming data, characterized by comprising:
a receiving unit, configured to receive real-time streaming data;
a parsing unit, configured to parse the real-time streaming data to obtain a parsing result;
a determination unit, configured to determine the number of data records of the real-time streaming data according to the parsing result;
a first judging unit, configured to judge whether the number of data records of the real-time streaming data reaches a preset number of data records;
a writing unit, configured to write the parsing result of the real-time streaming data into a distributed data query engine if the preset number of data records is reached.
7. The device according to claim 6, characterized in that the device further comprises a second judging unit,
the second judging unit being configured to judge whether the time elapsed from the first receipt of the real-time streaming data to the current time reaches a preset time interval, or to judge whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
the writing unit being further configured to write the parsing result of the real-time streaming data into the distributed data query engine if the preset time interval is reached.
8. The device according to claim 7, characterized in that:
the first judging unit is specifically configured to use a first thread to perform the step of judging whether the number of data records of the real-time streaming data reaches the preset number of data records;
the second judging unit is specifically configured to use the first thread to perform the step of judging whether the time elapsed from the time point at which the real-time streaming data was first received to the current time reaches the preset time interval, or of judging whether the time elapsed from the last write of data into the distributed data query engine to the current time reaches the preset time interval;
the writing unit is specifically configured to use a second thread, independent of the first thread, to perform the step of writing the parsing result of the real-time streaming data into the distributed data query engine.
9. The device according to claim 8, characterized in that the device further comprises:
a configuration unit, configured to configure the preset number of data records in a third thread that is independent of the first thread and the second thread, and to configure the preset time interval in a fourth thread that is independent of the first thread, the second thread and the third thread.
10. The device according to any one of claims 6 to 9, characterized in that the device further comprises a storage unit,
the storage unit being configured to store the parsing result into a preset cache;
the determination unit being specifically configured to determine the number of data records of the real-time streaming data according to the parsing results stored in the preset cache.
CN201710224721.4A 2017-04-07 2017-04-07 The storage method and device of real-time streaming data Pending CN108694187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710224721.4A CN108694187A (en) 2017-04-07 2017-04-07 The storage method and device of real-time streaming data

Publications (1)

Publication Number Publication Date
CN108694187A true CN108694187A (en) 2018-10-23

Family

ID=63842854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710224721.4A Pending CN108694187A (en) 2017-04-07 2017-04-07 The storage method and device of real-time streaming data

Country Status (1)

Country Link
CN (1) CN108694187A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102118268A (en) * 2011-02-18 2011-07-06 中兴通讯股份有限公司 Telephone traffic data storage method and system
CN102646121A (en) * 2012-02-23 2012-08-22 武汉大学 Two-stage storage method combined with RDBMS (relational database management system) and Hadoop cloud storage
CN103853671A (en) * 2012-12-07 2014-06-11 北京百度网讯科技有限公司 Data writing control method and device
CN104967807A (en) * 2014-12-30 2015-10-07 浙江大华技术股份有限公司 Caching method and apparatus
CN105446893A (en) * 2014-07-14 2016-03-30 阿里巴巴集团控股有限公司 Data storage method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977334A (en) * 2019-03-26 2019-07-05 浙江度衍信息技术有限公司 Retrieval rate optimization method
CN109977334B (en) * 2019-03-26 2023-10-20 浙江度衍信息技术有限公司 Search speed optimization method
CN113296962A (en) * 2021-07-26 2021-08-24 阿里云计算有限公司 Memory management method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181023