CN108255628A - A data processing method and device - Google Patents
- Publication number: CN108255628A (application number CN201611247794.7A)
- Authority: CN (China)
- Prior art keywords: location information, consumption data, message system, pending, data
- Legal status: Pending (Google notes that the listed status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
Abstract
The invention discloses a data processing method and device in the field of computer technology. Its main purpose is to ensure the integrity of consumption data during data processing. The method includes: persisting the location information of pending consumption data in a message system to a preset storage space; when the processing program is restarted, reloading the pending consumption data from the message system according to the location information; and processing the pending consumption data. The invention is mainly used for processing consumption data.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
Background art
Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website, such as users' page views, searches and other user behavior.
In a Kafka messaging system, the sender of data is usually called the producer, the recipient is the consumer, and the transmitter of data is the broker (transfer server). After a consumer obtains consumption data from the Kafka messaging system, it must record how far the consumption data has been read, so that the consumption data can be processed in real time according to that read record.
At present, the Spark Streaming component of the Spark distributed platform is generally used to batch-process the consumption data obtained. Each time consumption data is obtained from the Kafka messaging system, the Spark Streaming component records the position of the consumption data that has been processed. However, when an exception occurs while the consumption data is being processed, for example when the processing process is unexpectedly shut down or the Spark Streaming program stops running, consumption data is still lost when the processing process or the Spark Streaming program is restarted, even though the Spark Streaming component had recorded the position of the consumption data. The integrity of the consumption data therefore cannot be guaranteed.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a data processing method and device that overcome the above problems, or at least partially solve them, and that can ensure the integrity of consumption data during data processing.
In one aspect, the present invention provides a data processing method, including:
persisting the location information of pending consumption data in a message system to a preset storage space;
when the processing program is restarted, reloading the pending consumption data from the message system according to the location information; and
processing the pending consumption data.
Further, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system;
obtaining the persistent storage type of the location information; and
persisting the location information to the preset storage space according to the persistent storage type.
Further, reloading the pending consumption data from the message system according to the location information includes:
reading the location information from the preset storage space according to the persistent storage type of the location information; and
reloading the pending consumption data from the message system according to the location information.
Further, before the location information of the pending consumption data in the message system is persisted to the preset storage space, the method further includes:
processing, at a preset time interval, the pending consumption data obtained from the message system.
Further, after the pending consumption data is processed, the method further includes:
after processing of the pending consumption data obtained from the message system is complete, updating the location information in the preset storage area, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
Further, when the persistent storage type is file system storage, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
creating a preset file in the file system; and
persisting the location information to the preset file.
Further, when the persistent storage type is database storage, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
creating a preset table in the database; and
persisting the location information to the preset table.
In another aspect, the present invention provides a data processing device, including:
a storage unit, configured to persist the location information of pending consumption data in a message system to a preset storage space;
a loading unit, configured to reload the pending consumption data from the message system according to the location information when the processing program is restarted; and
a first processing unit, configured to process the pending consumption data.
Further, the storage unit includes:
an acquisition module, configured to obtain the location information of the pending consumption data in the message system;
a parsing module, configured to obtain the persistent storage type of the location information; and
a storing module, configured to persist the location information to the preset storage space according to the persistent storage type.
Further, the loading unit includes:
a reading module, configured to read the location information from the preset storage space according to the persistent storage type of the location information; and
a loading module, configured to reload the pending consumption data from the message system according to the location information.
Further, the device further includes:
a second processing unit, configured to process, at a preset time interval, the pending consumption data obtained from the message system.
Further, the device further includes:
an updating unit, configured to update the location information in the preset storage area after processing of the pending consumption data obtained from the message system is complete, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
Further, when the persistent storage type is file system storage:
the storage unit is further configured to obtain the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
the storage unit is further configured to create a preset file in the file system; and
the storage unit is further configured to persist the location information to the preset file.
Further, when the persistent storage type is database storage:
the storage unit is further configured to obtain the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
the storage unit is further configured to create a preset table in the database; and
the storage unit is further configured to persist the location information to the preset table.
With the above technical solution, the data processing method and device provided by the invention persist the location information of pending consumption data in the message system to a preset storage space. When an abnormal situation occurs while consumption data is being processed, the pending consumption data can therefore be reloaded from the message system according to the location information persisted in the preset storage space, which ensures the integrity of the consumption data during data processing. Compared with existing data processing methods, after the program is restarted the embodiments of the present invention obtain the location information of the pending consumption data from the persisted preset storage space and then reprocess the consumption data whose processing was not completed because of the program exception. No consumption data is lost, so its integrity is ensured.
The above description is only an overview of the technical solution of the present invention. Specific embodiments of the present invention are set out below so that its technical means can be understood more clearly and implemented according to the contents of this specification, and so that the above and other objects, features and advantages of the present invention become more apparent.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 is a flow diagram of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of another data processing method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of persisting location information to a preset storage space, provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of loading location information from the preset storage space, provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of data processing under normal conditions, provided by an embodiment of the present invention;
Fig. 6 is a flow diagram of data processing under abnormal conditions, provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of a data processing device provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of another data processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and the scope of the disclosure can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a data processing method. As shown in Fig. 1, the method can be applied to the processing of consumption data in the Spark distributed platform. Persisting the location information of pending consumption data in the Kafka messaging system ensures that consumption data is not lost when the processing program becomes abnormal. The specific steps include:
101. Persist the location information of the pending consumption data in the message system to a preset storage space.
Consumption data is the most common part of the data used when analyzing how a website is used, for example page visits, content viewed and searches. A common way of handling consumption data is to first write each kind of activity to a file in the form of logs and then periodically run statistical analysis on those files. Many websites now use a messaging system as a pipeline for multiple types of data, which makes it convenient to collect consumption data from client websites.
Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website. In a Kafka messaging system, the sender of data is usually the producer, the recipient is the consumer, and the transmitter of data is the broker (transfer server). For the message queues provided by Kafka, after the Spark Streaming component of the Spark distributed platform has processed a batch of consumption data, the location information of the processed consumption data is recorded, so that if an emergency occurs, the consumption data can be processed according to the recorded location information.
In this embodiment of the present invention, the location information of the pending consumption data in the message system is the position in the Kafka messaging system of the next group of consumption data to be processed, recorded each time the Spark Streaming processing program has output the consumption data it processed. Because the volume of messages in a Kafka messaging system is very large (the actual message volume can be several times the total number of page views of a website), the data stream in the system must be divided into a set of mutually independent partitions, and the producer specifies which partition each message belongs to. Consumption data can then be obtained from the Kafka messaging system accurately and in order according to the recorded partition positions.
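The partition and position model described above can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's implementation. The `PartitionedLog` class, the CRC32-based partition choice and the sample messages are all hypothetical stand-ins, and Kafka's own default partitioner uses a different hash. The sketch shows why a (partition, position) pair is enough to resume reading exactly where a consumer left off.

```python
import zlib
from typing import Dict, List, Tuple


class PartitionedLog:
    """Hypothetical stand-in for one topic: an append-only list per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # The producer decides the partition. A deterministic CRC32 of the key is
        # an assumption for illustration, not Kafka's actual partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        position = len(self.partitions[partition]) - 1  # grows by 1 per message
        return partition, position

    def read_from(self, partition: int, position: int, max_count: int) -> List[str]:
        # Reading from a recorded position returns messages in their original order.
        return self.partitions[partition][position:position + max_count]


log = PartitionedLog(num_partitions=3)
positions = [log.produce(key=f"user-{i % 4}", value=f"view-{i}") for i in range(10)]
```

Because positions are assigned per partition, recording one position per partition is sufficient. This is the same shape as the `partition:position` pairs persisted later in this description.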
The persistence process here persists the location information of the pending consumption data in the Kafka message queue to the preset storage space. The location information of the next consumption data can also be derived from the recorded location information of the current consumption data and then persisted to the preset storage space. This information can also be used to monitor how consumers consume the consumption data. The embodiment of the present invention does not limit the preset storage space. For example, the Hadoop Distributed File System (HDFS) may be used.
102. When the processing program is restarted, reload the pending consumption data from the message system according to the location information.
It should be noted that abnormal situations may occur when the Spark Streaming component of the Spark distributed platform is used to batch-process the message queues provided by Kafka. For example, while the Spark Streaming component is processing consumption data, its process may be accidentally shut down, or the Spark Streaming program that is currently consuming may be stopped.
After the Spark Streaming processing program is restarted, the persisted consumption-data location information stored in the preset storage space is used to determine the location information of the pending consumption data, that is, consumption data whose processing has not been completed, at the end of the Spark Streaming component's last batch processing of the Kafka message queue. This location information is then passed to the Kafka messaging system, so that the Spark Streaming component can reload the pending consumption data from the corresponding positions in the Kafka messaging system.
103. Process the pending consumption data.
From the persisted consumption-data location information stored in the preset storage space, two locations can be determined: the location of the consumption data at the end of the Spark Streaming component's last batch processing of the Kafka message queue, and the location of the next consumption data. Together they give the starting position of the consumption data that currently needs to be consumed from the Kafka message queue. The Spark Streaming component of the Spark distributed platform then batch-processes the message queue provided by Kafka.
In this embodiment of the present invention, Spark Streaming is a near-real-time processing framework that processes data in small batches, with a delay on the order of seconds when processing real-time data. Spark Streaming caches the data stream received from Kafka using receivers distributed on the nodes, and packages the data stream into Spark RDDs (resilient distributed datasets) for processing. The Spark cluster can then process the data stream in real time and output the processed data to the business side that needs it, where the processed data is analyzed.
As the above implementation shows, the data processing method provided by this embodiment of the present invention persists the location information of pending consumption data in the message system to a preset storage space. When an abnormal situation occurs while consumption data is being processed, the pending consumption data can therefore be reloaded from the message system according to the persisted location information, which ensures the integrity of the consumption data during data processing. Compared with existing data processing methods, after the program is restarted this embodiment obtains the location information of the pending consumption data from the persisted preset storage space and reprocesses the consumption data whose processing was not completed because of the program exception. No consumption data is lost, so its integrity is ensured.
To explain the data processing method proposed by the present invention in more detail, in particular for different persistence types, an embodiment of the present invention further provides another data processing method. As shown in Fig. 2, the specific steps of the method include:
201. Process, at a preset time interval, the pending consumption data obtained from the message system.
Under normal conditions, the Spark Streaming component runs in the Spark cluster and cyclically obtains consumption data from the Kafka message queue at the preset time interval, that is, with a predetermined period. As a real-time computing framework built on Spark, the Spark Streaming component batch-processes the consumption data it obtains.
It should be noted that this embodiment does not limit the preset time interval, although an interval of 1 s to 5 s is preferred. Specifically, a time window can be set in advance, and pending consumption data is obtained from the Kafka messaging system once per time-window period.
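Fetching and processing pending data once per preset interval can be sketched as a simple loop. The `fetch_batch` and `process_batch` callbacks are hypothetical stand-ins for the Kafka read and the business logic. In Spark Streaming itself, the batch interval is fixed when the streaming context is created, so this loop only illustrates the scheduling idea and is not how Spark implements it.

```python
import time
from typing import Callable, List


def run_batches(
    fetch_batch: Callable[[], List[str]],
    process_batch: Callable[[List[str]], None],
    interval_s: float,
    max_batches: int,
) -> int:
    """Fetch and process one batch per interval; returns the number of batches run."""
    batches = 0
    next_tick = time.monotonic()
    while batches < max_batches:
        process_batch(fetch_batch())
        batches += 1
        next_tick += interval_s
        # Sleep until the next window starts; skip sleeping if processing overran.
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return batches


# Usage with interval 0 so the example finishes immediately.
pending = [["a", "b"], ["c"], []]
seen: List[str] = []
count = run_batches(lambda: pending.pop(0), seen.extend, interval_s=0.0, max_batches=3)
```

In a real deployment, `interval_s` would be the preset interval, for example a value between 1 s and 5 s as suggested above.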
202. Persist the location information of the pending consumption data in the message system to the preset storage space.
In this embodiment of the present invention, persisting the location information of the pending consumption data in the message system to the preset storage space may include, but is not limited to, the following steps.

First, the positions of the consumption data already processed are recorded through the Spark platform, which gives the location information of the pending consumption data in the message system.

Next, the persistence type of this location information is obtained by parsing the property parameters of the data. The property parameters are parameters configured for the pending consumption data in the Spark platform, and they can be set to file system or database.

Finally, the location information of the pending message data in the message system is persisted to the preset storage space according to the persistent storage type. For example, when the persistence type is file system, Spark Streaming saves the file name of the system file and the obtained location information to the preset storage location.
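Choosing between file-system and database persistence from a configured property can be sketched as follows. The property keys `offset.persistence.type`, `offset.file` and `offset.table`, and the values `file`/`database`, are hypothetical. The source only says that the type is configured as a property parameter in the Spark platform. The default names reuse the `kafka_offset.txt` file and `kafka_offset` table from the examples in this description.

```python
from enum import Enum
from typing import Mapping


class PersistenceType(Enum):
    FILE_SYSTEM = "file"
    DATABASE = "database"


def parse_persistence_type(props: Mapping[str, str]) -> PersistenceType:
    """Read the persistent storage type from configured properties (hypothetical key)."""
    raw = props.get("offset.persistence.type", "").strip().lower()
    try:
        return PersistenceType(raw)
    except ValueError:
        raise ValueError(f"unsupported persistence type: {raw!r}") from None


def storage_target(props: Mapping[str, str]) -> str:
    """Return where offsets would be written for the configured type (illustrative names)."""
    ptype = parse_persistence_type(props)
    if ptype is PersistenceType.FILE_SYSTEM:
        return props.get("offset.file", "kafka_offset.txt")
    return props.get("offset.table", "kafka_offset")


print(storage_target({"offset.persistence.type": "file"}))  # kafka_offset.txt
```

Failing loudly on an unknown type is a design choice in this sketch. It avoids silently skipping persistence, which would reintroduce the data-loss problem the method is meant to solve.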
When the persistent storage type is file system, the location information of the pending consumption data in the message system may be persisted to the preset storage space as follows.

First, obtain the location information of the pending consumption data in the message system. The location information includes a partition field and a partition position field for each type of consumption data. Every message published to Kafka has a category, called a Topic, and messages of different Topics are physically stored separately in different partitions. For example, if the message system has 5 partitions, their partition fields are A, B, C, D and E. Logically, the messages of one Topic may be stored on one or more Kafka servers, but a user only needs to specify the Topic of the messages to produce or consume data, without caring where the messages are stored. In addition, the consumption data of each category has a corresponding partition position field recorded in its partition.

Then, create a preset file in the file system. The preset file may be named after the partition field of the pending consumption data in the message system, which makes different types of consumption data easier to find, or it may be named arbitrarily. This embodiment does not limit the naming.

Finally, persist the location information of the pending consumption data in the message system to the preset file.
For example, a file is created in the file system. The preset file can be named arbitrarily, for example kafka_offset.txt. The location information of the consumption data in the message system is then passed into the Spark Streaming program as parameters, and the file content is output as "2:1428808", "0:1456813" and "1:1431255". This output shows the location information of the consumption data in the message system. "2", "0" and "1" are the partition fields of the consumed Kafka Topic, and "1428808", "1456813" and "1431255" are the corresponding partition position fields (offsets).
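The file-system variant can be sketched using the `partition:offset` line format from the example above. This sketch writes to a local path for illustration only. On HDFS, an HDFS client would be used instead, which is not shown here. Writing to a temporary file and renaming it over the target is a design choice assumed here, so that a crash in the middle of a write cannot leave a half-written offset file. `os.replace` is atomic on POSIX when both paths are on the same file system.

```python
import os
import tempfile
from typing import Dict


def save_offsets(path: str, offsets: Dict[int, int]) -> None:
    """Persist {partition: offset} as 'partition:offset' lines, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".offsets-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for partition, offset in offsets.items():
                f.write(f"{partition}:{offset}\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # rename did not happen; discard the partial temp file
        raise


def load_offsets(path: str) -> Dict[int, int]:
    """Read offsets back; a missing file means nothing has been persisted yet."""
    if not os.path.exists(path):
        return {}
    offsets: Dict[int, int] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                partition, offset = line.split(":", 1)
                offsets[int(partition)] = int(offset)
    return offsets
```

For instance, `save_offsets("kafka_offset.txt", {2: 1428808, 0: 1456813, 1: 1431255})` would write the three lines shown in the example, and `load_offsets` would return the same mapping.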
When the persistent storage type is database storage, the location information of the pending consumption data in the message system may be persisted to the preset storage space as follows. First, obtain the location information of the pending consumption data in the message system. This location information is the same as described above and is not repeated here. Then, create a preset table in the database. Finally, persist the location information of the pending consumption data in the message system to the preset table. The partition field and the partition position field can be stored in separate columns of the preset table, which makes different types of consumption data easier to find.
For example, a table is created in the database. The preset table can be named arbitrarily, for example kafka_offset. The location information of the consumption data in the message system is then passed into the Spark Streaming program as parameters. The preset table mainly includes two fields: field 1 is the partition field and field 2 is the partition position field. As shown in the following table, the preset table holds the location information of the consumption data in the message system, for example "2:1428808", "0:1456813" and "1:1431255". "2", "0" and "1" are the partition fields of the consumed Kafka Topic, and "1428808", "1456813" and "1431255" are the corresponding partition position fields.

Partition field | Partition position field
2 | 1428808
0 | 1456813
1 | 1431255
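The database variant can be sketched with Python's built-in `sqlite3` module as a stand-in for whatever database a deployment actually uses. That choice is an assumption, since the source does not name a database. The table name `kafka_offset` and its two fields follow the example above. The column names `partition_id` and `offset_value` are hypothetical, chosen to avoid clashing with SQL keywords such as OFFSET.

```python
import sqlite3
from typing import Dict


def init_offset_table(conn: sqlite3.Connection) -> None:
    """Create the preset table if it does not exist: one row per partition."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kafka_offset ("
        " partition_id INTEGER PRIMARY KEY,"
        " offset_value INTEGER NOT NULL)"
    )


def save_offsets(conn: sqlite3.Connection, offsets: Dict[int, int]) -> None:
    """Upsert all partition offsets in a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT OR REPLACE INTO kafka_offset (partition_id, offset_value) VALUES (?, ?)",
            list(offsets.items()),
        )


def load_offsets(conn: sqlite3.Connection) -> Dict[int, int]:
    """Read every stored partition offset back into a dict."""
    rows = conn.execute(
        "SELECT partition_id, offset_value FROM kafka_offset ORDER BY partition_id"
    )
    return {partition: offset for partition, offset in rows}
```

Writing all partitions in one transaction means that a restart sees either the old set of offsets or the new set, never a mixture of the two.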
In this embodiment of the present invention, the Spark Streaming processing program may become abnormal while batch-processing the Kafka message queue and be unable to run normally. The location information in the message system of the consumption data still to be processed has already been recorded in the preset storage space. After Spark Streaming recovers, the consumption data left unprocessed last time can therefore be loaded from the Kafka messaging system according to that location information, and no data is lost. The specific flow of persisting the location information of the pending consumption data in the message system to the preset storage space is shown in Fig. 3.
203. When the processing program is restarted, reload the pending consumption data from the message system according to the location information.
In this embodiment of the present invention, loading the pending consumption data from the message system according to the location information may include, but is not limited to, the following steps.

First, obtain the persistent storage type of the location information of the pending consumption data in the message system by parsing the property parameters of the consumption data. The property parameters are as described in step 202 and are not repeated here.

Then, read the location information of the consumption data in the message system from the preset storage space according to the persistent storage type. The file or table in the preset storage space records the location information of the consumption data for each completed batch.

Finally, load the next batch of consumption data to be processed from the message system according to the corresponding location information. After Spark Streaming recovers, the consumption data left unprocessed last time can thus be loaded from the Kafka messaging system, and no data is lost. The specific flow of loading the pending consumption data from the message system according to the location information is shown in Fig. 4.
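The reload step can be sketched in two parts: parse the persisted `partition:offset` text, then resume each partition from its stored position. The in-memory `log` dict stands in for the Kafka topic, and `max_per_partition` is a hypothetical cap on batch size. Starting a partition that has no stored position at 0 is an assumption of this sketch. A real deployment might instead start from the earliest or latest available offset.

```python
from typing import Dict, List, Tuple


def parse_offsets(text: str) -> Dict[int, int]:
    """Parse persisted 'partition:offset' lines into a dict."""
    offsets: Dict[int, int] = {}
    for line in text.splitlines():
        if line.strip():
            partition, offset = line.split(":", 1)
            offsets[int(partition)] = int(offset)
    return offsets


def reload_pending(
    log: Dict[int, List[str]],
    offsets: Dict[int, int],
    max_per_partition: int,
) -> Tuple[List[Tuple[int, int, str]], Dict[int, int]]:
    """Return (pending records as (partition, offset, value), next offsets after this batch)."""
    pending: List[Tuple[int, int, str]] = []
    next_offsets: Dict[int, int] = {}
    for partition in sorted(log):
        start = offsets.get(partition, 0)  # assumption: unseen partitions start at 0
        chunk = log[partition][start:start + max_per_partition]
        pending.extend((partition, start + i, value) for i, value in enumerate(chunk))
        next_offsets[partition] = start + len(chunk)
    return pending, next_offsets


log = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"]}
stored = parse_offsets("0:1\n1:2\n")
pending, next_offsets = reload_pending(log, stored, max_per_partition=10)
```

Here partition 0 resumes at `a1`, and partition 1 has nothing pending because its stored position is already at the end of its data.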
204. Process the pending consumption data.
The consumption-data location information persisted in the preset storage space gives the location of the consumption data at the end of the Spark Streaming component's last batch processing of the Kafka message queue. From it, the location of the next consumption data can be determined, that is, the starting position of the consumption data that currently needs to be consumed from the Kafka message queue. The Spark Streaming component of the Spark distributed platform then batch-processes the message queue provided by Kafka.
205. After processing of the pending consumption data obtained from the message system is complete, update the location information in the preset storage area, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
The Spark Streaming component processes the data stream by obtaining consumption data from the Kafka messaging system at a predetermined period and processing it. Each time a batch of consumption data has been processed, the location information of the next batch of consumption data to be processed must therefore be updated in the preset storage area. Then, if an emergency occurs, the consumption data can be processed according to the persisted location information after the program is restarted.
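The ordering in step 205, updating the stored location only after the batch has been fully processed, is what lets a restart pick up unfinished data. The following is a minimal sketch of that ordering. The `store` dict stands in for the preset storage space, and `process` is a hypothetical callback. If processing raises an exception, the stored offsets are left untouched, so the same batch is loaded again after a restart.

One consequence of this ordering is worth noting. If the program stops after processing a batch but before the offsets are saved, that batch will be processed a second time. This ordering therefore gives at-least-once processing rather than exactly-once processing, unless the output and the offsets are written atomically together.

```python
from typing import Callable, Dict, List


def process_and_commit(
    log: Dict[int, List[str]],
    store: Dict[int, int],
    process: Callable[[List[str]], None],
    batch_size: int,
) -> int:
    """Process one batch from the stored offsets, then advance the stored offsets.

    Returns the number of records processed. `store` stands in for the preset
    storage space (file or table); it is only written after `process` succeeds.
    """
    batch: List[str] = []
    new_offsets = dict(store)
    for partition in sorted(log):
        start = store.get(partition, 0)
        chunk = log[partition][start:start + batch_size]
        batch.extend(chunk)
        new_offsets[partition] = start + len(chunk)
    process(batch)  # may raise; offsets have not been updated yet
    store.update(new_offsets)  # commit only after successful processing
    return len(batch)
```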
In this embodiment of the present invention, each time Spark Streaming finishes processing a batch of pending consumption data, it updates the location information persisted in the storage area. When the Spark Streaming processing program runs normally, the next batch of consumption data is obtained from Kafka directly according to the consumption-data location information recorded by the Spark Streaming component. When the Spark Streaming processing program becomes abnormal, the next batch of consumption data is obtained from Kafka according to the location information persisted in the preset storage area.
Specific application scenarios of this embodiment may include, but are not limited to, the following.

When the Spark Streaming processing program runs normally, the Spark Streaming component obtains consumption data from the Kafka message queue once per period set by the time window, processes it, and outputs the processed consumption data. After each batch of consumption data is processed, the Spark Streaming component persists the location information of the next pending batch in the Kafka message queue to the preset storage space in the form of a database table. The steps are shown in the flowchart of Fig. 5.

When the Spark Streaming component restarts the processing program after an exception, it uses the location information recorded in the database table to continue loading the next pending batch of consumption data from the message system. It then continues processing the consumption data according to the business logic and outputs the processed consumption data. The steps are shown in the flowchart of Fig. 6.
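The normal and abnormal flows of Fig. 5 and Fig. 6 can be simulated end to end. This sketch uses an SQLite database file as a stand-in for the database table; which database is used is an assumption. The in-memory `log` dict plays the role of the Kafka topic, and `crash_after_batches` is a hypothetical test hook that simulates an unexpected stop. Each run reloads its offsets only from the table, so the restarted run resumes where the crashed run left off.

```python
import os
import sqlite3
import tempfile
from typing import Dict, List, Optional


def run_job(db_path: str, log: Dict[int, List[str]], sink: List[str],
            batch_size: int, crash_after_batches: Optional[int] = None) -> None:
    """One program run: reload offsets from the table, then loop over batches."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS kafka_offset "
                     "(partition_id INTEGER PRIMARY KEY, offset_value INTEGER NOT NULL)")
        offsets = dict(conn.execute("SELECT partition_id, offset_value FROM kafka_offset"))
        batches = 0
        while True:
            if crash_after_batches is not None and batches == crash_after_batches:
                raise RuntimeError("simulated crash")
            batch: List[str] = []
            new_offsets = dict(offsets)
            for p in sorted(log):
                start = offsets.get(p, 0)
                chunk = log[p][start:start + batch_size]
                batch.extend(chunk)
                new_offsets[p] = start + len(chunk)
            if not batch:
                return  # caught up with the log
            sink.extend(batch)  # business processing and output
            with conn:  # persist the next positions in one transaction
                conn.executemany("INSERT OR REPLACE INTO kafka_offset VALUES (?, ?)",
                                 list(new_offsets.items()))
            offsets = new_offsets
            batches += 1
    finally:
        conn.close()


log = {0: [f"p0-{i}" for i in range(5)], 1: [f"p1-{i}" for i in range(3)]}
out: List[str] = []
with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "offsets.db")
    try:
        run_job(db, log, out, batch_size=2, crash_after_batches=1)  # abnormal stop
    except RuntimeError:
        pass
    run_job(db, log, out, batch_size=2)  # restart resumes from the table
```

In this simulation, the crash is injected between batches, after the offsets were committed. Every message is therefore output exactly once across the two runs.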
Because of excessive data volume, external interference and similar causes, the Spark Streaming processing process may be unexpectedly interrupted or stopped during data processing. Batch data may then fail or be lost when the process is restarted. To keep data processing effective, the other data processing method of this embodiment of the present invention persistently stores the location information of the pending consumption data in the Kafka messaging system. When the Spark Streaming processing program encounters an abnormal condition and is restarted, this ensures that the consumption data processed by Spark Streaming is neither processed repeatedly nor lost.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides a data processing device. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but the device in this embodiment can correspondingly implement everything in the method embodiment. As shown in Fig. 7, the device includes:
Storage unit 31, which can be used to persist the location information of the pending consumption data in the message system to a preset storage space. The storage unit 31 is the main functional module of the device for persisting this location information. The persistence can be implemented through a file system or a database;
Loading unit 32, which can be used to load the pending consumption data from the message system according to the location information when the processing program is restarted. The loading unit 32 is the main functional module of the device for loading the location information of the pending consumption data from the preset storage space. It reads that location information according to the persistence type and then loads the pending data from the message system;
First processing unit 33, which can be used to process the pending consumption data. The first processing unit 33 is the main functional module of the device for processing consumption data. It processes the pending consumption data loaded from Kafka according to the business logic.
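The following minimal Python sketch shows how units 31, 32, and 33 could fit together. It assumes that the location information is a partition-to-next-offset mapping, that the preset storage space is held in memory, and that a dictionary stands in for the message system. All class and method names are hypothetical, and the doubling step is placeholder business logic.

```python
from dataclasses import dataclass, field


@dataclass
class MessageSystem:
    partitions: dict  # hypothetical stand-in for Kafka: partition id -> messages


@dataclass
class StorageUnit:  # storage unit 31
    space: dict = field(default_factory=dict)  # preset storage space, in memory here

    def persist(self, location):
        self.space = dict(location)  # keep a copy of partition -> next offset


@dataclass
class LoadingUnit:  # loading unit 32
    storage: StorageUnit
    system: MessageSystem

    def reload(self):
        location = self.storage.space
        # Re-read each partition from its persisted position onward.
        return [m for p, msgs in sorted(self.system.partitions.items())
                for m in msgs[location.get(p, 0):]]


class FirstProcessingUnit:  # first processing unit 33
    def process(self, messages):
        return [m * 2 for m in messages]  # placeholder business logic


system = MessageSystem({0: [1, 2, 3], 1: [10, 20]})
storage = StorageUnit()
storage.persist({0: 2, 1: 1})  # positions recorded before the restart
result = FirstProcessingUnit().process(LoadingUnit(storage, system).reload())
print(result)  # [6, 40]
```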
The data processing device provided by the embodiment of the present invention persists the location information of the pending consumption data in the message system to a preset storage space. If an abnormal condition occurs while consumption data is being processed, the pending consumption data can then be reloaded from the message system according to the persisted location information, which ensures the integrity of the consumption data during data processing. Unlike existing data processing methods, the embodiment of the present invention obtains the location information of the pending consumption data from the preset storage space after the program restarts. It then reprocesses the consumption data whose processing was left unfinished because of the program exception, so no consumption data is lost and the integrity of the consumption data is ensured.
Further, as shown in Fig. 8, the device further includes:
Second processing unit 34, which can be used to process, at a preset time interval, the pending consumption data obtained from the message system. While the processing program runs normally, the second processing unit 34 is the main functional module of the device for processing the consumption data obtained from the message system according to the configured time window;
Updating unit 35, which can be used to update the location information in the preset storage area after the pending consumption data obtained from the message system has been processed, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information. The updating unit 35 is the main functional module of the device for updating the location information persisted in the storage area.
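Units 34 and 35 can be sketched as a fixed-interval loop that calls the updating unit only after each batch has been fully processed. This is a minimal Python illustration under stated assumptions: a single hypothetical partition held in a list, a dictionary as the preset storage area, and a batch size of 2. `UpdatingUnit`, `second_processing_unit`, and `fetch` are hypothetical names.

```python
import time


class UpdatingUnit:  # updating unit 35
    def __init__(self, store):
        self.store = store  # preset storage area holding the location information

    def update(self, location):
        self.store.clear()
        self.store.update(location)


def second_processing_unit(fetch, process, updater, interval_s, max_batches):
    """Unit 34: process one fetched batch per fixed interval (the time window)."""
    next_tick = time.monotonic()
    for _ in range(max_batches):
        batch, next_location = fetch()
        if not batch:
            break
        process(batch)
        updater.update(next_location)  # only runs once the batch has been processed
        next_tick += interval_s
        time.sleep(max(0.0, next_tick - time.monotonic()))


messages = list(range(5))  # single hypothetical partition
store = {"pos": 0}


def fetch():
    pos = store["pos"]
    batch = messages[pos:pos + 2]
    return batch, {"pos": pos + len(batch)}


processed = []
second_processing_unit(fetch, processed.extend, UpdatingUnit(store), 0.01, 10)
print(processed, store)  # [0, 1, 2, 3, 4] {'pos': 5}
```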
Further, the storage unit 31 includes:
Acquisition module 311, which can be used to obtain the location information of the pending consumption data in the message system;
Parsing module 312, which can be used to obtain the persistent storage type of the location information;
Memory module 313, which can be used to persist the location information to the preset storage space according to the persistent storage type.
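One possible split of the storage unit into modules 311 to 313 is sketched below in Python. It assumes two persistent storage types, `"file"` and `"database"`, each represented here by an in-memory dictionary, and a configuration dictionary that carries the type. `BACKENDS`, the `storage_type` key, and the `"file"` default are all hypothetical.

```python
# Hypothetical backends keyed by persistent storage type (in-memory stand-ins).
BACKENDS = {"file": {}, "database": {}}


def acquisition_module(consumer_positions):
    # Module 311: obtain the location information (partition -> next offset).
    return dict(consumer_positions)


def parsing_module(config):
    # Module 312: obtain the persistent storage type; "file" is an assumed default.
    storage_type = config.get("storage_type", "file")
    if storage_type not in BACKENDS:
        raise ValueError(f"unsupported storage type: {storage_type}")
    return storage_type


def memory_module(location, storage_type):
    # Module 313: persist the location information with the matching backend.
    target = BACKENDS[storage_type]
    target.clear()
    target.update(location)


def storage_unit(consumer_positions, config):
    # Storage unit 31 chains the three modules in order.
    location = acquisition_module(consumer_positions)
    memory_module(location, parsing_module(config))


storage_unit({0: 5, 1: 7}, {"storage_type": "database"})
print(BACKENDS)  # {'file': {}, 'database': {0: 5, 1: 7}}
```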
Further, the loading unit 32 includes:
Read module 321, which can be used to read the location information from the preset storage space according to the persistent storage type of the location information;
Load module 322, which can be used to reload the pending consumption data from the message system according to the location information.
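The read and load modules can be sketched in a few lines of Python. This sketch assumes that the persisted location information is a partition-to-offset mapping stored under its storage type, and that a partition missing from that mapping is read from offset 0. Both assumptions, and the function names, are hypothetical.

```python
def read_module(storage_type, backends):
    # Module 321: read the location information from the backend for its storage type.
    return dict(backends[storage_type])


def load_module(location, partitions):
    # Module 322: reload pending data from each partition, starting at its persisted offset.
    return {p: msgs[location.get(p, 0):] for p, msgs in partitions.items()}


backends = {"file": {0: 1}, "database": {}}  # hypothetical persisted state
partitions = {0: ["m0", "m1", "m2"], 1: ["n0"]}
pending = load_module(read_module("file", backends), partitions)
print(pending)  # {0: ['m1', 'm2'], 1: ['n0']}
```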
Further, when the persistent storage type is file system storage, the storage unit 31 can also be used to obtain the location information of the pending consumption data in the message system, where the location information includes the partition field and the partition position field of each type of consumption data;
The storage unit 31 can also be used to create a preset file in the file system;
The storage unit 31 can also be used to persist the location information into the preset file.
When the persistent storage type is database storage,
The storage unit 31 can also be used to obtain the location information of the pending consumption data in the message system, where the location information includes the partition field and the partition position field of each type of consumption data;
The storage unit 31 can also be used to create a preset table in the database;
The storage unit 31 can also be used to persist the location information into the preset table.
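The two persistence types can be sketched in Python with a JSON file and an SQLite table, each holding one record per partition made up of a partition field and a position field. The file name `offsets.json`, the table name `consumer_offsets`, and the column names are hypothetical. The write-then-rename step is a common safeguard added here and is not taken from the source.

```python
import json
import os
import sqlite3
import tempfile


def persist_to_file(location, directory, filename="offsets.json"):
    # Create the preset file and write one record per partition.
    path = os.path.join(directory, filename)
    records = [{"partition": p, "position": pos} for p, pos in sorted(location.items())]
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    # os.replace swaps the file in one step (atomic on POSIX within one filesystem),
    # so if the process crashes mid-write the previous offsets stay intact.
    os.replace(tmp_path, path)
    return path


def read_from_file(path):
    with open(path, encoding="utf-8") as f:
        return {r["partition"]: r["position"] for r in json.load(f)}


def persist_to_database(location, conn):
    # Create the preset table; one row per partition (partition field, position field).
    conn.execute("CREATE TABLE IF NOT EXISTS consumer_offsets ("
                 "partition_id INTEGER PRIMARY KEY, position INTEGER NOT NULL)")
    conn.executemany("INSERT OR REPLACE INTO consumer_offsets VALUES (?, ?)",
                     sorted(location.items()))
    conn.commit()


def read_from_database(conn):
    return dict(conn.execute("SELECT partition_id, position FROM consumer_offsets").fetchall())


with tempfile.TemporaryDirectory() as d:
    print(read_from_file(persist_to_file({1: 30, 0: 12}, d)))  # {0: 12, 1: 30}

conn = sqlite3.connect(":memory:")
persist_to_database({0: 12, 1: 30}, conn)
persist_to_database({1: 31}, conn)  # later update for partition 1 only
print(read_from_database(conn))  # {0: 12, 1: 31}
```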
Another data processing device provided by the present invention persistently stores the location information of the pending consumption data in the Kafka message system. As a result, when the Spark Streaming processing program encounters an abnormal condition and is reactivated, the consumption data processed by Spark Streaming is neither processed repeatedly nor lost.
The data processing device includes a processor and a memory. The storage unit 31, the loading unit 32, the first processing unit 33, and the other units are stored in the memory as program units, and the processor executes these program units to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. Adjusting the kernel parameters saves manual effort and ensures the integrity of consumption data during data processing.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present invention also provides a computer program product. When executed on a data processing device, it is adapted to execute program code initialized with the following method steps: persisting the location information of pending consumption data in a message system to a preset storage space; when the processing program is restarted, reloading the pending consumption data from the message system according to the location information; and processing the pending consumption data.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. Computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks in them. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. This instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above are only embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (10)
1. A data processing method, characterized by comprising:
Persisting the location information of pending consumption data in a message system to a preset storage space;
When the processing program is restarted, reloading the pending consumption data from the message system according to the location information;
Processing the pending consumption data.
2. The method according to claim 1, characterized in that persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
Obtaining the location information of the pending consumption data in the message system;
Obtaining the persistent storage type of the location information;
Persisting the location information to the preset storage space according to the persistent storage type.
3. The method according to claim 2, characterized in that reloading the pending consumption data from the message system according to the location information comprises:
Reading the location information from the preset storage space according to the persistent storage type of the location information;
Reloading the pending consumption data from the message system according to the location information.
4. The method according to any one of claims 1 to 3, characterized in that, before persisting the location information of the pending consumption data in the message system to the preset storage space, the method further comprises:
Processing, at a preset time interval, the pending consumption data obtained from the message system.
5. The method according to claim 4, characterized in that, after processing the pending consumption data, the method further comprises:
After the pending consumption data obtained from the message system has been processed, updating the location information in the preset storage area, so that the next batch of pending consumption data is loaded from the message system according to the updated location information.
6. The method according to claim 1, characterized in that, when the persistent storage type is file system storage, persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
Obtaining the location information of the pending consumption data in the message system, the location information comprising a partition field and a partition position field for each type of consumption data;
Creating a preset file in the file system;
Persisting the location information into the preset file.
7. The method according to claim 1, characterized in that, when the persistent storage type is database storage, persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
Obtaining the location information of the pending consumption data in the message system, the location information comprising a partition field and a partition position field for each type of consumption data;
Creating a preset table in the database;
Persisting the location information into the preset table.
8. A data processing device, characterized by comprising:
A storage unit, configured to persist the location information of pending consumption data in a message system to a preset storage space;
A loading unit, configured to reload, when the processing program is restarted, the pending consumption data from the message system according to the location information;
A first processing unit, configured to process the pending consumption data.
9. The device according to claim 8, characterized in that the storage unit comprises:
A first acquisition module, configured to obtain the location information of the pending consumption data in the message system;
A parsing module, configured to obtain the persistent storage type of the location information;
A memory module, configured to persist the location information to the preset storage space according to the persistent storage type.
10. The device according to claim 9, characterized in that the loading unit comprises:
A read module, configured to read the location information from the preset storage space according to the persistent storage type of the location information;
A load module, configured to reload the pending consumption data from the message system according to the location information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611247794.7A CN108255628A (en) | 2016-12-29 | 2016-12-29 | A kind of data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611247794.7A CN108255628A (en) | 2016-12-29 | 2016-12-29 | A kind of data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108255628A (en) | 2018-07-06 |
Family ID: 62720805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611247794.7A Pending CN108255628A (en) | 2016-12-29 | 2016-12-29 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108255628A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109788026A (en) * | 2018-12-13 | 2019-05-21 | 新华三大数据技术有限公司 | Message treatment method and device |
CN110825533A (en) * | 2018-08-10 | 2020-02-21 | 网宿科技股份有限公司 | Data transmitting method and device |
CN112328602A (en) * | 2020-11-17 | 2021-02-05 | 中盈优创资讯科技有限公司 | Method, device and equipment for writing data into Kafka |
CN112445626A (en) * | 2019-08-29 | 2021-03-05 | 北京京东振世信息技术有限公司 | Data processing method and device based on message middleware |
CN112486986A (en) * | 2020-11-26 | 2021-03-12 | 清创网御(合肥)科技有限公司 | Automatic persistence method for consumption data of topic newly added in Kafka |
CN112559227A (en) * | 2021-01-13 | 2021-03-26 | 贵州省广播电视信息网络股份有限公司 | Spark Streaming based method for dynamically updating shared data |
WO2023040399A1 (en) * | 2021-09-18 | 2023-03-23 | 深圳前海微众银行股份有限公司 | Service persistence method and apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102486798A (en) * | 2010-12-03 | 2012-06-06 | 腾讯科技(深圳)有限公司 | Data loading method and device |
CN105574082A (en) * | 2015-12-08 | 2016-05-11 | 曙光信息产业(北京)有限公司 | Storm based stream processing method and system |
CN105791431A (en) * | 2016-04-26 | 2016-07-20 | 北京邮电大学 | On-line distributed monitoring video processing task scheduling method and device |
CN106202324A (en) * | 2016-06-30 | 2016-12-07 | 北京奇虎科技有限公司 | The data processing method of a kind of real-time calculating platform and device |
2016-12-29: application CN201611247794.7A filed in China (CN108255628A)
Non-Patent Citations (3)
Title |
---|
KK303: "Using the Kafka low-level API and ZooKeeper in Spark Streaming to save and reuse offsets, with related code integration", HTTPS://BLOG.CSDN.NET/KK303/ARTICLE/DETAILS/52767260?SPM=1001.2014.3001.5501 * |
SUN_QIANGWEI: "Saving Spark Streaming + Kafka direct offsets into ZooKeeper", HTTPS://BLOG.CSDN.NET/SUN_QIANGWEI/ARTICLE/DETAILS/52089795 * |
大数据部 (Big Data Department): "Spark Streaming createDirectStream: saving Kafka offsets (Java implementation)", HTTPS://BLOG.CSDN.NET/BDCHOME/ARTICLE/DETAILS/52438377 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110825533A (en) * | 2018-08-10 | 2020-02-21 | 网宿科技股份有限公司 | Data transmitting method and device |
CN110825533B (en) * | 2018-08-10 | 2022-12-20 | 网宿科技股份有限公司 | Data transmitting method and device |
CN109788026A (en) * | 2018-12-13 | 2019-05-21 | 新华三大数据技术有限公司 | Message treatment method and device |
CN109788026B (en) * | 2018-12-13 | 2022-03-08 | 新华三大数据技术有限公司 | Message processing method and device |
CN112445626A (en) * | 2019-08-29 | 2021-03-05 | 北京京东振世信息技术有限公司 | Data processing method and device based on message middleware |
CN112445626B (en) * | 2019-08-29 | 2023-11-03 | 北京京东振世信息技术有限公司 | Data processing method and device based on message middleware |
CN112328602A (en) * | 2020-11-17 | 2021-02-05 | 中盈优创资讯科技有限公司 | Method, device and equipment for writing data into Kafka |
CN112328602B (en) * | 2020-11-17 | 2023-03-31 | 中盈优创资讯科技有限公司 | Method, device and equipment for writing data into Kafka |
CN112486986A (en) * | 2020-11-26 | 2021-03-12 | 清创网御(合肥)科技有限公司 | Automatic persistence method for consumption data of topic newly added in Kafka |
CN112559227A (en) * | 2021-01-13 | 2021-03-26 | 贵州省广播电视信息网络股份有限公司 | Spark Streaming based method for dynamically updating shared data |
WO2023040399A1 (en) * | 2021-09-18 | 2023-03-23 | 深圳前海微众银行股份有限公司 | Service persistence method and apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108255628A (en) | A kind of data processing method and device | |
US10592282B2 (en) | Providing strong ordering in multi-stage streaming processing | |
Barika et al. | Orchestrating big data analysis workflows in the cloud: research challenges, survey, and future directions | |
US10198298B2 (en) | Handling multiple task sequences in a stream processing framework | |
US9418085B1 (en) | Automatic table schema generation | |
US20180074852A1 (en) | Compact Task Deployment for Stream Processing Systems | |
US8875120B2 (en) | Methods and apparatus for providing software bug-fix notifications for networked computing systems | |
EP3743822A1 (en) | Temporal optimization of data operations using distributed search and server management | |
US20180011739A1 (en) | Data factory platform and operating system | |
JP2015512099A (en) | Provide configurable workflow features | |
JP7313382B2 (en) | Frequent Pattern Analysis of Distributed Systems | |
US20210373914A1 (en) | Batch to stream processing in a feature management platform | |
CN109460439A (en) | A kind of data processing method, device, medium and electronic equipment | |
US11797527B2 (en) | Real time fault tolerant stateful featurization | |
CN104461826A (en) | Object flow monitoring method, device and system | |
CN106648839B (en) | Data processing method and device | |
CN108228193A (en) | Data capture method and device | |
CN110928941B (en) | Data fragment extraction method and device | |
CN109684051A (en) | A kind of method and system of the hybrid asynchronous submission of big data task | |
CN111125087A (en) | Data storage method and device | |
CN109101514A (en) | Data lead-in method and device | |
CN115373886A (en) | Service group container shutdown method, device, computer equipment and storage medium | |
CN111078975B (en) | Multi-node incremental data acquisition system and acquisition method | |
CN106888244A (en) | A kind of method for processing business and device | |
US9240968B1 (en) | Autogenerated email summarization process |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: Beijing Guoshuang Technology Co.,Ltd.; Address before: 100086 8th Floor, Block A, Cuigong Hotel, No. 76 Zhichun Road, Shuangyushu Area, Haidian District, Beijing; Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |