CN108255628A - A kind of data processing method and device - Google Patents

A kind of data processing method and device

Info

Publication number
CN108255628A
Authority
CN
China
Prior art keywords
location information
consumption data
message system
pending
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611247794.7A
Other languages
Chinese (zh)
Inventor
严波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201611247794.7A priority Critical patent/CN108255628A/en
Publication of CN108255628A publication Critical patent/CN108255628A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a storage system, e.g. in a DASD or network based storage system

Abstract

The invention discloses a data processing method and device, relating to the field of computer technology, whose main purpose is to ensure the integrity of consumption data during data processing. The method includes: persisting location information of pending consumption data in a message system to a preset storage space; when the processing program is restarted, reloading the pending consumption data from the message system according to the location information; and processing the pending consumption data. The invention is mainly used for processing consumption data.

Description

A data processing method and device
Technical Field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
Background
Kafka is a high-throughput distributed publish-subscribe message system that can handle all of the action stream data of a consumer-scale website, such as users' web page browsing, searches, and other user behaviors.
In a Kafka message system, the sender of data is the producer, the receiver of data is the consumer, and the transmitter of data is a relay server. After a consumer obtains consumption data from the Kafka message system, the read status of the consumption data needs to be recorded so that the consumption data can be processed in real time according to the read record.
At present, the spark streaming component of the spark distributed platform is generally used to batch-process the consumption data obtained. Each time consumption data is obtained from the Kafka message system, the spark streaming component records the position of the consumption data that has been processed. However, when an abnormal situation occurs while the consumption data is being processed, for example when the processing process is shut down unexpectedly or the spark streaming program stops running, consumption data is still lost when the processing process or the spark streaming program is restarted, even though the spark streaming component has recorded the position of the consumption data. The integrity of the consumption data therefore cannot be guaranteed.
Summary of the Invention
In view of the above problems, the present invention is proposed to provide a data processing method and device that overcome the above problems or at least partially solve them, and that can ensure the integrity of consumption data during data processing.
In one aspect, the present invention provides a data processing method, including:
persisting location information of pending consumption data in a message system to a preset storage space;
when the processing program is restarted, reloading the pending consumption data from the message system according to the location information;
processing the pending consumption data.
Further, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system;
obtaining the persistent storage type of the location information;
persisting the location information to the preset storage space according to the persistent storage type.
Further, reloading the pending consumption data from the message system according to the location information includes:
reading the location information from the preset storage space according to the persistent storage type of the location information;
reloading the pending consumption data from the message system according to the location information.
Further, before persisting the location information of the pending consumption data in the message system to the preset storage space, the method further includes:
processing the pending consumption data obtained from the message system according to a preset time interval.
Further, after processing the pending consumption data, the method further includes:
after processing of the pending consumption data obtained from the message system is completed, updating the location information in the preset storage region, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
Further, when the persistent storage type is file system storage, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system, the location information including the partition fields and partition position fields of different types of consumption data;
creating a default file in the file system;
persisting the location information to the default file.
Further, when the persistent storage type is database storage, persisting the location information of the pending consumption data in the message system to the preset storage space includes:
obtaining the location information of the pending consumption data in the message system, the location information including the partition fields and partition position fields of different types of consumption data;
creating a preset table in the database;
persisting the location information to the preset table.
In another aspect, the present invention provides a data processing device, including:
a storage unit, configured to persist location information of pending consumption data in a message system to a preset storage space;
a loading unit, configured to reload the pending consumption data from the message system according to the location information when the processing program is restarted;
a first processing unit, configured to process the pending consumption data.
Further, the storage unit includes:
an acquisition module, configured to obtain the location information of the pending consumption data in the message system;
a parsing module, configured to obtain the persistent storage type of the location information;
a storage module, configured to persist the location information to the preset storage space according to the persistent storage type.
Further, the loading unit includes:
a reading module, configured to read the location information from the preset storage space according to the persistent storage type of the location information;
a loading module, configured to reload the pending consumption data from the message system according to the location information.
Further, the device further includes:
a second processing unit, configured to process the pending consumption data obtained from the message system according to a preset time interval.
Further, the device further includes:
an updating unit, configured to update the location information in the preset storage region after processing of the pending consumption data obtained from the message system is completed, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
Further, when the persistent storage type is file system storage,
the storage unit is further configured to obtain the location information of the pending consumption data in the message system, the location information including the partition fields and partition position fields of different types of consumption data;
the storage unit is further configured to create a default file in the file system;
the storage unit is further configured to persist the location information to the default file.
Further, when the persistent storage type is database storage,
the storage unit is further configured to obtain the location information of the pending consumption data in the message system, the location information including the partition fields and partition position fields of different types of consumption data;
the storage unit is further configured to create a preset table in the database;
the storage unit is further configured to persist the location information to the preset table.
With the above technical solution, the data processing method and device provided by the present invention persist the location information of pending consumption data in the message system to a preset storage space, so that when an abnormal situation occurs while the consumption data is being processed, the pending consumption data can be reloaded from the message system according to the location information persisted to the preset storage space, thereby ensuring the integrity of consumption data during data processing. Compared with existing data processing methods, the embodiment of the present invention obtains the location information of the pending consumption data from the persisted preset storage space after the program is restarted, and then reprocesses the consumption data that was not completely processed because of the program exception. Consumption data is not lost, so the integrity of the consumption data is guaranteed.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flow diagram of a data processing method provided by an embodiment of the present invention;
Fig. 2 shows a flow diagram of another data processing method provided by an embodiment of the present invention;
Fig. 3 shows a flow diagram of persisting location information to the preset storage space, provided by an embodiment of the present invention;
Fig. 4 shows a flow diagram of loading location information from the preset storage space, provided by an embodiment of the present invention;
Fig. 5 shows a flow diagram of data processing under normal conditions, provided by an embodiment of the present invention;
Fig. 6 shows a flow diagram of data processing under abnormal conditions, provided by an embodiment of the present invention;
Fig. 7 shows a structural diagram of a data processing device provided by an embodiment of the present invention;
Fig. 8 shows a structural diagram of another data processing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a data processing method. As shown in Fig. 1, the method can be applied to the processing of consumption data in the spark distributed platform. By persisting the location information of pending consumption data in the Kafka message system, it further ensures that consumption data is not lost when the processing program encounters an exception. The specific steps include:
101. Persist the location information of pending consumption data in the message system to a preset storage space.
Consumption data is the most routine part of the data used when analyzing how a website is used, such as page views, information about the content viewed, and search activity. The common way of handling consumption data is to first write the various activities to a file in the form of logs and then periodically perform statistical analysis on the file. Many websites now have a message system that serves as a pipeline for multiple types of data, which makes it convenient to collect consumption data from client websites.
Kafka is a high-throughput distributed publish-subscribe message system that can handle all of the action stream data of a consumer-scale website. In a Kafka message system, the sender of data is the producer, the receiver of data is the consumer, and the transmitter of data is a relay server. In the message queue provided by Kafka, after the spark streaming component of the spark distributed platform batch-processes a batch of consumption data, it records the location information of the processed consumption data, so that when an emergency occurs, processing of the consumption data can continue from the recorded location information.
For the embodiment of the present invention, the location information of the pending consumption data in the message system is the location, in the Kafka message system, of the next batch of consumption data to be processed after the spark streaming processing program outputs each batch of consumption data. Because the volume of messages in the Kafka message system is very large, with the actual message volume being several times the total number of website page views, the data stream in the system must be divided into a group of mutually independent partitions. The producer specifies which partition each message belongs to, so that consumption data can be obtained from the Kafka message system accurately and in order according to the recorded partition position.
The persistence process here persists the location information of the pending consumption data in the Kafka message queue to a preset storage space. The location information of the next consumption data can of course also be derived from the recorded location information of the current consumption data, and the location information of the next consumption data is then persisted to the preset storage space. The location information can be used to monitor how consumers consume the consumption data. The embodiment of the present invention does not limit the preset storage space; for example, the Hadoop Distributed File System (HDFS) may be used.
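The derivation and persistence described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the JSON snapshot format, and the use of a local file in place of HDFS are all assumptions made for illustration, as is the assumption that positions within a partition are consecutive integers, so that the next pending position is the last processed position plus one.

```python
import json
import os
import tempfile


def next_positions(last_processed):
    """Derive the next pending position per partition from the last processed one.

    Assumes positions within a partition are consecutive integers.
    """
    return {partition: pos + 1 for partition, pos in last_processed.items()}


def persist_positions(positions, path):
    """Write the positions to `path` (hypothetical local stand-in for the preset storage space)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        # JSON object keys must be strings, so partition numbers are converted.
        json.dump({str(p): o for p, o in positions.items()}, tmp)
    # Swap the finished snapshot into place so a reader never sees a half-written file.
    os.replace(tmp_path, path)


def read_positions(path):
    """Read back the positions written by persist_positions."""
    with open(path, encoding="utf-8") as f:
        return {int(p): o for p, o in json.load(f).items()}
```

Writing to a temporary file and then replacing the target is one way to avoid leaving a truncated snapshot behind if the process stops in the middle of a write, which is the very situation the method is meant to tolerate.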
102. When the processing program is restarted, reload the pending consumption data from the message system according to the location information.
It should be noted that abnormal situations may occur when the spark streaming component of the spark distributed platform batch-processes the message queue provided by Kafka. For example, while the spark streaming component is processing consumption data, the process may be shut down accidentally, or the spark streaming program that is currently consuming may be stopped.
After the spark streaming processing program is restarted, the location information of the pending consumption data (that is, the consumption data that had not yet been completely processed) at the end of the spark streaming component's last batch processing of the Kafka message queue can be determined from the persisted consumption data location information stored in the preset storage space. The location information of the pending consumption data is then passed to the Kafka message system, so that the spark streaming component can reload the pending consumption data according to the corresponding location information in the Kafka message system.
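The reload step can be sketched as follows. This is a simplified model under stated assumptions, not the patent's implementation and not a Kafka client: the message system is represented as a plain dictionary mapping each partition to a list of records, a position is simply an index into that list, and the function name reload_pending is hypothetical.

```python
def reload_pending(message_system, stored_positions):
    """Return, per partition, the records from the stored position onward.

    message_system: dict mapping partition -> list of records (in-memory stand-in).
    stored_positions: dict mapping partition -> position of the first pending record.
    """
    pending = {}
    for partition, records in message_system.items():
        # A partition with no stored position is read from its beginning.
        start = stored_positions.get(partition, 0)
        pending[partition] = records[start:]
    return pending
```

For example, with partition 0 holding three records and a stored position of 2, only the third record is reloaded, while a partition with no stored position is reloaded in full.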
103. Process the pending consumption data.
From the persisted consumption data location information stored in the preset storage space, the location information of the consumption data at the end of the spark streaming component's last batch processing of the Kafka message queue, and the location information of the next consumption data, can be determined. This identifies the starting position from which the Kafka message queue currently needs to be consumed, and the spark streaming component of the spark distributed platform then batch-processes the message queue provided by Kafka from that position.
For the embodiment of the present invention, spark streaming is a near-real-time processing framework. Its processing response time is generally measured in minutes, while the delay in processing real-time data is on the order of seconds. Spark streaming caches the data stream received from Kafka through receivers distributed on each node and packages the data stream into the RDD format that spark can process, so that the spark cluster can process the data stream in real time and output the processed data to the business side that needs it, which can then analyze the processed data.
As can be seen from the above implementation, the data processing method provided by the embodiment of the present invention persists the location information of pending consumption data in the message system to a preset storage space, so that when an abnormal situation occurs while the consumption data is being processed, the pending consumption data can be reloaded from the message system according to the location information persisted to the preset storage space, thereby ensuring the integrity of consumption data during data processing. Compared with existing data processing methods, the embodiment of the present invention obtains the location information of the pending consumption data from the persisted preset storage space after the program is restarted, and then reprocesses the consumption data that was not completely processed because of the program exception. Consumption data is not lost, so the integrity of the consumption data is guaranteed.
To explain the data processing method proposed by the present invention in more detail, particularly with regard to the use of different persistence types, the embodiment of the present invention further provides another data processing method. As shown in Fig. 2, the specific steps of the method include:
201. Process the pending consumption data obtained from the message system according to a preset time interval.
Under normal circumstances, the spark streaming component runs in the spark cluster and cyclically obtains consumption data from the Kafka message queue according to the preset time interval, that is, it obtains consumption data from the Kafka message queue at a predetermined time period. As a real-time computing framework built on spark, the spark streaming component batch-processes the consumption data it obtains.
It should be noted that the embodiment of the present invention does not limit the preset time interval, which is preferably 1 s to 5 s. Specifically, a time window can be set in advance, and pending consumption data is obtained cyclically from the Kafka message system according to the time window.
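The effect of a time window on batching can be illustrated with a small Python sketch. It is a simplified model rather than how spark streaming forms its batches: records are assumed to carry a timestamp in seconds, and the function name group_into_batches is hypothetical.

```python
from collections import defaultdict


def group_into_batches(records, interval_seconds):
    """Group (timestamp_seconds, value) records into batches, one per time window.

    Windows that received no records produce no batch in this sketch.
    """
    windows = defaultdict(list)
    for timestamp, value in records:
        # Records whose timestamps fall in the same window share a window index.
        windows[int(timestamp // interval_seconds)].append(value)
    return [windows[index] for index in sorted(windows)]
```

With an interval of 2 s, records stamped 0.2 s and 1.9 s fall into the same batch, while a record stamped 2.1 s starts the next one.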
202. Persist the location information of the pending consumption data in the message system to the preset storage space.
For the embodiment of the present invention, the step of persisting the location information of the pending consumption data in the message system to the preset storage space may include, but is not limited to, the following. First, the spark platform records the position of the consumption data that has been processed, and the location information of the pending consumption data in the message system is obtained. The persistence type of that location information is then obtained by parsing the attribute parameter of the data; the attribute parameter here is a parameter configured for the pending consumption data through the spark platform, and it can be set to file system or database. The location information of the pending consumption data in the message system is then persisted to the preset storage space according to the persistent storage type. For example, when the persistence type is file system, spark streaming saves the file name corresponding to the system file and the obtained location information to the preset storage location.
When the persistent storage type is file system, the following method may be used to persist the location information of the pending consumption data in the message system to the preset storage space. First, the location information of the pending consumption data in the message system is obtained. The location information here includes the partition fields and partition position fields of different types of consumption data. Every message published to Kafka has a category, called a Topic, and physically the message data of different Topics is stored separately in different partitions; for example, there may be five partitions in the message system whose partition fields are A, B, C, D, and E. Logically, although the message data of a Topic is stored on one or more Kafka servers, the user only needs to specify the Topic of the message data to produce or consume data, without caring where the message data is stored. In addition, the consumption data of each category recorded in a partition has a corresponding partition position field. Next, a default file is created in the file system. The default file can be named after the partition field of the pending consumption data in the message system, which makes it easier to look up different types of consumption data, or it can be named arbitrarily; the embodiment of the present invention does not limit this. Finally, the location information of the pending consumption data in the message system is persisted to the default file.
Illustratively, a file is created in the file system. The default file can be named arbitrarily, for example kafka_offset.txt. The location information of the consumption data in the message system is then passed into the spark streaming program as parameters, and the file output content is “2:1428808”, “0:1456813”, and “1:1431255”. The output file content shows the location information of the consumption data in the message system: “2”, “0”, and “1” are the partition fields of the consumed Kafka Topic, and “1428808”, “1456813”, and “1431255” are the corresponding partition position fields, respectively.
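The file layout in this example can be written and parsed with a short Python sketch. The “partition:position” format and the kafka_offset.txt name come from the example above; the one-entry-per-line layout, the function names, and the use of a local file are assumptions made for illustration.

```python
def format_positions(positions):
    """Render positions as one 'partition:position' entry per line."""
    return "".join(f"{partition}:{position}\n" for partition, position in positions.items())


def parse_positions(text):
    """Parse 'partition:position' lines back into a dict, skipping blank lines."""
    positions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        partition, position = line.split(":", 1)
        positions[int(partition)] = int(position)
    return positions


def save_positions(path, positions):
    """Persist the positions to a file such as kafka_offset.txt."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_positions(positions))


def load_positions(path):
    """Load the positions persisted by save_positions."""
    with open(path, encoding="utf-8") as f:
        return parse_positions(f.read())
```

Saving `{2: 1428808, 0: 1456813, 1: 1431255}` with this sketch produces the three entries quoted in the example, and loading the file returns the same mapping.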
When the persistent storage type is database storage, the following method may be used to persist the location information of the pending consumption data in the message system to the preset storage space. First, the location information of the pending consumption data in the message system is obtained; the location information here is the same as described above and is not repeated. A preset table is then created in the database. Finally, the location information of the pending consumption data in the message system is persisted to the preset table, whose columns hold the partition field and the partition position field respectively, which makes it easier to look up different types of consumption data.
Illustratively, a table is created in the database. The preset table can be named arbitrarily, for example kafka_offset. The location information of the consumption data in the message system is then passed into the spark streaming program as parameters. The preset table mainly contains two fields: field 1 is the partition field and field 2 is the partition position field, as shown in the following table.
Field 1 (partition field)    Field 2 (partition position field)
2                            1428808
0                            1456813
1                            1431255
The preset table in the database thus holds the location information of the consumption data in the message system, such as “2:1428808”, “0:1456813”, and “1:1431255”, where “2”, “0”, and “1” are the partition fields of the consumed Kafka Topic and “1428808”, “1456813”, and “1431255” are the corresponding partition position fields, respectively.
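A two-column table of this kind can be sketched with Python's built-in sqlite3 module. The table name kafka_offset comes from the example above; the choice of SQLite, the column names partition_id and position, and the function names are assumptions made for illustration, since the patent does not name a database or a schema.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open the database and create the kafka_offset table if it does not exist."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kafka_offset ("
        "partition_id INTEGER PRIMARY KEY, position INTEGER NOT NULL)"
    )
    return conn


def save_positions(conn, positions):
    """Insert or overwrite one row per partition."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "INSERT OR REPLACE INTO kafka_offset (partition_id, position) VALUES (?, ?)",
            list(positions.items()),
        )


def load_positions(conn):
    """Return the stored positions as a dict of partition -> position."""
    rows = conn.execute("SELECT partition_id, position FROM kafka_offset")
    return dict(rows.fetchall())
```

Making the partition the primary key keeps exactly one row per partition, so saving a newer position for a partition overwrites the older one instead of accumulating history.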
For the embodiment of the present invention, when the spark streaming processing program encounters an exception while batch-processing the Kafka message queue and cannot run normally, the location information in the message system of the consumption data to be processed has already been recorded in the preset storage space. After spark streaming recovers, the consumption data that was not completely processed last time can be loaded from the Kafka message system according to the location information, so no data is lost. The specific flow for persisting the location information of the pending consumption data in the message system to the preset storage space is shown in Fig. 3.
203. When the processing program is restarted, reload the pending consumption data from the message system according to the location information.
For the embodiment of the present invention, loading the pending consumption data from the message system according to the location information may include, but is not limited to, the following. First, the persistent storage type of the location information of the pending consumption data in the message system is obtained by parsing the attribute parameter of the consumption data; the attribute parameter is as described in step 202 and is not repeated here. Next, the location information of the consumption data in the message system is read from the preset storage space according to the persistent storage type; the file or table in the preset storage space records the location information of the consumption data after each batch is processed. Finally, the next batch of consumption data to be processed is loaded from the message system according to the corresponding location information. After spark streaming recovers, the consumption data that was not completely processed last time can thus be loaded from the Kafka message system according to the location information, so no data is lost. The specific flow for loading the pending consumption data from the message system according to the location information is shown in Fig. 4.
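Selecting the reader according to the persistent storage type can be sketched in Python as below. The property keys storage.type and storage.target, the type names file_system and database, the column names, and the use of SQLite are all hypothetical; the patent only states that the attribute parameter can be set to file system or database.

```python
import sqlite3


def read_from_file(path):
    """Read 'partition:position' lines from a text file."""
    positions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                partition, position = line.strip().split(":", 1)
                positions[int(partition)] = int(position)
    return positions


def read_from_database(db_path):
    """Read (partition_id, position) rows from a kafka_offset table."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT partition_id, position FROM kafka_offset")
        return dict(rows.fetchall())
    finally:
        conn.close()


# Hypothetical mapping from the configured storage type to its reader.
READERS = {"file_system": read_from_file, "database": read_from_database}


def read_location_info(properties):
    """Pick the reader that matches the configured persistent storage type."""
    storage_type = properties["storage.type"]
    try:
        reader = READERS[storage_type]
    except KeyError:
        raise ValueError(f"unsupported persistent storage type: {storage_type!r}") from None
    return reader(properties["storage.target"])
```

Keeping the readers in a lookup table means that a further storage type could be supported by registering one more function, without changing the code that performs the restart.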
204. Process the pending consumption data.
From the consumption data location information persisted in the preset storage space, the location information of the consumption data at the end of the spark streaming component's last batch processing of the Kafka message queue can be determined, and from it the location information of the next consumption data, that is, the starting position from which the Kafka message queue currently needs to be consumed. The spark streaming component of the spark distributed platform then batch-processes the message queue provided by Kafka from that position.
205. After processing of the pending consumption data obtained from the message system is completed, update the location information in the preset storage region, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
Because the spark streaming component obtains consumption data from the Kafka message system at a predetermined time period and processes it, the location information of the next batch of consumption data to be processed must be updated in the preset storage region each time a batch of consumption data has been processed. When an emergency occurs, processing of the consumption data can then continue from the persisted location information after the program is restarted.
For the embodiment of the present invention, each time spark streaming finishes processing a batch of pending consumption data, it updates the location information persisted in the storage region. When the spark streaming processing program runs normally, the next batch of consumption data is obtained from Kafka directly according to the location information of the consumption data recorded by the spark streaming component. When the spark streaming processing program encounters an exception, the next batch of consumption data can be obtained from Kafka according to the location information persisted in the preset storage region.
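The update-after-each-batch behaviour and the recovery after an exception can be simulated with a small Python sketch. It is a single-partition model under stated assumptions: the message log is a list, the preset storage region is a dictionary named store, the output is a list named sink, upper-casing stands in for real processing, and the crash is simulated before any output of the failing batch is written. None of these names come from the patent.

```python
def run_batches(log, store, sink, batch_size, crash_on_batch=None):
    """Process `log` in batches, resuming from the position saved in `store`.

    log: list of records (stand-in for one partition of the message system).
    store: dict holding the persisted position (stand-in for the preset storage region).
    sink: list receiving processed output.
    crash_on_batch: batch number at which to simulate an exception, or None.
    """
    batch_number = 0
    while True:
        start = store.get("position", 0)
        batch = log[start:start + batch_size]
        if not batch:
            return
        if batch_number == crash_on_batch:
            raise RuntimeError("simulated crash before the batch completes")
        sink.extend(record.upper() for record in batch)  # stand-in for processing and output
        store["position"] = start + len(batch)  # updated only after the batch is finished
        batch_number += 1
```

Running the sketch over seven records with a batch size of 3 and a simulated crash on the second batch leaves the stored position at 3; calling run_batches again with the same store and sink resumes from that position and completes the remaining records. In this sketch a crash that occurred after output was written but before the position was updated would cause that batch to be processed again on restart.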
Following realization methods can be included but is not limited to for the concrete application scene of the embodiment of the present invention:Work as spark During streaming processing routine normal operations, spark streaming components can according to time window set time cycle at Reason obtains consumption data from Kafka message queues, and will handle the consumption data output completed, in spark streaming groups It can be by the location information of the consumption data of next batch pending in Kafka message queues after part complete consumption data per treatment Persistence to preset memory space, step flow chart as shown in Figure 5 works as spark by way of database table When streaming components are abnormal rear reboot process program, by the location information that is recorded in database table from message system Continue to load the consumption data of pending next batch in system, continue to handle consumption data according to service logic, and The consumption data that output processing is completed, step flow chart as shown in Figure 6.
Because of excessive data volume, external interference and similar causes, the Spark Streaming process may be interrupted or stopped unexpectedly during data processing, and restarting the process may cause batch data to fail or be lost. To ensure the validity of data processing, this further data processing method of the embodiment of the present invention persists the location information of the pending consumption data in the Kafka message system, so that when the Spark Streaming processing program fails and is restarted, the consumption data it processes is neither processed repeatedly nor lost.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides a data processing device. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 7, the device includes:
A storage unit 31, which may be used to persist the location information of pending consumption data in the message system to a preset storage space. The storage unit 31 is the main functional module of the device for persisting the location information of pending consumption data in the message system; the location information may be persisted by means of a system file or a database;
A loading unit 32, which may be used to load the pending consumption data from the message system according to the location information when the processing program is restarted. The loading unit 32 is the main functional module of the device for reading the location information of pending consumption data from the preset storage space; it reads the location information according to the persistence type and then loads the pending data from the message system;
A first processing unit 33, which may be used to process the pending consumption data. The first processing unit 33 is the main functional module of the device for processing consumption data; it processes the pending consumption data loaded from Kafka according to the business logic.
The data processing device provided by the embodiment of the present invention persists the location information of pending consumption data in the message system to a preset storage space, so that when an abnormal condition occurs while consumption data is being processed, the pending consumption data can be loaded from the message system according to the location information persisted in the preset storage space, thereby ensuring the integrity of consumption data during data processing. Compared with existing data processing methods, the embodiment of the present invention obtains the location information of the pending consumption data from the preset storage space after the program is restarted and then reprocesses the consumption data whose processing was not completed because of the program exception. No consumption data is lost, which ensures the integrity of the consumption data.
Further, as shown in Fig. 8, the device also includes:
A second processing unit 34, which may be used to process the pending consumption data obtained from the message system at a preset time interval. The second processing unit 34 is the main functional module of the device for processing, while the processing program runs normally, the consumption data obtained from the message system according to the set time window;
An updating unit 35, which may be used to update the location information in the preset storage area after the pending consumption data obtained from the message system has been processed, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information. The updating unit 35 is the main functional module of the device for updating the location information persisted in the storage area.
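The interplay between batch processing and the position update can be illustrated with a small simulation. It is a sketch under stated assumptions: the `run` loop, the `store` dictionary (standing in for the preset storage area), the `flaky_handle` function and the `CrashDuringBatch` exception are hypothetical names, a plain list replaces the message system, and the fixed time interval between batches is omitted for brevity.

```python
class CrashDuringBatch(Exception):
    """Hypothetical stand-in for an unexpected stop of the processing program."""

def run(log, store, batch_size, handle):
    """Process batches from `log`; advance store['offset'] only after a batch succeeds."""
    while store["offset"] < len(log):
        start = store["offset"]
        batch = log[start:start + batch_size]
        handle(batch)                         # business logic; may raise
        store["offset"] = start + len(batch)  # position of the next pending batch

log = list(range(10))     # stand-in for records in the message system
store = {"offset": 0}     # stand-in for the persisted location information
output = []
calls = {"n": 0}

def flaky_handle(batch):
    """Multiply each record by 10; fail once, on the second batch, before any output."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise CrashDuringBatch()
    output.extend(x * 10 for x in batch)

try:
    run(log, store, batch_size=4, handle=flaky_handle)
except CrashDuringBatch:
    pass  # the program stops; the stored position still points at the failed batch

# Restart: processing resumes from the stored position.
run(log, store, batch_size=4, handle=flaky_handle)
```

Because the position is advanced only after a batch has been handled, the failed batch is picked up again after the restart. In this sketch the handler fails before emitting anything; a handler that fails after emitting part of a batch would need additional care to avoid duplicate output.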
Further, the storage unit 31 includes:
An acquisition module 311, which may be used to obtain the location information of the pending consumption data in the message system;
A parsing module 312, which may be used to obtain the persistent storage type of the location information;
A storage module 313, which may be used to persist the location information to the preset storage space according to the persistent storage type.
Further, the loading unit 32 includes:
A reading module 321, which may be used to read the location information from the preset storage space according to the persistent storage type of the location information;
A loading module 322, which may be used to reload the pending consumption data from the message system according to the location information.
Further, when the persistent storage type is file system storage, the storage unit 31 may also be used to obtain the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
The storage unit 31 may also be used to create a preset file in the file system;
The storage unit 31 may also be used to persist the location information to the preset file;
When the persistent storage type is database storage,
The storage unit 31 may also be used to obtain the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
The storage unit 31 may also be used to create a preset table in the database;
The storage unit 31 may also be used to persist the location information to the preset table.
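For the file system storage type, one possible way to persist such records is sketched below. The JSON layout, the field names `topic`, `partition` and `offset`, the file name `offsets.json` and both helper functions are assumptions made for illustration; the source only states that a preset file is created and that the location information contains a partition field and a partition position field. The write goes to a temporary file that is then renamed, so that an interruption during writing does not leave a half-written file behind.

```python
import json
import os
import tempfile

def save_location(path, records):
    """Write location records to `path`, replacing any old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(records, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk before the rename
    os.replace(tmp, path)

def load_location(path):
    """Return the stored records, or an empty list if the file does not exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)

workdir = tempfile.mkdtemp()
offset_file = os.path.join(workdir, "offsets.json")  # hypothetical preset file

# Assumed sample records: one entry per type of consumption data and partition.
records = [
    {"topic": "clicks", "partition": 0, "offset": 120},
    {"topic": "orders", "partition": 0, "offset": 45},
]
save_location(offset_file, records)
restored = load_location(offset_file)
```

On restart, a loader would call `load_location` and hand the restored positions to whatever component reads from the message system.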
In the further data processing device provided by the present invention, the location information of the pending consumption data in the Kafka message system is persisted, so that when the Spark Streaming processing program fails and is restarted, the consumption data it processes is neither processed repeatedly nor lost.
The data processing device includes a processor and a memory. The storage unit 31, the loading unit 32, the first processing unit 33 and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the integrity of consumption data during data processing is ensured by adjusting kernel parameters.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: persisting the location information of pending consumption data in a message system to a preset storage space; when the processing program is restarted, reloading the pending consumption data from the message system according to the location information; and processing the pending consumption data.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A data processing method, characterized by comprising:
persisting location information of pending consumption data in a message system to a preset storage space;
when a processing program is restarted, reloading the pending consumption data from the message system according to the location information;
processing the pending consumption data.
2. The method according to claim 1, characterized in that persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
obtaining the location information of the pending consumption data in the message system;
obtaining a persistent storage type of the location information;
persisting the location information to the preset storage space according to the persistent storage type.
3. The method according to claim 2, characterized in that reloading the pending consumption data from the message system according to the location information comprises:
reading the location information from the preset storage space according to the persistent storage type of the location information;
reloading the pending consumption data from the message system according to the location information.
4. The method according to any one of claims 1-3, characterized in that, before persisting the location information of the pending consumption data in the message system to the preset storage space, the method further comprises:
processing the pending consumption data obtained from the message system at a preset time interval.
5. The method according to claim 4, characterized in that, after processing the pending consumption data, the method further comprises:
after the pending consumption data obtained from the message system has been processed, updating the location information in the preset storage area, so that the next batch of pending consumption data can be loaded from the message system according to the updated location information.
6. The method according to claim 1, characterized in that, when the persistent storage type is file system storage, persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
obtaining the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
creating a preset file in the file system;
persisting the location information to the preset file.
7. The method according to claim 1, characterized in that, when the persistent storage type is database storage, persisting the location information of the pending consumption data in the message system to the preset storage space comprises:
obtaining the location information of the pending consumption data in the message system, the location information including a partition field and a partition position field for each type of consumption data;
creating a preset table in the database;
persisting the location information to the preset table.
8. A data processing device, characterized by comprising:
a storage unit, for persisting location information of pending consumption data in a message system to a preset storage space;
a loading unit, for reloading the pending consumption data from the message system according to the location information when a processing program is restarted;
a first processing unit, for processing the pending consumption data.
9. The device according to claim 8, characterized in that the storage unit comprises:
a first acquisition module, for obtaining the location information of the pending consumption data in the message system;
a parsing module, for obtaining a persistent storage type of the location information;
a storage module, for persisting the location information to the preset storage space according to the persistent storage type.
10. The device according to claim 9, characterized in that the loading unit comprises:
a reading module, for reading the location information from the preset storage space according to the persistent storage type of the location information;
a loading module, for reloading the pending consumption data from the message system according to the location information.
CN201611247794.7A 2016-12-29 2016-12-29 A kind of data processing method and device Pending CN108255628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611247794.7A CN108255628A (en) 2016-12-29 2016-12-29 A kind of data processing method and device

Publications (1)

Publication Number Publication Date
CN108255628A true CN108255628A (en) 2018-07-06

Family

ID=62720805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611247794.7A Pending CN108255628A (en) 2016-12-29 2016-12-29 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN108255628A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109788026A (en) * 2018-12-13 2019-05-21 新华三大数据技术有限公司 Message treatment method and device
CN110825533A (en) * 2018-08-10 2020-02-21 网宿科技股份有限公司 Data transmitting method and device
CN112328602A (en) * 2020-11-17 2021-02-05 中盈优创资讯科技有限公司 Method, device and equipment for writing data into Kafka
CN112445626A (en) * 2019-08-29 2021-03-05 北京京东振世信息技术有限公司 Data processing method and device based on message middleware
CN112486986A (en) * 2020-11-26 2021-03-12 清创网御(合肥)科技有限公司 Automatic persistence method for consumption data of topic newly added in Kafka
CN112559227A (en) * 2021-01-13 2021-03-26 贵州省广播电视信息网络股份有限公司 Spark Streaming based method for dynamically updating shared data
WO2023040399A1 (en) * 2021-09-18 2023-03-23 深圳前海微众银行股份有限公司 Service persistence method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102486798A (en) * 2010-12-03 2012-06-06 腾讯科技(深圳)有限公司 Data loading method and device
CN105574082A (en) * 2015-12-08 2016-05-11 曙光信息产业(北京)有限公司 Storm based stream processing method and system
CN105791431A (en) * 2016-04-26 2016-07-20 北京邮电大学 On-line distributed monitoring video processing task scheduling method and device
CN106202324A (en) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 The data processing method of a kind of real-time calculating platform and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KK303: "Spark Streaming 中使用kafka低级api+zookeeper 保存 offset 并重用 以及 相关代码整合", 《HTTPS://BLOG.CSDN.NET/KK303/ARTICLE/DETAILS/52767260?SPM=1001.2014.3001.5501》 *
SUN_QIANGWEI: "将 Spark Streaming + Kafka direct 的 offset 保存进入Zookeeper", 《HTTPS://BLOG.CSDN.NET/SUN_QIANGWEI/ARTICLE/DETAILS/52089795》 *
大数据部: "Spark Streaming createDirectStream保存kafka offset(JAVA实现)", 《HTTPS://BLOG.CSDN.NET/BDCHOME/ARTICLE/DETAILS/52438377》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825533A (en) * 2018-08-10 2020-02-21 网宿科技股份有限公司 Data transmitting method and device
CN110825533B (en) * 2018-08-10 2022-12-20 网宿科技股份有限公司 Data transmitting method and device
CN109788026A (en) * 2018-12-13 2019-05-21 新华三大数据技术有限公司 Message treatment method and device
CN109788026B (en) * 2018-12-13 2022-03-08 新华三大数据技术有限公司 Message processing method and device
CN112445626A (en) * 2019-08-29 2021-03-05 北京京东振世信息技术有限公司 Data processing method and device based on message middleware
CN112445626B (en) * 2019-08-29 2023-11-03 北京京东振世信息技术有限公司 Data processing method and device based on message middleware
CN112328602A (en) * 2020-11-17 2021-02-05 中盈优创资讯科技有限公司 Method, device and equipment for writing data into Kafka
CN112328602B (en) * 2020-11-17 2023-03-31 中盈优创资讯科技有限公司 Method, device and equipment for writing data into Kafka
CN112486986A (en) * 2020-11-26 2021-03-12 清创网御(合肥)科技有限公司 Automatic persistence method for consumption data of topic newly added in Kafka
CN112559227A (en) * 2021-01-13 2021-03-26 贵州省广播电视信息网络股份有限公司 Spark Streaming based method for dynamically updating shared data
WO2023040399A1 (en) * 2021-09-18 2023-03-23 深圳前海微众银行股份有限公司 Service persistence method and apparatus

Similar Documents

Publication Publication Date Title
CN108255628A (en) A kind of data processing method and device
US10592282B2 (en) Providing strong ordering in multi-stage streaming processing
Barika et al. Orchestrating big data analysis workflows in the cloud: research challenges, survey, and future directions
US10198298B2 (en) Handling multiple task sequences in a stream processing framework
US9418085B1 (en) Automatic table schema generation
US20180074852A1 (en) Compact Task Deployment for Stream Processing Systems
US8875120B2 (en) Methods and apparatus for providing software bug-fix notifications for networked computing systems
EP3743822A1 (en) Temporal optimization of data operations using distributed search and server management
US20180011739A1 (en) Data factory platform and operating system
JP2015512099A (en) Provide configurable workflow features
JP7313382B2 (en) Frequent Pattern Analysis of Distributed Systems
US20210373914A1 (en) Batch to stream processing in a feature management platform
CN109460439A (en) A kind of data processing method, device, medium and electronic equipment
US11797527B2 (en) Real time fault tolerant stateful featurization
CN104461826A (en) Object flow monitoring method, device and system
CN106648839B (en) Data processing method and device
CN108228193A (en) Data capture method and device
CN110928941B (en) Data fragment extraction method and device
CN109684051A (en) A kind of method and system of the hybrid asynchronous submission of big data task
CN111125087A (en) Data storage method and device
CN109101514A (en) Data lead-in method and device
CN115373886A (en) Service group container shutdown method, device, computer equipment and storage medium
CN111078975B (en) Multi-node incremental data acquisition system and acquisition method
CN106888244A (en) A kind of method for processing business and device
US9240968B1 (en) Autogenerated email summarization process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180706
