CN105204776B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN105204776B
Authority
CN
China
Prior art keywords
message
write
consumer
data processing
forwarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510507458.0A
Other languages
Chinese (zh)
Other versions
CN105204776A (en)
Inventor
洪彬
吴娅
张侃
刘彦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510507458.0A
Publication of CN105204776A
Application granted
Publication of CN105204776B
Legal status: Active (current)
Anticipated expiration

Abstract

The present invention provides a data processing method and device that can write data quickly and stably. The data processing method includes: converting multiple received messages into multiple write commands and adding them to a write batch; writing the write batch sequentially, as one batch of messages, to a write-ahead log file; forwarding the messages in the write-ahead log file to different message subject files according to message subject type; and recording forwarding site information, which is used to help locate the forwarding starting point for the next forwarding operation.

Description

Data processing method and device
Technical field
The present invention relates to the field of computer data storage, and in particular to a data processing method and device.
Background art
This is an era of data explosion, and real-time data processing has become a focus of attention for Internet companies and traditional IT companies alike. Message transmission usually follows a storage-based model: the producer supplies data to a data server for storage, and the consumer then extracts data from the data server. In this process, ensuring stable, efficient transmission and fast, real-time storage of data is crucial.
Existing data servers generally operate as follows:
Write phase: the server parses each received message immediately and then writes it into the corresponding message subject file according to the subject type to which the message belongs. The drawback is that writing involves parsing every message, which lowers the write rate. Writing multiple messages therefore takes longer and is more easily interrupted, which causes the write to fail.
Read phase: in most message transmission and storage models, the server reads messages randomly, which incurs a high disk seek time, so reading is inefficient. In a few message transmission and storage models, the consumption watermark of each consumption operation is recorded, but the record is stored on the client, which is inconvenient.
Summary of the invention
In view of this, the present invention provides a data processing method and device that can write data quickly and stably.
To achieve the above object, according to one aspect of the present invention, a data processing method is provided, comprising: converting multiple received messages into multiple write commands and adding them to a write batch; writing the write batch sequentially, as one batch of messages, to a write-ahead log file; forwarding the messages in the write-ahead log file to different message subject files according to message subject type; and recording forwarding site information, which is used to help locate the forwarding starting point for the next forwarding operation.
Optionally, the method further comprises: receiving an extraction request issued by a consumer, the extraction request containing a consumer identifier and the name of the message subject file to be viewed; and, according to the extraction request, extracting the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer.
Optionally, after the step of extracting, according to the extraction request, the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer, the method further comprises: recording consumption site information of the consumption operation, which is used to help locate the consumption starting point for the next consumption operation.
Optionally, the multiple received messages come from a producer.
Optionally, the write-ahead log file and the message subject files are stored on disk.
To achieve the above object, according to another aspect of the present invention, a data processing device is provided, comprising: a first receiving module for converting multiple received messages into multiple write commands and adding them to a write batch; a pre-write module for writing the write batch sequentially, as one batch of messages, to a write-ahead log file; a forwarding module for forwarding the messages in the write-ahead log file to different message subject files according to message subject type; and a first recording module for recording forwarding site information, which is used to help locate the forwarding starting point for the next forwarding operation.
Optionally, the device further comprises: a second receiving module for receiving an extraction request issued by a consumer, the extraction request containing a consumer identifier and the name of the message subject file to be viewed; and a data sending module for extracting, according to the extraction request, the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer.
Optionally, the device further comprises: a second recording module for recording consumption site information of the consumption operation, which is used to help locate the consumption starting point for the next consumption operation.
Optionally, the multiple messages received by the first receiving module come from a producer.
Optionally, the write-ahead log file and the message subject files are stored on disk.
The technical solution of the present invention introduces a write-ahead log file: messages are first written to the write-ahead log file and then forwarded to the corresponding message subject files. Because a write-ahead log file is not easily lost once persisted, the write process is complete as soon as the write to the write-ahead log file finishes, and a subsequent disconnection from the producer does not cause message storage to fail. In the prior art, by contrast, writing also involves parsing every message, so any interruption causes the write to fail. The technical solution of the present invention therefore has the advantage of a fast and stable write process.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main components of a data processing device according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the process of writing and reading data according to the technical solution of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
To help those skilled in the art better understand the present invention, the terms used herein are first briefly introduced as follows.
Data message (message): message for short; it carries the required data and is passed among the producer, the data server, and the consumer.
Message producer (producer): producer for short; responsible for generating messages and sending them to the data server.
Message consumer (consumer): consumer for short; responsible for consuming messages. The consumer actively pulls data from the data server, parses it into messages, and consumes them.
Data server (server): also called the data service end or the data processing device.
Message subject file (topic): subject or topic for short; defined by the user and configured on the data server. A producer sends messages to a particular message subject file, and a consumer consumes the messages under a particular message subject file.
Consumer group (group): multiple consumers can jointly consume the messages under one message subject file, each consuming part of the messages. These consumers form a group and share the same consumer group name; such a group is also commonly called a consumer cluster.
Offset (offset): on the server, the messages of each message subject file are organized into a list of files. To pull data, a consumer needs to know the offset of the data within these files; this is the offset. The offset is an absolute offset, which the data server can convert into an offset relative to a specific file.
Data carrier (data carrier): the entity on which data is recorded, usually a disk.
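The conversion from an absolute offset to an offset relative to a specific file, mentioned in the definition of offset above, could look roughly like the following. This is a minimal sketch rather than the patented implementation: the function name `locate` and the `file_start_offsets` bookkeeping list are assumptions introduced for illustration.

```python
from bisect import bisect_right


def locate(absolute_offset, file_start_offsets):
    """Map an absolute topic offset to (file_index, relative_offset).

    file_start_offsets is a sorted list holding the absolute offset at which
    each file in the topic's file list begins (hypothetical bookkeeping).
    """
    if not file_start_offsets or absolute_offset < file_start_offsets[0]:
        raise ValueError("offset precedes the first file")
    # The target file is the last one whose start offset is <= absolute_offset.
    index = bisect_right(file_start_offsets, absolute_offset) - 1
    return index, absolute_offset - file_start_offsets[index]
```

With three files starting at absolute offsets 0, 100, and 250, an absolute offset of 260 falls in the third file at relative offset 10.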
Fig. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps A to D.
Step A: convert the multiple received messages into multiple write commands (writecommand) and add them to a write batch (writebatch).
These messages may come from a producer. Note that whether the messages come from a default producer or a bulk producer, and whether they were obtained asynchronously or synchronously, they are first split into single messages, which are then converted into multiple write commands and merged into the write batch.
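Step A can be sketched as follows. The class names `WriteCommand` and `WriteBatch` echo the terms writecommand and writebatch used above, but their fields and the `build_write_batch` helper are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WriteCommand:
    topic: str   # message subject type (assumed field)
    body: bytes  # raw message payload (assumed field)


@dataclass
class WriteBatch:
    commands: List[WriteCommand] = field(default_factory=list)

    def add(self, command):
        self.commands.append(command)


def build_write_batch(deliveries):
    """Turn received deliveries into one write batch.

    Each delivery is a list of (topic, body) pairs, so a bulk producer's
    delivery holds several pairs and a default producer's holds one.
    """
    batch = WriteBatch()
    for delivery in deliveries:
        for topic, body in delivery:  # split into single messages first
            batch.add(WriteCommand(topic, body))
    return batch
```

Note that nothing here parses the message body; the batch only wraps the payloads so that they can be appended to the log in one sequential write.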
Step B: write the write batch sequentially, as one batch of messages, to a write-ahead log (Write-Ahead Logging, wal) file.
Write-ahead logging is a mechanism that many databases use to implement atomic transactions. Its central idea is that modifications to data files may be made only after those modifications have been recorded in the log; that is, the log is written first and the data afterwards.
The format of the write-ahead log file can be, for example, as follows:
Filename: db-<serial number>.log
Content: BatchControlHead (length: 4 bytes)
BatchType (length: 1 byte)
HeadMagic (length: 11 bytes)
ContentLength (the length of one batch of messages)
Checksum (length: 8 bytes)
Content (length: the length of the data)
TailMagic (length: 13 bytes)
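The record layout above can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the source gives only field names and lengths, so the magic values, the 4-byte big-endian encoding of ContentLength, and the use of CRC-32 stored in the 8-byte Checksum field are all placeholders chosen here, not details from the patent.

```python
import struct
import zlib

# Placeholder values; only the lengths (4, 11, and 13 bytes) come from the text.
BATCH_CONTROL_HEAD = b"\x00\x00\x00\x01"
WAL_HEAD_MAGIC = b"WAL-HEAD-MG"
WAL_TAIL_MAGIC = b"WAL-TAIL-MAGI"

# head(4) | type(1) | head magic(11) | content length(4, assumed) | checksum(8)
_WAL_HEADER = struct.Struct(">4sB11sIQ")


def encode_batch(content, batch_type=0):
    """Serialize one write batch as a single WAL record."""
    header = _WAL_HEADER.pack(
        BATCH_CONTROL_HEAD,
        batch_type,
        WAL_HEAD_MAGIC,
        len(content),
        zlib.crc32(content),  # CRC-32 is an assumed checksum algorithm
    )
    return header + content + WAL_TAIL_MAGIC


def decode_batch(record):
    """Validate a WAL record and return its content bytes."""
    _, _, head_magic, length, checksum = _WAL_HEADER.unpack_from(record, 0)
    start = _WAL_HEADER.size
    content = record[start:start + length]
    tail = record[start + length:start + length + len(WAL_TAIL_MAGIC)]
    if head_magic != WAL_HEAD_MAGIC or tail != WAL_TAIL_MAGIC:
        raise ValueError("bad magic")
    if zlib.crc32(content) != checksum:
        raise ValueError("checksum mismatch")
    return content
```

Because the whole batch is one record, the server can append it with a single sequential write and defer per-message parsing to the forwarding step.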
Step C: forward the messages in the write-ahead log file to different message subject files according to message subject type.
Specifically, the messages in the write-ahead log file are parsed one by one, and each message is forwarded to the corresponding message subject file according to the message subject type to which it belongs.
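The routing part of step C can be sketched in memory as follows. The function name `route_by_topic` and the representation of a parsed message as a (topic, body) pair are assumptions for illustration; a real server would append each list to the matching message subject file instead of returning a dictionary.

```python
from collections import defaultdict


def route_by_topic(wal_messages):
    """Group parsed WAL messages by topic, preserving their log order.

    wal_messages is an iterable of (topic, body) pairs, one per parsed message.
    """
    routed = defaultdict(list)
    for topic, body in wal_messages:  # messages are handled one by one
        routed[topic].append(body)
    return dict(routed)
```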
The format of a message subject file can be, for example, as follows:
Filename: <date>-<walID>-<serial number>-db-<total messages in the previous file>-<total topic messages>.log (for example: 20150316-1-1-db-0-0.log)
Content: magic (length: 2 bytes)
message body length (length: 1 int)
type (length: 1 byte)
message body
magic (length: 6 bytes)
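A message record in this layout could be written and read back as shown below. The sketch assumes that "1 int" means a 4-byte big-endian integer and uses placeholder magic values; neither detail is given in the source.

```python
import struct

# Placeholder values; only the lengths (2 and 6 bytes) come from the text.
MSG_HEAD_MAGIC = b"\xca\xfe"
MSG_TAIL_MAGIC = b"MSGEND"

# magic(2) | message body length(4, assumed int width) | type(1)
_MSG_HEADER = struct.Struct(">2sIB")


def encode_message(body, msg_type=0):
    """Serialize one message as a record for a message subject file."""
    header = _MSG_HEADER.pack(MSG_HEAD_MAGIC, len(body), msg_type)
    return header + body + MSG_TAIL_MAGIC


def iter_messages(data):
    """Yield (msg_type, body) for each record in a subject file's bytes."""
    pos = 0
    while pos < len(data):
        magic, length, msg_type = _MSG_HEADER.unpack_from(data, pos)
        start = pos + _MSG_HEADER.size
        end = start + length
        tail = data[end:end + len(MSG_TAIL_MAGIC)]
        if magic != MSG_HEAD_MAGIC or tail != MSG_TAIL_MAGIC:
            raise ValueError("corrupt record at offset %d" % pos)
        yield msg_type, data[start:end]
        pos = end + len(MSG_TAIL_MAGIC)
```

The explicit body length lets a reader jump from one record to the next without scanning, which suits sequential consumption from a known offset.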
Step D: record forwarding site information, which is used to help locate the forwarding starting point for the next forwarding operation.
Specifically, a datacarrier file can be created on disk to record content such as "walID | offset", where walID is the number of the wal file and offset is the offset of the corresponding forwarding operation. Recording the forwarding site information of the current operation makes it easier to locate the forwarding starting point for the next forwarding operation, enables sequential writing, and keeps data writing efficient.
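Persisting and reloading the "walID | offset" record might look like this. It is a minimal sketch: the helper names, the use of integer values, and the write-to-temporary-file-then-rename approach are assumptions made here, not details from the source.

```python
import os
import tempfile


def format_checkpoint(wal_id, offset):
    return "%d|%d" % (wal_id, offset)


def parse_checkpoint(text):
    wal_id, offset = text.strip().split("|")
    return int(wal_id), int(offset)


def save_checkpoint(path, wal_id, offset):
    """Write the forwarding position, replacing the old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(format_checkpoint(wal_id, offset))
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp.name, path)  # readers never observe a half-written file


def load_checkpoint(path):
    """Return (wal_id, offset), or None if nothing has been recorded yet."""
    try:
        with open(path) as f:
            return parse_checkpoint(f.read())
    except FileNotFoundError:
        return None
```

On restart, a forwarder would call `load_checkpoint` and resume reading the indicated wal file from the stored offset instead of rescanning the log.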
Note that the above process can be implemented on disk; in other words, the write-ahead log file and the message subject files can be stored on disk.
As can be seen from the above, the data processing method of the embodiment of the present invention introduces a wal file: messages are first written to the wal file and then forwarded to the corresponding topic. Because a wal file is not easily lost once persisted, the write process is complete as soon as the write to the wal file finishes, and a subsequent disconnection from the producer does not cause message storage to fail. In the prior art, by contrast, writing also involves parsing every message, so any interruption causes the write to fail. The method of the present invention therefore has the advantage of a fast and stable write process.
An embodiment of the data processing method of the present invention may further include step E and step F (not shown in Fig. 1).
Step E: receive an extraction request issued by a consumer; the extraction request contains a consumer identifier and the name of the message subject file to be viewed.
Step F: according to the extraction request, extract the data related to the consumer from the message subject file to be viewed and send it to the consumer.
This embodiment of the data processing method thus covers not only the data writing process but also the data reading process.
An embodiment of the data processing method of the present invention may further include step G after step F (not shown in Fig. 1).
Step G: record consumption site information of the consumption operation, which is used to help locate the consumption starting point for the next consumption operation. Data messages can then be consumed sequentially from the requested position in the topic, which avoids the disk seek time that random reads incur and therefore makes reading efficient.
Specifically, a consumer watermark record file (subscriberinfofile) can be set up for each topic to record the offsets of the messages that the corresponding consumers have consumed in that topic. Note that the consumption site information can be recorded in the memory of the server rather than on the consumer, so that the data can be retrieved quickly.
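Server-side tracking of consumption positions, combined with sequential reads, can be sketched as follows. The class name `ConsumptionTracker`, the keying by (group, topic), and the byte-count pull interface are assumptions introduced for illustration; the source states only that the position is kept on the server.

```python
import io


class ConsumptionTracker:
    """Keeps each consumer group's next read offset per topic in server memory."""

    def __init__(self):
        self._next_offset = {}  # (group, topic) -> offset of the next unread byte

    def pull(self, group, topic, topic_file, max_bytes):
        """Read up to max_bytes sequentially from where this group left off."""
        start = self._next_offset.get((group, topic), 0)
        topic_file.seek(start, io.SEEK_SET)  # one seek, then a sequential read
        chunk = topic_file.read(max_bytes)
        self._next_offset[(group, topic)] = start + len(chunk)
        return chunk
```

Each group advances independently, so two groups reading the same topic both start from the beginning, while repeated pulls by one group continue where the previous pull stopped.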
The format of the consumer watermark record file can be, for example, as follows:
Filename: subscriberinfofile.data
Content: the consumption status of each group + "\r\n", where:
Format of each group's consumption status: group name | consumption status of each consumer in the group | list of unconfirmed consumption.
Format of the consumption status of each consumer in the group: the consumer's current message offset | confirmation flag (value: confirmed).
Format of the list of unconfirmed consumption: the consumer's current message offset | confirmation flag (value: unconfirmed).
Format of a consumer's current message offset: file number | current file offset | number of bytes read this time.
Fig. 2 is a schematic diagram of the main components of a data processing device according to an embodiment of the present invention. As shown in Fig. 2, the data processing device 20 mainly includes a first receiving module 21, a pre-write module 22, a forwarding module 23, and a first recording module 24. The first receiving module 21 converts the multiple received messages into multiple write commands and adds them to a write batch; the messages it receives may come from a producer. The pre-write module 22 writes the write batch sequentially, as one batch of messages, to a write-ahead log file. The forwarding module 23 forwards the messages in the write-ahead log file to different message subject files according to message subject type. The first recording module 24 records forwarding site information, which is used to help locate the forwarding starting point for the next forwarding operation. Note that the data processing device 20 can be implemented on disk; in other words, the write-ahead log file and the message subject files can be stored on disk.
As can be seen from the above, the data processing device of the embodiment of the present invention introduces a wal file: messages are first written to the wal file and then forwarded to the corresponding topic. Because a wal file is not easily lost once persisted, the write process is complete as soon as the write to the wal file finishes, and a subsequent disconnection from the producer does not cause message storage to fail. In the prior art, by contrast, writing also involves parsing every message, so any interruption causes the write to fail. The device of the present invention therefore has the advantage of a fast and stable write process.
In an embodiment of the present invention, the data processing device 20 may further include a second receiving module and a data sending module. The second receiving module receives an extraction request issued by a consumer; the extraction request contains a consumer identifier and the name of the message subject file to be viewed. The data sending module extracts, according to the extraction request, the data corresponding to the consumer from the message subject file to be viewed and sends it to the consumer. This embodiment of the data processing device thus covers not only the data writing process but also the data reading process.
In an embodiment of the present invention, the data processing device 20 may further include a second recording module. The second recording module records consumption site information of the consumption operation, which is used to help locate the consumption starting point for the next consumption operation. In this embodiment, the second recording module records the consumption site information of the current consumption operation so that the consumption starting point can be located for the next consumption operation. Data messages can then be consumed sequentially from the requested position in the topic, which avoids the disk seek time that random reads incur and therefore makes reading efficient.
To better understand the data processing method and device of the present invention, those skilled in the art may refer to the data processing flow shown in Fig. 3.
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A data processing method, characterized by comprising:
converting multiple received messages into multiple write commands and adding them to a write batch;
writing the write batch sequentially, as one batch of messages, to a write-ahead log file;
forwarding the messages in the write-ahead log file to different message subject files according to message subject type;
recording forwarding site information, the forwarding site information being used to help locate the forwarding starting point for the next forwarding operation; the forwarding site information includes the current message offset of each consumer in a consumer group and the confirmation flag corresponding to each consumer.
2. The data processing method according to claim 1, characterized by further comprising:
receiving an extraction request issued by a consumer, the extraction request containing a consumer identifier and the name of the message subject file to be viewed;
according to the extraction request, extracting the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer.
3. The data processing method according to claim 2, characterized in that, after the step of extracting, according to the extraction request, the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer, the method further comprises:
recording consumption site information of the consumption operation, the consumption site information being used to help locate the consumption starting point for the next consumption operation.
4. The data processing method according to any one of claims 1 to 3, characterized in that the multiple received messages come from a producer.
5. The data processing method according to any one of claims 1 to 3, characterized in that the write-ahead log file and the message subject files are stored on disk.
6. A data processing device, characterized by comprising:
a first receiving module for converting multiple received messages into multiple write commands and adding them to a write batch;
a pre-write module for writing the write batch sequentially, as one batch of messages, to a write-ahead log file;
a forwarding module for forwarding the messages in the write-ahead log file to different message subject files according to message subject type;
a first recording module for recording forwarding site information, the forwarding site information being used to help locate the forwarding starting point for the next forwarding operation; the forwarding site information includes the current message offset of each consumer in a consumer group and the confirmation flag corresponding to each consumer.
7. The data processing device according to claim 6, characterized by further comprising:
a second receiving module for receiving an extraction request issued by a consumer, the extraction request containing a consumer identifier and the name of the message subject file to be viewed;
a data sending module for extracting, according to the extraction request, the data corresponding to the consumer from the message subject file to be viewed and sending it to the consumer.
8. The data processing device according to claim 7, characterized by further comprising:
a second recording module for recording consumption site information of the consumption operation, the consumption site information being used to help locate the consumption starting point for the next consumption operation.
9. The data processing device according to any one of claims 6 to 8, characterized in that the multiple messages received by the first receiving module come from a producer.
10. The data processing device according to any one of claims 6 to 8, characterized in that the write-ahead log file and the message subject files are stored on disk.
CN201510507458.0A 2015-08-18 2015-08-18 Data processing method and device Active CN105204776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510507458.0A CN105204776B (en) 2015-08-18 2015-08-18 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510507458.0A CN105204776B (en) 2015-08-18 2015-08-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN105204776A CN105204776A (en) 2015-12-30
CN105204776B true CN105204776B (en) 2019-06-04

Family

ID=54952495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510507458.0A Active CN105204776B (en) 2015-08-18 2015-08-18 Data processing method and device

Country Status (1)

Country Link
CN (1) CN105204776B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665092B (en) * 2016-07-28 2019-11-12 华为技术有限公司 A kind of storage method and device
CN109254870B (en) * 2018-08-01 2021-05-18 华为技术有限公司 Data backup method and device
CN113126919B (en) * 2021-04-02 2023-01-06 山东英信计算机技术有限公司 Method, system and storage medium for improving performance of RocksDB

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102955717A (en) * 2012-11-05 2013-03-06 北京奇虎科技有限公司 Message management equipment and method in distributed message processing system
CN102981911A (en) * 2012-11-05 2013-03-20 北京奇虎科技有限公司 Distributed message handling system and device and method thereof
CN103034540A (en) * 2012-11-16 2013-04-10 北京奇虎科技有限公司 Distributed information system, device and coordinating method thereof
CN103034541A (en) * 2012-11-16 2013-04-10 北京奇虎科技有限公司 Distributing type information system and equipment and method thereof
CN104092767A (en) * 2014-07-21 2014-10-08 北京邮电大学 Posting/subscribing system for adding message queue models and working method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAFKA distributed message system (KAFKA分布式消息系统); ljdgm; www.cnblogs.com/downey/p/5302048.html; 20150105; pp. 1-4

Also Published As

Publication number Publication date
CN105204776A (en) 2015-12-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant