CN105589884A - Data processing method and data processing device - Google Patents


Info

Publication number
CN105589884A
Authority
CN
China
Prior art keywords
data
real time
network packet
retrieve
retrieve data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410577196.0A
Other languages
Chinese (zh)
Other versions
CN105589884B (en)
Inventor
罗勋
朱峰明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410577196.0A
Publication of CN105589884A
Application granted
Publication of CN105589884B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a data processing method and a data processing device. The method comprises the following steps: acquiring real-time data and packaging the real-time data into network data packets to obtain network packet real-time data; converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data; and putting the retrieval data of the network packet real-time data in storage for retrieval. With the method and the device provided by the invention, the dependence of the system on a storage system is reduced, warehousing performance and system stability are greatly improved, retrieval delay fluctuation can be kept at the millisecond level, retrieval efficiency is increased, and resource consumption is reduced. Moreover, by marking data with timestamps, the method and the device are compatible with concurrent storage of real-time data and historical file data, and avoid the disorder caused by concurrent storage of the two types of data.

Description

Data processing method and device
Technical field
The present invention relates to the field of network technology, and in particular to a data processing method and device.
Background art
At present, network real-time data warehousing systems are derived from the warehousing systems used for web search. Data exchange in such a system is file-based: real-time data is pushed to the system one file at a time for conversion into retrieval data, and the generated retrieval data is likewise stored as files. The retrieval system periodically scans the storage system for newly generated data files and puts their data in storage. The data flow is shown in Fig. 1.
This existing real-time data warehousing system has the following defects:
First, file-based data exchange relies heavily on the performance of the storage system. Once the number of stored real-time files reaches a certain level, access becomes very slow, and the retrieval delay (the time difference between data being generated and becoming retrievable) multiplies.
Second, the system refreshes its data by periodic file scanning. Given the performance of the storage system, the scan interval can only be set at the minute level, so the retrieval delay of the system is also at the minute level. In practice, the time taken to put data in storage is unstable and fluctuates widely.
In addition, the stability of data file access depends heavily on the stability of the storage system, and the probability of errors is relatively high. When the storage system fluctuates, the resulting faulty data files must be repaired manually, which increases operation and maintenance cost.
Summary of the invention
Embodiments of the present invention provide a data processing method and device, intended to solve the technical problems of low warehousing performance and stability, high retrieval delay, and high cost in real-time data warehousing systems.
An embodiment of the present invention proposes a data processing method, comprising:
Acquiring real-time data and packaging the real-time data into network data packets to obtain network packet real-time data;
Converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
Putting the retrieval data of the network packet real-time data in storage for retrieval.
An embodiment of the present invention also proposes a data processing device, comprising:
An acquisition module, configured to acquire real-time data and package the real-time data into network data packets to obtain network packet real-time data;
A conversion module, configured to convert the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
A warehousing module, configured to put the retrieval data of the network packet real-time data in storage for retrieval.
In the data processing method and device proposed by the embodiments of the present invention, real-time data is packaged into network data packets, the network packet real-time data is converted into a retrieval data format to obtain retrieval data of the network packet real-time data, and the retrieval data is put in storage for retrieval. Because the real-time data exchanged within the system is network packet data, the system's dependence on a storage system is reduced, warehousing performance and system stability are greatly improved, retrieval delay fluctuation can be kept at the millisecond level, retrieval efficiency is improved, and resource consumption is reduced. In addition, by marking data with timestamps, the method and device are compatible with concurrent storage of real-time data and historical file data, and avoid the disorder caused by putting the two types of data in storage at the same time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data flow of an existing real-time data warehousing system;
Fig. 2 is a schematic diagram of the hardware structure of the data processing device involved in the embodiments of the present invention;
Fig. 3 is a schematic flowchart of a first embodiment of the data processing method of the present invention;
Fig. 4 is a schematic flowchart of a second embodiment of the data processing method of the present invention;
Fig. 5 is a schematic flowchart of a third embodiment of the data processing method of the present invention;
Fig. 6 is a functional block diagram of a first embodiment of the data processing device of the present invention;
Fig. 7 is a functional block diagram of a second embodiment of the data processing device of the present invention.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The main idea of the embodiments of the present invention is as follows: real-time data produced in the network is packaged into network data packets, the network packet real-time data is then converted into a retrieval data format to obtain retrieval data of the network packet real-time data, and the retrieval data is put in storage for retrieval. Because the real-time data exchanged within the system is network packet data, the system's dependence on a storage system is reduced, warehousing performance and system stability are greatly improved, retrieval delay fluctuation can be kept at the millisecond level, retrieval efficiency is improved, and resource consumption is reduced. In addition, by marking data with timestamps, the scheme is compatible with concurrent storage of real-time data and historical file data, and avoids the disorder caused by putting the two types of data in storage at the same time.
The real-time data involved in the embodiments of the present invention refers to network data that is produced continuously in the network and needs to be put in storage for retrieval. Examples include the data produced when a user clicks a web page, or the data produced when a user modifies QQ group information (such as the group name or group signature). Once this data has been put in storage, users can search and query it, for example to look up a QQ group name.
The historical file data involved in the embodiments of the present invention refers to file data that is produced periodically and needs to be put in storage for retrieval, such as data pushed periodically by website pages.
When an existing network real-time data warehousing system puts real-time data in storage, the real-time data is pushed to the system one file at a time for conversion into retrieval data, and data exchange between modules is implemented by file scanning and updating. The retrieval delay is at the minute level, and in practice the time taken to put data in storage is unstable and fluctuates widely. At the same time, file-based access is affected by the performance of the file system and has a high probability of errors, which also raises maintenance cost.
In addition, Internet big data retrieval services place increasingly strict requirements on the retrieval delay of real-time data. Some even require millisecond-level delay so that users can quickly perceive content changes. Big data services also pose a growing challenge to data warehousing performance.
In view of this demand and trend, and of the defects of the prior art, the present invention provides a real-time warehousing system.
In the embodiments of the present invention, the real-time data exchanged within the real-time warehousing system is network packet data. This reduces the system's dependence on a storage system and allows real-time data to be put in storage for retrieval quickly over the network, which greatly improves warehousing performance and system stability and meets the data warehousing performance requirements of big data services. Retrieval delay fluctuation can be kept at the millisecond level, retrieval efficiency is improved, and resource consumption is reduced. In addition, by marking data with timestamps, the scheme is compatible with concurrent storage of real-time data and historical file data, and avoids the disorder caused by putting the two types of data in storage at the same time.
Specifically, the system operating environment involved in this embodiment includes a data processing device applied in a network. The device may run on a PC, or on a mobile phone, tablet computer, or other mobile terminal that can browse web pages and use network applications. The hardware structure of the data processing device may be as shown in Fig. 2.
Referring to Fig. 2, the data processing device may include: a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication among the components of the data processing device. The user interface 1003 may include components such as a display, a keyboard, and a mouse, and is used to receive information entered by the user and send the received information to the processor 1001 for processing. The display may be an LCD display, an LED display, or a touch screen, and is used to show the data that the open platform needs to display, for example operation interfaces for claiming application tasks and claiming rewards. Optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or stable memory (non-volatile memory), such as disk storage. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001. As shown in Fig. 2, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data processing program.
In the data processing device shown in Fig. 2, the network interface 1004 is mainly used to connect to a network platform and exchange data with it; the user interface 1003 is mainly used to connect to a client, exchange data with the client, and receive information and instructions entered at the client; and the processor 1001 may be used to call the data processing program stored in the memory 1005 and perform the following operations:
Acquiring real-time data and packaging the real-time data into network data packets to obtain network packet real-time data;
Converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
Putting the retrieval data of the network packet real-time data in storage for retrieval.
Further, in one embodiment, the processor 1001 calls the data processing program stored in the memory 1005 and may also perform the following operation:
Converting the text data format of the network packet real-time data into an inverted index format to obtain the retrieval data of the network packet real-time data.
Further, in one embodiment, the processor 1001 calls the data processing program stored in the memory 1005 and may also perform the following operations:
Acquiring historical file data;
Converting the historical file data into the retrieval data format to obtain retrieval data of the historical file data;
The step of putting the retrieval data of the network packet real-time data in storage for retrieval comprises:
Putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval.
Further, in one embodiment, the processor 1001 calls the data processing program stored in the memory 1005 and may also perform the following operations:
After the step of acquiring real-time data, the method further comprises:
Marking the real-time data with a timestamp;
After the step of acquiring historical file data, the method further comprises:
Marking the historical file data with a timestamp;
The step of putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval comprises:
Putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval according to the marked timestamps.
Further, in one embodiment, the processor 1001 calls the data processing program stored in the memory 1005 and may also perform the following operations:
Determining whether the retrieval data of the network packet real-time data and the retrieval data of the historical file data have the same retrieval index but different content; if so,
Obtaining the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the historical file data respectively;
Determining the order of the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the historical file data;
Putting in storage for retrieval whichever of the retrieval data of the network packet real-time data and the retrieval data of the historical file data has the later timestamp.
Further, in one embodiment, the processor 1001 calls the data processing program stored in the memory 1005 and may also perform the following operation:
Passing the corresponding retrieval data to a retrieval system to be put in storage.
With the above scheme, this embodiment packages real-time data into network data packets, converts the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data, and puts the retrieval data in storage for retrieval. Because the real-time data exchanged within the system is network packet data, the system's dependence on a storage system is reduced and real-time data can be put in storage for retrieval quickly over the network. This greatly improves warehousing performance and system stability, meets the data warehousing performance requirements of big data services, keeps retrieval delay fluctuation at the millisecond level, improves retrieval efficiency, and reduces resource consumption. In addition, by marking data with timestamps, the scheme is compatible with concurrent storage of real-time data and historical file data, and avoids the disorder caused by putting the two types of data in storage at the same time.
Based on the above hardware architecture, embodiments of the data processing method of the present invention are proposed.
As shown in Fig. 3, a first embodiment of the present invention proposes a data processing method, comprising:
Step S101: acquiring real-time data and packaging the real-time data into network data packets to obtain network packet real-time data;
As described above, the real-time data involved in the embodiments of the present invention refers to network data that is produced continuously in the network and needs to be put in storage for retrieval. Examples include the data produced when a user clicks a web page, or the data produced when a user modifies QQ group information (such as the group name or group signature). Once this data has been put in storage, users can search and query it, for example to look up a QQ group name.
First, the data processing device acquires the data that is produced in real time in the network and needs to be put in storage for retrieval. This real-time data is generally in text format and must be converted into a specific retrieval data format before it can be put in storage for retrieval, as described in detail later.
Then, the acquired real-time data is packaged into network data packets to obtain network packet real-time data. The network data packets are usually serialized in PROTOBUF (PB) format.
Exchanging network data packets within the real-time data warehousing system reduces the system's dependence on a storage system, and the storage system can be omitted. Real-time data can therefore be put in storage for retrieval quickly over the network, which reduces retrieval delay fluctuation, greatly improves warehousing performance and system stability, and meets the data warehousing performance requirements of big data services. Because no storage system needs to be provided, resource consumption is also reduced.
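The packaging step can be illustrated with a minimal sketch. The description names PROTOBUF as the usual serialization, but gives no message schema, so the sketch below substitutes JSON with a 4-byte length prefix from the Python standard library as a stand-in. The function names `pack_record` and `unpack_records` and the record fields are hypothetical and are not taken from the patent.

```python
import json
import struct
from typing import Dict, List


def pack_record(record: Dict[str, str]) -> bytes:
    """Serialize one real-time record and prefix it with its 4-byte length.

    JSON is used here only as a stand-in for the PROTOBUF serialization
    mentioned in the description (assumption: no schema is given there).
    """
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def unpack_records(stream: bytes) -> List[Dict[str, str]]:
    """Split a byte stream of length-prefixed packets back into records."""
    records = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        records.append(json.loads(stream[offset:offset + length].decode("utf-8")))
        offset += length
    return records


# Hypothetical record: a document key plus its text content.
record = {"key": "group:42", "text": "Tencent internal network"}
packet = pack_record(record)
```

Because each packet carries its own length, a receiver can read records directly off a network stream without writing them to files first, which is the property the description relies on.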
Step S102: converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
As described above, real-time data is generally in text format and must be converted into a specific retrieval data format before it can be put in storage for retrieval. This is because data in the warehousing and retrieval systems of big data services has a specific format, generally an inverted index format.
The conversion process is as follows:
First, the data processing device performs word segmentation (also called tokenization) on the text-format network packet real-time data. For example, the document "Tencent internal network" is segmented into "Tencent", "internal", and "network", which yields the index terms "Tencent", "internal", and "network". The document "Tencent internal network" is then added to the document sequence (document list) corresponding to each of these three index terms. This builds the inverted index and produces retrieval data in inverted index format. Each document list stores the documents to be retrieved for its index term (these may be identified by entry identifiers), so that at retrieval time the document list can be looked up by index term and the corresponding documents retrieved.
For example, suppose there are two items of real-time data, "Tencent internal network" and "Tencent external network". "Tencent internal network" is segmented into "Tencent", "internal", and "network", which yields the index terms "Tencent", "internal", and "network". Retrieval data in inverted index format is then generated: the document "Tencent internal network" is added to the document sequence corresponding to each of these three index terms.
Likewise, "Tencent external network" is segmented into "Tencent", "external", and "network", which yields the index terms "Tencent", "external", and "network". Retrieval data in inverted index format is then generated: the document "Tencent external network" is added to the document sequence corresponding to each of these three index terms.
At retrieval time, if the search term "Tencent" is entered, the two entries "Tencent internal network" and "Tencent external network" are both displayed for the user to choose from, and the corresponding document is retrieved (the entry can link to the corresponding web page through its URL, which is not described in detail here).
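The conversion described above can be sketched as a small inverted index builder. This is an illustration under stated assumptions rather than the patent's implementation: whitespace splitting stands in for real word segmentation, and the names `tokenize` and `build_inverted_index` and the integer document ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Stand-in for word segmentation: split on whitespace (assumption)."""
    return text.split()


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, List[int]]:
    """Map every index term to the list of ids of documents containing it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in documents.items():
        # dict.fromkeys removes repeated terms while keeping their order.
        for term in dict.fromkeys(tokenize(text)):
            index[term].append(doc_id)
    return dict(index)


documents = {
    1: "Tencent internal network",
    2: "Tencent external network",
}
index = build_inverted_index(documents)

# Looking up "Tencent" returns both documents; "internal" returns only the first.
matches = [documents[doc_id] for doc_id in index["Tencent"]]
```

With the two example documents, the document list for "Tencent" contains both ids, which matches the retrieval behavior described in the text.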
Step S103: putting the retrieval data of the network packet real-time data in storage for retrieval.
After the retrieval data of the network packet real-time data is obtained, it is put in storage for retrieval.
In this embodiment, there are two ways of putting retrieval data in storage:
In the first way, the warehousing system in which the data processing device of this embodiment resides puts the retrieval data of the network packet real-time data in storage directly, and then performs retrieval operations as users require.
In other words, in this embodiment the system in which the data processing device resides can itself provide the retrieval system function. After acquisition, packaging, and conversion into the retrieval data format, the real-time data is put in storage for retrieval directly, with no additional retrieval system required. This reduces the complexity of the overall data flow.
In the second way, when the retrieval data is put in storage, it can instead be passed to a retrieval system to be put in storage, and the retrieval system provides the search function.
In other words, the retrieval system function can be separated out. An independent retrieval system receives and stores the retrieval data of the network packet real-time data, as well as the retrieval data of the historical file data mentioned later, and retrieval operations are performed by this external retrieval system as users require.
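The two ways of putting retrieval data in storage can be contrasted with a short sketch. The class names `LocalWarehouse` and `ForwardingWarehouse` are hypothetical, and the independent retrieval system is simulated in-process here; in a real deployment it would presumably sit behind a network interface that the description does not specify.

```python
from typing import Dict, List


class LocalWarehouse:
    """First way: the warehousing system stores retrieval data and serves queries itself."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[int]] = {}

    def store(self, retrieval_data: Dict[str, List[int]]) -> None:
        for term, doc_ids in retrieval_data.items():
            merged = self.postings.setdefault(term, [])
            for doc_id in doc_ids:
                if doc_id not in merged:
                    merged.append(doc_id)

    def search(self, term: str) -> List[int]:
        return list(self.postings.get(term, []))


class ForwardingWarehouse:
    """Second way: retrieval data is passed on to an independent retrieval system."""

    def __init__(self, retrieval_system: LocalWarehouse) -> None:
        # Assumption: the external system exposes the same store() operation.
        self.retrieval_system = retrieval_system

    def store(self, retrieval_data: Dict[str, List[int]]) -> None:
        self.retrieval_system.store(retrieval_data)


# First way: store and search in the same system.
local = LocalWarehouse()
local.store({"Tencent": [1, 2], "internal": [1]})

# Second way: the warehousing side only forwards; searching happens externally.
external_retrieval_system = LocalWarehouse()
forwarder = ForwardingWarehouse(external_retrieval_system)
forwarder.store({"Tencent": [1, 2], "external": [2]})
```

The first arrangement keeps the data flow short, while the second keeps the search function replaceable, which is the trade-off the two paragraphs above describe.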
With the above scheme, this embodiment packages real-time data into network data packets, converts the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data, and puts the retrieval data in storage for retrieval. Because the real-time data exchanged within the system is network packet data, the system's dependence on a storage system is reduced and real-time data can be put in storage for retrieval quickly over the network. This greatly improves warehousing performance and system stability, meets the data warehousing performance requirements of big data services, keeps retrieval delay fluctuation at the millisecond level, improves retrieval efficiency, and reduces resource consumption.
Further, in order to put historical file data in storage as well, the present invention also proposes, based on the first embodiment above, a second embodiment of the data processing method, which supports putting network real-time data and historical file data in storage at the same time.
As shown in Fig. 4, a second embodiment of the present invention proposes a data processing method which, based on the above embodiment, further comprises the following before the step of putting the retrieval data of the network packet real-time data in storage for retrieval:
Step S104: acquiring historical file data;
Step S105: converting the historical file data into the retrieval data format to obtain retrieval data of the historical file data;
The above step S103, putting the retrieval data of the network packet real-time data in storage for retrieval, specifically comprises:
Step S1031: putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval.
Compared with the above embodiment, this embodiment can also put historical file data in storage.
Historical file data refers to file data that is produced periodically in the network and needs to be put in storage for retrieval, such as data pushed periodically by website pages.
Specifically, the historical file data that is produced periodically in the network and needs to be put in storage for retrieval is acquired first. The historical file data is then converted into the retrieval data format to obtain retrieval data of the historical file data; the conversion process is the same as the conversion process for real-time data described above. Finally, the retrieval data of the network packet real-time data and the retrieval data of the historical file data are put in storage for retrieval.
The retrieval data of the network packet real-time data and the retrieval data of the historical file data may be put in storage at the same time or separately; no limitation is imposed here.
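As a rough sketch of the second embodiment, the example below sends a real-time document and a historical file through the same conversion and into one shared index. The tab-separated file layout, the document ids, and the names `to_retrieval_data`, `read_history_file`, and `put_in_storage` are all assumptions made for illustration; an in-memory `io.StringIO` object stands in for a real history file.

```python
import io
from collections import defaultdict
from typing import Dict, Iterator, List, TextIO, Tuple


def to_retrieval_data(doc_id: str, text: str) -> List[Tuple[str, str]]:
    """Convert one document into (index term, document id) postings.

    Whitespace splitting stands in for word segmentation (assumption).
    """
    return [(term, doc_id) for term in dict.fromkeys(text.split())]


def read_history_file(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (document id, text) pairs from a tab-separated file (hypothetical layout)."""
    for line in handle:
        line = line.rstrip("\n")
        if line:
            doc_id, text = line.split("\t", 1)
            yield doc_id, text


index: Dict[str, List[str]] = defaultdict(list)


def put_in_storage(postings: List[Tuple[str, str]]) -> None:
    """Add postings from either data source to the shared inverted index."""
    for term, doc_id in postings:
        if doc_id not in index[term]:
            index[term].append(doc_id)


# Real-time document, as it would arrive after unpacking a network packet.
put_in_storage(to_retrieval_data("rt-1", "Tencent internal network"))

# Historical file data, converted with the same function.
history_file = io.StringIO("hist-1\tTencent external network\n")
for doc_id, text in read_history_file(history_file):
    put_in_storage(to_retrieval_data(doc_id, text))
```

Both sources end up in the same index, so a query for "Tencent" sees documents from each. This sketch does not handle two sources writing different content under the same index, which is the problem the third embodiment addresses.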
With the above scheme, this embodiment reduces the system's dependence on a storage system and allows real-time data to be put in storage for retrieval quickly over the network. This greatly improves warehousing performance and system stability, meets the data warehousing performance requirements of big data services, keeps retrieval delay fluctuation at the millisecond level, improves retrieval efficiency, and reduces resource consumption. Historical file data can also be put in storage, which meets system requirements.
Further, in order to avoid the data disorder that arises when real-time data and historical file data are put in storage at the same time, the present invention also proposes, based on the second embodiment above, a third embodiment of the data processing method.
As shown in Fig. 5, a third embodiment of the present invention proposes a data processing method which, based on the second embodiment above, further comprises the following after the step of acquiring real-time data:
Step S106: marking the real-time data with a timestamp;
After the step of acquiring historical file data, the method further comprises:
Step S107: marking the historical file data with a timestamp;
The above step S1031, putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval, may specifically comprise:
Step S1032: putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage for retrieval according to the marked timestamps.
This embodiment supports putting network real-time data and file data in storage at the same time. However, the real-time data and the historical file data may contain items that have the same index but different content, and the retrieval system overwrites previously stored data when it receives data with the same index. Putting the two data flows in storage at the same time therefore raises a data ordering problem.
To solve this problem, the warehousing system of this embodiment introduces a timestamp mechanism: retrieval data is marked with timestamps, and the retrieval system decides whether to put an item in storage by comparing timestamps.
Specifically, after the real-time data is acquired, it is marked with a timestamp; after the historical file data is acquired, it is marked with a timestamp.
Subsequently, when the retrieval data of the network packet real-time data and the retrieval data of the historical file data are put in storage, they are processed according to the marked timestamps.
As one implementation, putting the retrieval data of the network packet real-time data and the retrieval data of the historical file data in storage according to the marked timestamps may proceed as follows:
First, it is determined whether the retrieval data of the network packet real-time data and the retrieval data of the historical file data have the same retrieval index but different content. If so, the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the historical file data are obtained respectively.
Then, the order of the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the historical file data is determined.
Finally, whichever of the retrieval data of the network packet real-time data and the retrieval data of the historical file data has the later timestamp is put in storage for retrieval. This avoids the data disorder that arises when real-time data and historical file data are put in storage at the same time, improves the accuracy of the stored data, and achieves precise control over the stored data.
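The timestamp comparison above can be sketched as a small store that accepts a record only when it is new or carries a later timestamp than the stored record with the same index. The names `RetrievalRecord` and `TimestampedStore`, the field layout, and the tie rule (an equal timestamp keeps the stored record) are assumptions; the description only states that the data with the later timestamp is put in storage.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RetrievalRecord:
    index_key: str    # retrieval index shared by conflicting records
    content: str
    timestamp: float  # marked when the source data was acquired
    source: str       # "realtime" or "history" (informational only)


class TimestampedStore:
    def __init__(self) -> None:
        self.records: Dict[str, RetrievalRecord] = {}

    def put(self, record: RetrievalRecord) -> bool:
        """Store the record unless an equally recent or newer one exists.

        Returns True if the record was written.
        """
        current = self.records.get(record.index_key)
        if current is None:
            self.records[record.index_key] = record
            return True
        if current.content == record.content:
            # Same index and same content: nothing to resolve.
            return False
        if record.timestamp > current.timestamp:
            self.records[record.index_key] = record
            return True
        # Older (or equally old) data must not overwrite newer data.
        return False


store = TimestampedStore()
# The real-time update arrives first but was produced later (timestamp 200).
store.put(RetrievalRecord("group:42", "New group name", 200.0, "realtime"))
# The historical file is processed afterwards but holds older data (timestamp 100).
stale_accepted = store.put(RetrievalRecord("group:42", "Old group name", 100.0, "history"))
```

Without the timestamp check, the historical record would overwrite the newer real-time record simply because it was processed later, which is the disorder the third embodiment is meant to prevent.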
It should be noted that, in this embodiment, the step of marking the real-time data with a timestamp may be completed after the real-time data is acquired and before it is packaged into network data packets. It may also be completed after the real-time data has been acquired and packaged into network data packets and before the network packet real-time data is converted into the retrieval data format. It may also be completed after the network packet real-time data has been converted into the retrieval data format and before the retrieval data of the network packet real-time data is put in storage.
Similarly, the step of marking the historical file data with a timestamp may be completed after the historical file data is acquired and before it is converted into the retrieval data format. It may also be completed after the historical file data has been acquired and converted into the retrieval data format and before the retrieval data of the historical file data is put in storage.
Compared with the prior art, the scheme of this embodiment has the following beneficial effects:
1. Retrieval delay can be shortened to the millisecond level, so that updated user data can be retrieved immediately, which improves retrieval efficiency.
2. System warehousing performance is greatly improved. For example, single-core CPU processing capacity can rise from several hundred data units previously to 10,000 data units, an improvement of tens of times.
3. The warehousing system's dependence on a storage system is reduced, stability is greatly improved, and delay fluctuation is kept at the millisecond level. The storage system is also omitted, which reduces resource consumption.
4. The improved stability also reduces the cost of manual operation.
5. By marking data with timestamps, the scheme is compatible with concurrent storage of real-time data and historical file data, and avoids the disorder caused by putting the two types of data in storage at the same time.
6. Routing technology can be used to provide precise control over putting data in storage for online and offline retrieval systems, which avoids the data omission caused by switching and the waste of storage resources caused by large amounts of duplication.
Corresponding to the above method embodiments, the present invention also provides data processing device embodiments.
As shown in Figure 6, the first embodiment of the present invention provides a data processing device comprising an acquisition module 201, a conversion module 202 and an ingestion module 203, wherein:
The acquisition module 201 is configured to obtain real-time data and package the real-time data into network packets to obtain network packet real-time data;
The conversion module 202 is configured to convert the network packet real-time data into a retrieval data format to obtain the retrieval data of the network packet real-time data;
The ingestion module 203 is configured to ingest the retrieval data of the network packet real-time data for retrieval.
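As a non-limiting illustration, the cooperation of the three modules might be sketched as follows. The class names, the dictionary standing in for a network packet, and the injected `segment` function are all assumptions made for this sketch; the patent does not specify any of them.

```python
from typing import Callable, Dict, List


class AcquisitionModule:
    """Sketch of module 201: wraps raw real-time text in a packet-like record."""

    def acquire(self, doc_id: str, text: str) -> Dict[str, str]:
        # A dict stands in for a serialized network packet (assumption).
        return {"doc_id": doc_id, "text": text}


class ConversionModule:
    """Sketch of module 202: turns a packet into retrieval data."""

    def __init__(self, segment: Callable[[str], List[str]]) -> None:
        self._segment = segment  # hypothetical word segmenter supplied by the caller

    def convert(self, packet: Dict[str, str]) -> Dict[str, object]:
        return {"doc_id": packet["doc_id"], "terms": self._segment(packet["text"])}


class IngestionModule:
    """Sketch of module 203: keeps retrieval data in memory for later retrieval."""

    def __init__(self) -> None:
        self.records: List[Dict[str, object]] = []

    def ingest(self, retrieval_data: Dict[str, object]) -> None:
        self.records.append(retrieval_data)


def run_pipeline(doc_id: str, text: str, acq: AcquisitionModule,
                 conv: ConversionModule, ing: IngestionModule) -> None:
    # Data passes from module to module in memory; no intermediate file is written.
    ing.ingest(conv.convert(acq.acquire(doc_id, text)))
```

In this sketch a call such as `run_pipeline("d1", "Tencent intranet", AcquisitionModule(), ConversionModule(str.split), ingestion)` leaves one record in `ingestion.records`; whitespace splitting is only a placeholder for real word segmentation.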
Specifically, as described above, the real-time data in the embodiments of the present invention refers to network data that is generated continuously in the network and needs to be ingested for retrieval, for example data generated when a user clicks on a web page, or data generated when a user modifies QQ group information (such as the group name or group signature). Once ingested, this data can be searched by users, for example to look up a QQ group name.
First, the data processing device obtains the data generated in real time in the network that needs to be ingested for retrieval. This real-time data is generally in text format and must be converted into a specific retrieval data format before it is ingested, as described in detail later.
Then, the obtained real-time data is packaged into network packets to obtain network packet real-time data. The network packets are usually serialized in Protobuf (PB) format.
Exchanging data as network packets within the real-time data ingestion system reduces the system's dependence on a storage system, and the storage system can be omitted. As a result, real-time data can be ingested quickly over the network, fluctuation in retrieval delay is reduced, ingestion performance and system stability are greatly improved, and the ingestion performance requirements of big data services are met. Because no storage system needs to be set up, resource consumption is also reduced.
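As a hedged illustration of the packaging step, the following sketch frames each record as a length-prefixed packet that can be sent over a byte stream. It deliberately substitutes JSON for Protobuf so that it runs with the standard library alone; the function names and the 4-byte header layout are assumptions of this sketch, not details taken from the patent.

```python
import json
import struct
from typing import List

# 4-byte big-endian length header (an assumed framing, not the patent's wire format).
_HEADER = struct.Struct("!I")


def pack_record(record: dict) -> bytes:
    """Serialize one real-time record into a length-prefixed packet."""
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return _HEADER.pack(len(payload)) + payload


def unpack_records(buffer: bytes) -> List[dict]:
    """Decode every complete packet found in a received byte buffer."""
    records, offset = [], 0
    while offset + _HEADER.size <= len(buffer):
        (length,) = _HEADER.unpack_from(buffer, offset)
        start = offset + _HEADER.size
        end = start + length
        if end > len(buffer):
            break  # trailing packet is incomplete; more bytes are needed
        records.append(json.loads(buffer[start:end].decode("utf-8")))
        offset = end
    return records
```

Because packets are decoded straight from the received bytes, the receiving module never reads an intermediate file, which is the property the embodiment relies on.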
Afterwards, the network packet real-time data is converted into the retrieval data format to obtain the retrieval data of the network packet real-time data.
As described above, real-time data is generally in text format and must be converted into a specific retrieval data format before it is ingested, because the data in the ingestion and retrieval system of a big data service has a specific format, generally an inverted index format.
The conversion process is as follows:
The data processing device first performs word segmentation on the network packet real-time data in text format. For example, the document "Tencent intranet" is segmented into "Tencent", "internal" and "net", yielding the index terms "Tencent", "internal" and "net". The document "Tencent intranet" is then added to the document list corresponding to each of these three index terms, which produces retrieval data in inverted index format. Each document list stores the documents that can be retrieved through its index term (identified by entries), so that at retrieval time the document list can be looked up by index term and the corresponding documents retrieved.
For example, suppose there are two pieces of real-time data, "Tencent intranet" and "Tencent extranet". "Tencent intranet" is segmented into "Tencent", "internal" and "net", yielding those index terms, and the inverted-format retrieval data is then generated: generating the inverted index means adding the document "Tencent intranet" to the document list of each of the three index terms "Tencent", "internal" and "net".
Likewise, "Tencent extranet" is segmented into "Tencent", "external" and "net", and the document "Tencent extranet" is added to the document list of each of the index terms "Tencent", "external" and "net" to produce the inverted-format retrieval data.
At retrieval time, if the search term "Tencent" is entered, the two entries "Tencent intranet" and "Tencent extranet" are both displayed for the user to choose from, and the corresponding document is thereby retrieved (an entry can link to the corresponding web page through its URL, which is not described in detail here).
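The conversion and lookup in this example can be illustrated with a minimal sketch. The `SEGMENTS` table is a hypothetical stand-in for a real word segmenter and simply hard-codes the segmentation given in the text; the function names are likewise assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical stand-in for a word segmenter: document -> index terms.
SEGMENTS = {
    "Tencent intranet": ["Tencent", "internal", "net"],
    "Tencent extranet": ["Tencent", "external", "net"],
}


def build_inverted_index(documents: List[str]) -> Dict[str, List[str]]:
    """Add each document to the document list of every one of its index terms."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        for term in SEGMENTS[doc]:
            if doc not in index[term]:
                index[term].append(doc)
    return dict(index)


def search(index: Dict[str, List[str]], term: str) -> List[str]:
    """Look up the document list for a search term; empty if the term is unknown."""
    return index.get(term, [])
```

With both documents indexed, searching for "Tencent" returns both entries, while "internal" returns only "Tencent intranet", matching the behaviour described in the example.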
After the retrieval data of the network packet real-time data is obtained, it is ingested for retrieval.
In this embodiment, the retrieval data can be ingested in two ways:
In the first way, the ingestion system in which the data processing device of this embodiment resides ingests the retrieval data of the network packet real-time data directly and then performs retrieval operations as users require.
That is, in this embodiment the system in which the data processing device resides can itself provide the retrieval system function: after the real-time data is obtained, packaged and converted into the retrieval data format, it is ingested and retrieved directly, with no additional retrieval system needed. This reduces the complexity of the overall data flow.
In the second way, when the retrieval data is ingested, it can instead be passed to a retrieval system for ingestion, and the retrieval system provides the retrieval function.
That is, the retrieval system function can be separated out: an independent retrieval system receives the ingested retrieval data of the network packet real-time data, as well as the retrieval data of the history file data mentioned later, and the external retrieval system performs retrieval operations as users require.
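The two ingestion ways can be contrasted in a short sketch in which both targets expose the same `store` method. The class names, the `Sink` protocol and the `send` callable are assumptions introduced for illustration; in practice `send` might be a network client for the external retrieval system.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple


class Sink(Protocol):
    def store(self, term: str, doc_id: str) -> None: ...


class LocalIndexSink:
    """First way: the ingestion system keeps the index and serves queries itself."""

    def __init__(self) -> None:
        self.index: Dict[str, List[str]] = {}

    def store(self, term: str, doc_id: str) -> None:
        self.index.setdefault(term, []).append(doc_id)

    def search(self, term: str) -> List[str]:
        return self.index.get(term, [])


class ForwardingSink:
    """Second way: retrieval data is handed to a separate retrieval system."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # any callable works in this sketch

    def store(self, term: str, doc_id: str) -> None:
        self._send({"term": term, "doc_id": doc_id})


def ingest(sink: Sink, postings: Iterable[Tuple[str, str]]) -> None:
    """Ingest (term, doc_id) pairs without knowing which way is in use."""
    for term, doc_id in postings:
        sink.store(term, doc_id)
```

Keeping the calling code identical for both targets reflects the point made in the text: whether the retrieval function is built in or external changes where the data ends up, not how the preceding steps run.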
Compared with the prior art, the scheme of this embodiment no longer exchanges data through files. Instead, network packets carry the data between modules, which eliminates file access and avoids the impact of the storage system. Retrieval delay is also reduced to the millisecond level.
Further, the acquisition module 201 is also configured to obtain history file data;
The conversion module 202 is also configured to convert the history file data into the retrieval data format to obtain the retrieval data of the history file data;
The ingestion module 203 is also configured to ingest the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
This embodiment can thus also ingest history file data.
History file data refers to file data that is generated periodically in the network and needs to be ingested for retrieval, for example data that a website page pushes periodically.
Specifically, the history file data generated periodically in the network that needs to be ingested for retrieval is first obtained. The history file data is then converted into the retrieval data format to obtain the retrieval data of the history file data. Finally, the retrieval data of the network packet real-time data and the retrieval data of the history file data are ingested for retrieval.
The retrieval data of the network packet real-time data and the retrieval data of the history file data may be ingested simultaneously or separately, which is not limited here.
With the above scheme, this embodiment reduces the system's dependence on a storage system, allows real-time data to be ingested quickly over the network, greatly improves ingestion performance and system stability, and meets the ingestion performance requirements of big data services. It keeps fluctuation in retrieval delay at the millisecond level, improves retrieval efficiency and reduces resource consumption, and it can also ingest history file data to meet system requirements.
Further, to avoid the out-of-order data problem that arises when real-time data and history file data are ingested simultaneously, the present invention also provides a second embodiment of the data processing device on the basis of the above embodiment.
As shown in Figure 7, the second embodiment of the present invention provides a data processing device which, on the basis of the above embodiment, further comprises:
A marking module 204, configured to mark the real-time data with a timestamp after the acquisition module obtains the real-time data, and to mark the history file data with a timestamp after the acquisition module obtains the history file data;
The ingestion module 203 is also configured to ingest, according to the marked timestamps, the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
This embodiment supports simultaneous ingestion of network real-time data and file data. However, some items in the real-time data and the history file data share the same index but differ in content, and the retrieval system overwrites old data with newly ingested data that has the same index. Ingesting the two data streams simultaneously can therefore leave the data out of order.
To solve this problem, the ingestion system of this embodiment introduces a timestamp mechanism: the retrieval data is marked with timestamps, and the retrieval system decides whether to ingest an item by comparing timestamps.
Specifically, after the real-time data is obtained it is marked with a timestamp, and after the history file data is obtained it is marked with a timestamp.
Subsequently, when the retrieval data of the network packet real-time data and the retrieval data of the history file data are ingested, they are ingested according to the marked timestamps.
As one implementation, ingesting the retrieval data of the network packet real-time data and the retrieval data of the history file data according to the marked timestamps can proceed as follows:
First, it is determined whether the retrieval data of the network packet real-time data and the retrieval data of the history file data contain items with the same retrieval index but different content. If so, the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the history file data are obtained.
Then, the order of the two timestamps is determined.
Finally, whichever retrieval data carries the later timestamp, whether that of the network packet real-time data or that of the history file data, is ingested for retrieval. This avoids the out-of-order data problem that arises when real-time data and history file data are ingested simultaneously, improves the accuracy of the ingested data, and achieves precise control over what is ingested.
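The timestamp comparison can be illustrated with a minimal sketch in which an incoming item replaces a stored item with the same index only if its timestamp is not earlier. The `Entry` fields, the function name and the handling of equal timestamps (the incoming item wins) are assumptions of this sketch; the patent only states that the item with the later timestamp is ingested.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    key: str          # retrieval index shared by conflicting items
    content: str
    timestamp: float  # marked when the data was obtained
    source: str       # "realtime" or "history" (labels assumed for illustration)


def ingest_with_timestamp(store: Dict[str, Entry], entry: Entry) -> bool:
    """Ingest entry unless the store already holds a later version of the same key."""
    current = store.get(entry.key)
    if current is not None and entry.timestamp < current.timestamp:
        return False  # stale data arriving late is dropped instead of overwriting
    store[entry.key] = entry
    return True
```

For example, if a real-time update to a group name with timestamp 200 has been ingested and a history file item for the same index with timestamp 100 arrives afterwards, the function returns `False` and the newer name remains in the store, regardless of arrival order.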
It should be noted that, in this embodiment, the step of marking the real-time data with a timestamp may be completed after the real-time data is obtained and before it is packaged into a network packet; or after the real-time data is packaged into a network packet and before the network packet real-time data is converted into the retrieval data format; or after the network packet real-time data is converted into the retrieval data format and before the retrieval data of the network packet real-time data is ingested.
Similarly, the step of marking the history file data with a timestamp may be completed after the history file data is obtained and before it is converted into the retrieval data format; or after the history file data is converted into the retrieval data format and before the retrieval data of the history file data is ingested.
Compared with the prior art, the scheme of this embodiment has the following beneficial effects:
1. Retrieval delay can be shortened to the millisecond level, so that data can be retrieved immediately after a user updates it, which improves retrieval efficiency.
2. The ingestion performance of the system is greatly improved. For example, the processing capacity of a single CPU core can rise from the previous several hundred data records to 10,000 data records, an improvement of tens of times.
3. The dependence of the ingestion system on the storage system is reduced, which greatly improves stability and keeps delay fluctuation at the millisecond level. The storage system is also dispensed with, which reduces resource consumption.
4. The improved stability also reduces the cost of manual operation.
5. The timestamp marking technique makes simultaneous ingestion of real-time data and history file data possible and avoids the out-of-order problem that simultaneous ingestion of the two types of data would otherwise cause.
6. Routing technology can be used to provide precise control over data ingestion for online and offline retrieval systems, avoiding the data omission caused by switching and the waste of storage resources caused by large amounts of duplicated data.
It should also be noted that, in this document, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (12)

1. A data processing method, characterized by comprising:
Obtaining real-time data and packaging the real-time data into network packets to obtain network packet real-time data;
Converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
Ingesting the retrieval data of the network packet real-time data for retrieval.
2. The method according to claim 1, characterized in that the step of converting the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data comprises:
Converting the text data format of the network packet real-time data into an inverted index format to obtain the retrieval data of the network packet real-time data.
3. The method according to claim 2, characterized in that, before the step of ingesting the retrieval data of the network packet real-time data for retrieval, the method further comprises:
Obtaining history file data;
Converting the history file data into the retrieval data format to obtain retrieval data of the history file data;
The step of ingesting the retrieval data of the network packet real-time data for retrieval comprises:
Ingesting the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
4. The method according to claim 3, characterized in that,
After the step of obtaining real-time data, the method further comprises:
Marking the real-time data with a timestamp;
After the step of obtaining history file data, the method further comprises:
Marking the history file data with a timestamp;
The step of ingesting the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval comprises:
Ingesting, according to the marked timestamps, the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
5. The method according to claim 4, characterized in that the step of ingesting, according to the marked timestamps, the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval comprises:
Determining whether the retrieval data of the network packet real-time data and the retrieval data of the history file data contain items with the same retrieval index but different content; if so,
Obtaining the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the history file data;
Determining the order of the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the history file data;
Ingesting for retrieval whichever of the retrieval data of the network packet real-time data and the retrieval data of the history file data has the later timestamp.
6. The method according to any one of claims 1-5, characterized in that the step of ingesting the corresponding retrieval data comprises:
Passing the corresponding retrieval data to a retrieval system for ingestion.
7. A data processing device, characterized by comprising:
An acquisition module, configured to obtain real-time data and package the real-time data into network packets to obtain network packet real-time data;
A conversion module, configured to convert the network packet real-time data into a retrieval data format to obtain retrieval data of the network packet real-time data;
An ingestion module, configured to ingest the retrieval data of the network packet real-time data for retrieval.
8. The device according to claim 7, characterized in that,
The conversion module is also configured to convert the text data format of the network packet real-time data into an inverted index format to obtain the retrieval data of the network packet real-time data.
9. The device according to claim 8, characterized in that,
The acquisition module is also configured to obtain history file data;
The conversion module is also configured to convert the history file data into the retrieval data format to obtain retrieval data of the history file data;
The ingestion module is also configured to ingest the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
10. The device according to claim 9, characterized by further comprising:
A marking module, configured to mark the real-time data with a timestamp after the acquisition module obtains the real-time data, and to mark the history file data with a timestamp after the acquisition module obtains the history file data;
The ingestion module is also configured to ingest, according to the marked timestamps, the retrieval data of the network packet real-time data and the retrieval data of the history file data for retrieval.
11. The device according to claim 10, characterized in that,
The ingestion module is also configured to determine whether the retrieval data of the network packet real-time data and the retrieval data of the history file data contain items with the same retrieval index but different content; if so, to obtain the timestamp of the retrieval data of the network packet real-time data and the timestamp of the retrieval data of the history file data; to determine the order of the two timestamps; and to ingest for retrieval whichever of the retrieval data of the network packet real-time data and the retrieval data of the history file data has the later timestamp.
12. The device according to any one of claims 7-11, characterized in that,
The ingestion module is also configured to pass the corresponding retrieval data to a retrieval system for ingestion.
CN201410577196.0A 2014-10-24 2014-10-24 Data processing method and device Active CN105589884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410577196.0A CN105589884B (en) 2014-10-24 2014-10-24 Data processing method and device

Publications (2)

Publication Number Publication Date
CN105589884A true CN105589884A (en) 2016-05-18
CN105589884B CN105589884B (en) 2020-11-03

Family

ID=55929470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410577196.0A Active CN105589884B (en) 2014-10-24 2014-10-24 Data processing method and device

Country Status (1)

Country Link
CN (1) CN105589884B (en)

Also Published As

Publication number Publication date
CN105589884B (en) 2020-11-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant