CN106649358A - Data acquisition method and apparatus - Google Patents
- Publication number
- CN106649358A (application CN201510728970.8A)
- Authority
- CN
- China
- Prior art keywords
- target
- data
- time
- platform
- identification information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data acquisition method and apparatus. The method comprises: receiving a data acquisition request, the data acquisition request being used to request the data generated within a target time period; splitting the target time period according to a target preset rule to obtain a plurality of sub-time periods; and sequentially obtaining multiple groups of data generated within the sub-time periods, wherein the requested data generated within one sub-time period forms one group. The method and apparatus solve the technical problem in the related art that historical data spanning a relatively long time cannot be acquired effectively.
Description
Technical field
This application relates to the Internet field, and in particular to a data acquisition method and apparatus.
Background
In the Internet field, historical data often needs to be synchronized from the media platforms of search engines. For example, in Internet advertising, real-time bid ranking of keywords requires report data to be synchronized daily from the media platforms of the major search engines. However, each media platform usually imposes its own constraints on the request period and request size for each type of report. For example, one search engine's media platform requires that the span between the start time and end time of an entity report request be no more than one year, while the span from start time to end time of a search term report must be no more than one month.
In many cases, however, historical data with a long time span or a large volume must be synchronized from a media platform in one pass, for example, when all historical report data of a requesting client is to be synchronized. The time span of the data may then be very large and exceed the maximum time span the media platform allows, or the volume of report data to be synchronized may exceed the maximum synchronization volume the platform allows. Either situation causes the media platform to reject the user's synchronization request.
In the related art, media platforms limit the time span and data volume of historical data retrieval separately for different platforms and different entities (accounts, promotion plans, promotion units, creatives, keywords, etc.), so users typically synchronize report data for each platform and entity only within its limits. For example, a daily synchronization mode is used, in which the report data of the day before the current time is synchronized. When a synchronization request exceeds the limits of a platform and entity, the synchronization time is usually adjusted with reference to that platform's and entity's limit parameters. This approach can often obtain data only within a limited time period: when the time span is large, the system cannot complete the task automatically and manual intervention is needed, that is, data is synchronized manually in batches, one time period at a time. When the requested data volume is too large, the request is rejected. Moreover, because the volume of data requested for synchronization changes dynamically on the media side, it cannot be predicted in advance (a rough estimate can be made from historical data, but that process is crude and its accuracy is poor).
No effective solution has yet been proposed for the problem in the related art that historical data with a large time span cannot be acquired effectively.
Summary
Embodiments of this application provide a data acquisition method and apparatus, so as to at least solve the technical problem in the related art that historical data with a large time span cannot be acquired effectively.
According to one aspect of the embodiments of this application, a data acquisition method is provided. The method includes: receiving a data acquisition request, where the data acquisition request is used to request the data to be acquired that was generated within a target time period; splitting the target time period according to a target preset rule to obtain multiple sub-time periods; and sequentially obtaining the multiple groups of data generated within the multiple sub-time periods, where the data to be acquired generated within one sub-time period forms one group.
Further, the target time period runs from a first time point to a second time point, the first time point being earlier than the second, and splitting the target time period according to the target preset rule to obtain the multiple sub-time periods includes: taking the second time point as the splitting start point and a preset time period as the splitting interval, and splitting the target time period to obtain the multiple sub-time periods.
Further, before the target time period is split according to the target preset rule, the method also includes: determining identification information of a target platform, where the target platform is the platform that provides the data to be acquired; and obtaining the target preset rule according to a preset mapping relationship and the identification information of the target platform, where the preset mapping relationship is a pre-established mapping between the identification information of different platforms and the preset rules corresponding to those platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
Further, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method also includes: obtaining the preset limit parameters of each of the different platforms, yielding multiple limit parameters; obtaining the preset rule corresponding to each platform according to the multiple limit parameters; and establishing the mapping between the identification information of each platform and its corresponding preset rule, yielding the preset mapping relationship.
Further, determining the identification information of the target platform includes: detecting whether the current identification information of the target platform is the preset identification information; and obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform includes: if the current identification information of the target platform is detected to be the preset identification information, obtaining the target preset rule according to the preset mapping relationship and the preset identification information.
Further, the identification information uniquely identifies a platform's preset limit parameters. If the current identification information of the target platform is detected not to be the preset identification information, the method also includes: judging whether splitting the target time period according to the target preset rule still makes it possible to obtain the multiple groups of data generated within the multiple sub-time periods; if so, leaving the target preset rule unchanged; if not, determining the current limit parameters of the target platform according to the current identification information, obtaining the current preset rule of the target platform according to the current limit parameters, updating the preset identification information to the current identification information, and replacing the target preset rule with the current preset rule, so as to establish a mapping between the current identification information of the target platform and the current preset rule.
Further, before the multiple groups of data generated within the multiple sub-time periods are obtained sequentially, the method also includes: saving the multiple sub-time periods to a preset queue in chronological order. Sequentially obtaining the multiple groups of data then includes: reading each sub-time period saved in the preset queue in turn and, each time a sub-time period is read, obtaining the group of data generated within that sub-time period.
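A minimal Python sketch of the preset queue, assuming that "chronological order" means earliest sub-period first and using `collections.deque` as the queue. The names `enqueue_in_time_order`, `drain`, and the `fetch` callback are hypothetical illustrations, not part of the application.

```python
from collections import deque
from datetime import date
from typing import Callable, Deque, List, Tuple, TypeVar

Period = Tuple[date, date]  # (start, end) of one sub-time period
T = TypeVar("T")

def enqueue_in_time_order(sub_periods: List[Period]) -> Deque[Period]:
    """Save sub-periods to the queue sorted by start time (assumed: earliest first)."""
    return deque(sorted(sub_periods, key=lambda p: p[0]))

def drain(queue: Deque[Period], fetch: Callable[[date, date], T]) -> List[T]:
    """Read sub-periods one by one; each read triggers one fetch for that sub-period."""
    groups: List[T] = []
    while queue:
        start, end = queue.popleft()
        groups.append(fetch(start, end))  # one group of data per sub-period
    return groups

# Sub-periods as a backward split would produce them (latest first).
queue = enqueue_in_time_order([
    (date(2014, 11, 1), date(2015, 1, 1)),
    (date(2014, 9, 1), date(2014, 11, 1)),
])
groups = drain(queue, lambda s, e: f"{s}..{e}")
```

Sorting on entry means the reader never has to care in which order the splitting step produced the sub-periods.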
Further, the data to be acquired includes data of multiple dimensions, and sequentially obtaining the multiple groups of data generated within the multiple sub-time periods includes: classifying the data to be acquired by dimension to obtain multiple classes of data to be acquired; and, for each class, sequentially obtaining the multiple groups of data generated within the multiple sub-time periods.
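The per-dimension variant can be sketched as follows. This assumes each dimension (for example entity reports versus search term reports) carries its own maximum span, expressed here in days, so one month is approximated as 30 days; `split_days`, `acquire_by_dimension`, and the limit values are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

Period = Tuple[date, date]

def split_days(start: date, end: date, max_days: int) -> List[Period]:
    """Cut [start, end) into consecutive pieces no longer than max_days."""
    periods, cursor = [], start
    while cursor < end:
        nxt = min(cursor + timedelta(days=max_days), end)
        periods.append((cursor, nxt))
        cursor = nxt
    return periods

def acquire_by_dimension(dimensions: List[str], start: date, end: date,
                         max_days: Dict[str, int],
                         fetch: Callable[[str, date, date], object]) -> Dict[str, List[object]]:
    """Classify the request by dimension, then fetch each class sub-period by sub-period."""
    results: Dict[str, List[object]] = {}
    for dim in dict.fromkeys(dimensions):  # distinct dimensions, original order kept
        results[dim] = [fetch(dim, s, e) for s, e in split_days(start, end, max_days[dim])]
    return results

limits = {"entity": 365, "search_term": 30}  # hypothetical per-dimension span limits (days)
out = acquire_by_dimension(["entity", "search_term", "entity"],
                           date(2015, 1, 1), date(2015, 3, 1), limits,
                           lambda dim, s, e: (dim, s, e))
```

With these limits the 59-day target period needs one entity request but two search term requests.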
According to another aspect of the embodiments of this application, a data acquisition apparatus is also provided. The apparatus includes: a receiving unit, configured to receive a data acquisition request, where the data acquisition request is used to request the data to be acquired that was generated within a target time period; a splitting unit, configured to split the target time period according to a target preset rule to obtain multiple sub-time periods; and an acquiring unit, configured to sequentially obtain the multiple groups of data generated within the multiple sub-time periods, where the data to be acquired generated within one sub-time period forms one group.
Further, the target time period runs from a first time point to a second time point, the first time point being earlier than the second, and the splitting unit includes: a splitting module, configured to take the second time point as the splitting start point and a preset time period as the splitting interval, and to split the target time period to obtain the multiple sub-time periods.
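One way to map the three units onto code is sketched below. The class names, the day-based preset period, and the `fetch` callback are assumptions made for illustration; the application does not prescribe any particular structure.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Tuple

Period = Tuple[date, date]

@dataclass(frozen=True)
class AcquisitionRequest:
    first: date   # first time point (earlier)
    second: date  # second time point (later)

class ReceivingUnit:
    def receive(self, first: date, second: date) -> AcquisitionRequest:
        if first >= second:
            raise ValueError("the first time point must be earlier than the second")
        return AcquisitionRequest(first, second)

class SplittingUnit:
    """Splitting module: start at the second time point and step backward by preset_days."""
    def __init__(self, preset_days: int) -> None:
        self.preset_days = preset_days

    def split(self, req: AcquisitionRequest) -> List[Period]:
        periods: List[Period] = []
        cursor = req.second
        while cursor > req.first:
            prev = max(cursor - timedelta(days=self.preset_days), req.first)
            periods.append((prev, cursor))
            cursor = prev
        return periods

class AcquiringUnit:
    def __init__(self, fetch: Callable[[date, date], object]) -> None:
        self.fetch = fetch  # hypothetical platform call

    def acquire(self, periods: List[Period]) -> List[object]:
        return [self.fetch(s, e) for s, e in periods]  # one request, one group

req = ReceivingUnit().receive(date(2015, 1, 1), date(2015, 1, 25))
periods = SplittingUnit(preset_days=10).split(req)
groups = AcquiringUnit(lambda s, e: (e - s).days).acquire(periods)
```

Here each "group" is simply the length in days of its sub-period, which makes the shorter final remainder easy to see.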
In the embodiments of this application, the following approach is adopted: a data acquisition request is received, the request being used to request the data to be acquired that was generated within a target time period; the target time period is split according to a target preset rule to obtain multiple sub-time periods; and the multiple groups of data generated within the multiple sub-time periods are obtained sequentially, the data generated within one sub-time period forming one group. Because the target time period is split according to the target preset rule and the data is then obtained one sub-time period at a time, historical data with a large time span can be acquired effectively, which solves the corresponding technical problem in the related art.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their description are used to explain the application and do not constitute an improper limitation on it. In the drawings:
Fig. 1 is a flowchart of a data acquisition method according to an embodiment of this application;
Fig. 2 is a schematic diagram of time period splitting for acquiring multi-dimensional data according to an embodiment of this application; and
Fig. 3 is a schematic diagram of a data acquisition apparatus according to an embodiment of this application.
Detailed description
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
According to an embodiment of this application, a method embodiment of a data acquisition method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of a data acquisition method according to an embodiment of this application. As shown in Fig. 1, the method includes the following steps:
Step S102: receive a data acquisition request, where the data acquisition request is used to request the data to be acquired that was generated within a target time period.
The data to be acquired may include a large amount of data generated at different times; for example, it may be a large volume of historical data from a search engine's media platform. Suppose a user needs to obtain from a search engine's media platform the entity reports for the two months up to the current time (assumed to be 2015.3.1); the target time period is then from 2015.1.1 to 2015.3.1. The start and end time points of the target time period can be given at different precisions according to the user's actual needs. If the user needs data accurate to the day, the time points can be given to the day, for example a target time period from 2015.1.1 to 2015.5.1; if the user needs data accurate to the hour, the time points can be given to the hour, for example 2015.1.1 10:00 to 2015.1.3 20:00. This application does not specifically limit the precision of the start and end time points of the target time period.
Step S104: split the target time period according to a target preset rule to obtain multiple sub-time periods.
Most search engines' media platforms limit the time span or data volume of the data to be acquired; for example, a media platform may require that the span between the start time and end time of an entity report request not exceed one year. To prevent a data acquisition request from being rejected by the media platform because its time span is too large, the target time period can be split into multiple sub-time periods with shorter spans, which ensures that the data to be acquired generated within each sub-time period can be obtained successfully. The target preset rule is a predefined splitting rule that is followed when the target time period is split, and it is set for the platform that provides the data to be acquired. After the target time period is split according to the target preset rule, the resulting sub-time periods should satisfy the following condition: the length of each sub-time period is less than or equal to the time span threshold set by the platform.
For example, suppose a search engine's media platform sets a time span threshold of one year for downloading entity reports, and the target time period of the data the user requests is from 2013.1.1 to 2015.6.1. The splitting rule (target preset rule) set for that media platform should then ensure that each resulting sub-time period is no longer than one year. For instance, 2013.1.1 to 2015.6.1 can be split into a first sub-time period from 2013.1.1 to 2014.1.1, a second sub-time period from 2014.1.1 to 2015.1.1, and a third sub-time period from 2015.1.1 to 2015.6.1.
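The example split and the threshold condition can be checked with a short Python sketch. It assumes sub-periods are `(start, end)` pairs of `datetime.date`, thresholds are whole months, and month arithmetic is done by a hypothetical `add_months` helper (the standard library has no month offset); `split_forward` and `within_threshold` are likewise illustrative names.

```python
import calendar
from datetime import date
from typing import List, Tuple

Period = Tuple[date, date]

def add_months(d: date, months: int) -> date:
    """Shift d by whole months, clamping the day to the length of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month_index + 1)[1])
    return date(year, month_index + 1, day)

def split_forward(start: date, end: date, step_months: int) -> List[Period]:
    """Cut [start, end) into consecutive pieces of at most step_months, walking forward."""
    periods: List[Period] = []
    cursor, k = start, 1
    while cursor < end:
        # Offsets are measured from `start` so month-end clamping cannot accumulate.
        nxt = min(add_months(start, k * step_months), end)
        periods.append((cursor, nxt))
        cursor, k = nxt, k + 1
    return periods

def within_threshold(periods: List[Period], threshold_months: int) -> bool:
    """Condition from Step S104: no sub-period may be longer than the platform threshold."""
    return all(e <= add_months(s, threshold_months) for s, e in periods)

subs = split_forward(date(2013, 1, 1), date(2015, 6, 1), step_months=12)
```

`subs` reproduces the three sub-periods of the example, and `within_threshold(subs, 12)` holds while a 6-month threshold would be violated.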
For the sub-time periods obtained by splitting according to the target preset rule, the time span of each sub-time period must not exceed the media platform's time span threshold. Any two sub-time periods may have equal time spans, or the sub-time periods may be given different time spans. This application does not specifically limit the relationship between the time spans of the sub-time periods.
Step S106: sequentially obtain the multiple groups of data generated within the multiple sub-time periods, where the data to be acquired generated within one sub-time period forms one group.
Splitting the target time period into multiple sub-time periods is equivalent to dividing the data to be acquired into multiple groups by generation time, each group corresponding to one sub-time period. After the user sends a data acquisition request, the time span contained in the request is split into multiple sub-time periods, and one request is sent to the media platform for each sub-time period. Because no single request exceeds the media platform's time span threshold, the data generated within each sub-time period can be obtained.
For example, the data the user requests (that is, the data to be acquired) is the entity report data for the period from 2010.1.1 to 2012.4.1. First, the target time period 2010.1.1 to 2012.4.1 is split according to the target preset rule into a first sub-time period from 2010.1.1 to 2012.1.1 and a second sub-time period from 2012.1.1 to 2012.4.1. Next, requests are sent to the media platform in turn: the first request obtains the entity report data generated within the first sub-time period, and the second request obtains the entity report data generated within the second sub-time period. The first and second requests each return one group of data, which achieves the goal of obtaining the entity report data from 2010.1.1 to 2012.4.1.
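The two-request example above amounts to a loop that issues one request per sub-period and keeps each response as one group. In this sketch `acquire_groups` is a hypothetical name and `fake_fetch` stands in for whatever report API the media platform actually exposes.

```python
from datetime import date
from typing import Callable, List, Tuple

Period = Tuple[date, date]
Record = dict

def acquire_groups(sub_periods: List[Period],
                   fetch: Callable[[date, date], List[Record]]) -> List[List[Record]]:
    """Send one request per sub-period, in order; each response is kept as one group."""
    groups: List[List[Record]] = []
    for start, end in sub_periods:
        groups.append(fetch(start, end))
    return groups

def fake_fetch(start: date, end: date) -> List[Record]:
    # Hypothetical stand-in for a media-platform entity-report request.
    return [{"report": "entity", "from": start.isoformat(), "to": end.isoformat()}]

subs = [(date(2010, 1, 1), date(2012, 1, 1)), (date(2012, 1, 1), date(2012, 4, 1))]
groups = acquire_groups(subs, fake_fetch)
```

Concatenating the groups in order yields the full 2010.1.1 to 2012.4.1 result.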
The data acquisition method of this embodiment receives a data acquisition request for the data generated within a target time period, splits the target time period according to a target preset rule into multiple sub-time periods, and sequentially obtains the multiple groups of data generated within those sub-time periods, the data generated within one sub-time period forming one group. It thereby achieves the technical effect of effectively acquiring historical data with a large time span and solves the technical problem in the related art that such data cannot be acquired effectively.
Preferably, the target time period runs from a first time point to a second time point, the first time point being earlier than the second, and splitting the target time period according to the target preset rule to obtain the multiple sub-time periods includes: taking the second time point as the splitting start point and a preset time period as the splitting interval, and splitting the target time period to obtain the multiple sub-time periods.
For example, the data to be acquired is the visit volume data from 2014.1.1 to the current time (2015.1.1). Assuming the media platform's time span threshold is 2 months, the preset time period can be set to 2 months. Taking 2015.1.1 as the splitting start point and 2 months as the splitting interval, the target time period is split into the following sub-time periods: 2015.1.1 to 2014.11.1; 2014.11.1 to 2014.9.1; 2014.9.1 to 2014.7.1; 2014.7.1 to 2014.5.1; 2014.5.1 to 2014.3.1; and 2014.3.1 to 2014.1.1.
In addition, the target time period may not divide evenly by the preset time period, that is, the time span of the target time period may not be an integer multiple of the time span of the preset time period. In that case, the last cut yields a sub-time period shorter than the preset time period. Because this sub-time period is necessarily shorter than the preset time period, a request to the media platform for the data within it will not be rejected.
For example, the data to be acquired is the visit volume data from 2014.2.1 to the current time (2015.1.1). Assuming the media platform's time span threshold is 2 months, the preset time period can be set to 2 months. Taking 2015.1.1 as the splitting start point and 2 months as the splitting interval, the target time period is split into the following sub-time periods: 2015.1.1 to 2014.11.1; 2014.11.1 to 2014.9.1; 2014.9.1 to 2014.7.1; 2014.7.1 to 2014.5.1; 2014.5.1 to 2014.3.1; and 2014.3.1 to 2014.2.1. The last sub-time period, from 2014.3.1 to 2014.2.1, spans one month, which is shorter than the 2-month preset time period, so its data can be obtained from the media platform successfully.
Preferably, before the target time period is split according to the target preset rule, the method also includes: determining the identification information of the target platform, where the target platform is the platform that provides the data to be acquired; and obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform, where the preset mapping relationship is a pre-established mapping between the identification information of different platforms and the preset rules corresponding to those platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
Different media platforms usually impose different requirements on the time span, data volume, and so on of the data a user obtains. For example, media platform A requires that the span between the start time and end time of an entity report request not exceed one year, while media platform B requires that it not exceed half a year. Different preset rules therefore need to be set for different media platforms. The mapping between the identification information of different platforms and their preset rules can be stored in a database in advance. Once the target platform is determined, the preset rule mapped to the target platform's identification information (the target preset rule) is looked up in the database, and the target time period is split according to it. This embodiment can improve the execution efficiency of time period splitting and data acquisition.
It should be noted that a preset rule can be determined through repeated testing, so that it is the rule under which the system's processing performance is best. The choice of preset rule depends on the specific network operating environment and the type of industry. After a preset rule is determined, its reasonableness needs to be verified (that is, whether it ensures that the data to be acquired can be obtained successfully). Once verified, each platform's preset rule can be bound to that platform's identification information; as long as the identification information does not change, the verification logic is not triggered again, so as not to reduce the efficiency of data acquisition.
Preferably, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method also includes: obtaining the preset limit parameters of each of the different platforms, yielding multiple limit parameters; obtaining the preset rule corresponding to each platform according to the multiple limit parameters; and establishing the mapping between the identification information of each platform and its corresponding preset rule, yielding the preset mapping relationship.
In this embodiment, the preset rules can be set according to the preset limit parameters of different platforms. Because a limit parameter constrains the maximum time span or maximum data volume of the data a user obtains per request, a preset rule can be set with reference to the limit parameters, so that requests are not rejected by the media platform for exceeding those limits. For example, if platform A specifies that the time span of the data a user fetches per request must not exceed one month, then the preset time period in the preset rule set for platform A must not exceed one month. After the preset rules for the different platforms are obtained, the mapping between each platform's identification information and its preset rule (the preset mapping relationship) can be established and stored in a database. When the preset rule for a particular platform is needed, it is looked up directly in the database according to the preset mapping relationship.
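Deriving the mapping from limit parameters might look like the following sketch. It assumes a limit is expressed as a maximum span in whole months and that the rule simply uses the largest interval the platform allows; `LimitParams`, `rule_from_limits`, and `build_mapping` are hypothetical names, and a production rule could choose a smaller interval after the testing described earlier.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class LimitParams:
    max_span_months: int  # longest time span one request may cover

@dataclass(frozen=True)
class PresetRule:
    step_months: int      # preset time period used as the splitting interval

def rule_from_limits(limits: LimitParams) -> PresetRule:
    """Derive a preset rule whose splitting interval never exceeds the platform limit."""
    if limits.max_span_months < 1:
        raise ValueError("this sketch works in whole months, so the limit must be >= 1")
    # Simplest choice: use the largest interval the platform allows.
    return PresetRule(step_months=limits.max_span_months)

def build_mapping(platform_limits: Dict[str, LimitParams]) -> Dict[str, PresetRule]:
    """Preset mapping relationship: platform identification -> preset rule."""
    return {pid: rule_from_limits(lim) for pid, lim in platform_limits.items()}

mapping = build_mapping({"platform-A": LimitParams(1), "platform-B": LimitParams(6)})
```

For platform A, whose per-request limit is one month, the derived preset period is one month, as in the text.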
Optionally, determining the identification information of the target platform includes: detecting whether the current identification information of the target platform is the preset identification information; and obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform includes: if the current identification information of the target platform is detected to be the preset identification information, obtaining the target preset rule according to the preset mapping relationship and the preset identification information.
For a media platform, the preset rule mapped to it is generally looked up in the database using the platform's identification information. That is, the corresponding target preset rule can be found accurately only when the platform's identification information is confirmed to match the preset identification information stored in the database. Otherwise, the target preset rule obtained may not suit the media platform, because the platform's identification information can change, and a change in identification information indicates a change in the limit parameters. Therefore, if the current identification information of the target platform is detected to be the preset identification information, the target preset rule obtained for that preset identification information is guaranteed to suit the current media platform, and the target time period can then be split according to it.
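The identification check reduces to a comparison before the stored rule is trusted. In this sketch the stored table, the version strings such as `"v3"`, and `rule_if_current` are hypothetical; returning `None` signals that the verification path for a changed identifier must run.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stored state: platform -> (preset identification info, preset interval in months)
STORED: Dict[str, Tuple[str, int]] = {"platform-A": ("v3", 12)}

def rule_if_current(platform: str, current_id: str,
                    stored: Dict[str, Tuple[str, int]] = STORED) -> Optional[int]:
    """Return the stored rule only if the platform's current id equals the preset id."""
    preset_id, step_months = stored[platform]
    if current_id == preset_id:
        return step_months  # limits unchanged: the rule is known to apply
    return None  # id changed: the rule must be re-verified before it is used
```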
Optionally, the identification information uniquely identifies a platform's preset limit parameters. If the current identification information of the target platform is detected not to be the preset identification information, the method also includes: judging whether splitting the target time period according to the target preset rule still makes it possible to obtain the multiple groups of data generated within the multiple sub-time periods; if so, leaving the target preset rule unchanged; if not, determining the current limit parameters of the target platform according to the current identification information, obtaining the current preset rule of the target platform according to the current limit parameters, updating the preset identification information to the current identification information, and replacing the target preset rule with the current preset rule, so as to establish a mapping between the current identification information of the target platform and the current preset rule.
The identification information uniquely identifies a platform's preset limit parameters, where the limit parameters are the constraints a platform places on report requests of different dimensions (for example, entity report data and click volume report data). A version identifier can serve as the indicator of whether each platform's limit parameters have changed. For example, a certain media platform requires: CSV format; comma as the separator; a capacity of less than 100,000; an entity report time span of no more than one year; and search term reports that provide only data generated within the 30 days before the current time.
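If a platform does not publish its own version identifier, one possible scheme (an assumption, not something the application specifies) is to derive the identification information as a hash of the limit parameters, so any parameter change yields a new identifier. The field names below are hypothetical, and the 100,000 capacity is interpreted as a row count, which the text does not confirm.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class LimitParameters:
    file_format: str = "csv"
    separator: str = ","
    max_rows: int = 100_000            # assumed unit for "capacity less than 100,000"
    entity_max_span_months: int = 12   # entity report span of at most one year
    search_term_lookback_days: int = 30

def identification_info(limits: LimitParameters) -> str:
    """Derive an id that changes whenever any limit parameter changes (hypothetical scheme)."""
    payload = json.dumps(asdict(limits), sort_keys=True)  # stable serialization
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Serializing with `sort_keys=True` keeps the identifier stable across runs for identical parameters.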
The restriction parameters set by a media platform may be adjusted; for example, the time span threshold in the restriction parameters may be reduced from 1 year to 0.5 years. Because the identification information uniquely identifies the default restriction parameters of the platform, the identification information also changes when the restriction parameters change. Therefore, if it is detected that the current identification information of the target platform is not the default identification information, the target preset rule must be verified to judge whether it still applies to the media platform with the current restriction parameters.
The verification can be performed by again cutting the target time period according to the target preset rule and judging whether the multiple groups of data generated within the multiple sub-time periods can be obtained. When the restriction parameters of a media platform change, different kinds of change have different effects: after the change, the target preset rule may still apply to the media platform with the current restriction parameters, or it may no longer apply. If the time span threshold decreases, the target preset rule may no longer apply; if the time span threshold increases, the target preset rule very likely still applies. For example, suppose the target preset rule cuts the target time period at an interval of 1 year, and the initial default time span threshold (corresponding to the identification information stored in the database) is 2 years. If the time span threshold changes from 2 years to 0.5 years, the target preset rule no longer applies; if it changes from 2 years to 2.5 years, the target preset rule still applies.
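The verification step can be sketched as a dry run: cut the target period with the rule's interval and check that no sub-period exceeds the new time-span threshold. The sketch assumes month granularity and a hypothetical 30-month target period; the function names are illustrative and not taken from the text.

```python
def sub_period_lengths(total_months: int, interval_months: int) -> list[int]:
    """Lengths of the sub-periods produced by cutting `total_months` backwards
    from the end in steps of `interval_months` (the earliest piece may be shorter)."""
    if interval_months <= 0:
        raise ValueError("interval must be positive")
    lengths = []
    remaining = total_months
    while remaining > 0:
        step = min(interval_months, remaining)
        lengths.append(step)
        remaining -= step
    return lengths


def rule_still_applies(total_months: int, interval_months: int,
                       threshold_months: int) -> bool:
    """Dry run: the rule still works only if every sub-period fits the threshold."""
    return all(n <= threshold_months
               for n in sub_period_lengths(total_months, interval_months))
```

With a 12-month interval a 30-month period splits into pieces of 12, 12 and 6 months, so the rule survives a threshold of 24 or 30 months (2 or 2.5 years) but not one of 6 months (0.5 years), matching the cases above.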
If it is judged that the target preset rule no longer applies under the current conditions, the current restriction parameters of the target platform are determined according to the current identification information, the current preset rule corresponding to the target platform is obtained according to the current restriction parameters, the default identification information is updated to the current identification information, and the target preset rule is updated to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule. This embodiment improves the accuracy of the obtained preset rule and thereby effectively ensures the accuracy of the cutting process and of data acquisition.
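The complete update path can be sketched as below. It is a simplified model under stated assumptions: the current restriction parameters are collapsed into a single time-span threshold, and the current preset rule is derived by cutting at that threshold, which is one plausible choice rather than the rule derivation prescribed by the text. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MappingEntry:
    identifier: str           # default identification information
    cut_interval_months: int  # target preset rule, reduced to its cutting interval


def refresh_rule(mapping: dict[str, MappingEntry], platform: str,
                 current_identifier: str, span_threshold_months: int) -> MappingEntry:
    """Return the rule to use for `platform`, updating the mapping if needed.

    `span_threshold_months` stands in for the current restriction parameters
    that the text derives from the current identification information.
    """
    entry = mapping[platform]
    if current_identifier == entry.identifier:
        return entry  # identifier unchanged: use the stored rule directly
    if entry.cut_interval_months <= span_threshold_months:
        return entry  # identifier changed, but the old rule still fits: no update
    # The old rule no longer fits: derive a current rule (here, simply cut at the
    # new threshold, an assumption) and map it to the current identifier.
    updated = MappingEntry(current_identifier, span_threshold_months)
    mapping[platform] = updated
    return updated
```

As in the text, when the identifier changes but the old rule still fits, neither the rule nor the stored identifier is updated, so the check simply runs again on the next request.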
Because the identification information of the target platform may have changed, that is, the current identification information of the target platform may be inconsistent with the stored default identification information, whether the two still refer to the same target platform can be confirmed manually. To improve the accuracy of the data acquisition method of the present application, whether the current identification information of the target platform is the default identification information can be checked once every preset period (for example, every month).
Preferably, before the multiple groups of data generated within the multiple sub-time periods are obtained in sequence, the method further includes saving the multiple sub-time periods to a preset queue in chronological order. Obtaining in sequence the multiple groups of data generated within the multiple sub-time periods then includes: reading in turn each of the multiple sub-time periods saved in the preset queue and, each time a sub-time period is read, obtaining the group of data generated within that sub-time period.
In this embodiment, the multiple sub-time periods are saved to the preset queue in chronological order, so that the earliest sub-time period is read first and the data generated within it are obtained first. This embodiment ensures the ordering of the acquired data, which facilitates subsequent statistics, analysis, and management of the data.
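A sketch of the preset queue, using Python's `collections.deque` as a first-in, first-out queue. `build_queue`, `acquire_in_order` and the `fetch_group` callback are hypothetical names; `fetch_group` stands in for the request to the media platform.

```python
from collections import deque
from datetime import date
from typing import Callable


def build_queue(sub_periods: list[tuple[date, date]]) -> deque:
    """Save the sub-periods to a FIFO queue in chronological order (by start)."""
    return deque(sorted(sub_periods, key=lambda period: period[0]))


def acquire_in_order(queue: deque, fetch_group: Callable[[date, date], object]) -> list:
    """Read the queue one sub-period at a time and fetch one group of data per
    sub-period. `fetch_group` is a hypothetical stand-in for the platform call."""
    groups = []
    while queue:
        start, end = queue.popleft()
        groups.append(fetch_group(start, end))
    return groups


periods = [
    (date(2015, 9, 1), date(2015, 10, 1)),
    (date(2015, 7, 1), date(2015, 8, 1)),
    (date(2015, 8, 1), date(2015, 9, 1)),
]
groups = acquire_in_order(build_queue(periods), lambda s, e: f"{s:%Y-%m}")
```

Although the periods are supplied out of order, the groups come back in chronological order, which is the ordering guarantee this embodiment relies on.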
Optionally, the data to be obtained include data of multiple dimensions, and obtaining in sequence the multiple groups of data generated within the multiple sub-time periods includes: classifying the data to be obtained by dimension to obtain multiple classes of data to be obtained; and, for each class of data to be obtained among the multiple classes, obtaining in sequence the multiple groups of data generated within the multiple sub-time periods.
The data to be obtained may include data of multiple dimensions, for example entity report data, search-term report data, and click-volume report data. To make it easier to compute unified statistics and analyses over the different dimensions later, the same preset rule can be applied to the data of all dimensions.
It should be noted that a media platform may set different restriction parameters for data of different dimensions. For example, the time span threshold of entity report data may be 1 year, while that of search-term report data is 2 months. Therefore, the preset rule followed when obtaining data of different dimensions must satisfy the restrictions of every dimension. In the above example, the preset time period should not exceed 2 months (the time span threshold of the search-term report data).
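Choosing one preset time period for all dimensions amounts to taking the smallest threshold, as in this sketch. Thresholds are expressed in months, and the dimension keys are hypothetical labels for the entity and search-term reports in the example.

```python
def shared_cut_interval(span_thresholds_months: dict[str, int]) -> int:
    """A single preset time period used for every dimension must satisfy the
    strictest (smallest) time-span threshold among them."""
    if not span_thresholds_months:
        raise ValueError("at least one dimension is required")
    return min(span_thresholds_months.values())


interval = shared_cut_interval({"entity_report": 12, "search_term_report": 2})
```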
Fig. 2 is a schematic diagram of cutting the time period to obtain multi-dimensional data according to an embodiment of the present application. As shown in Fig. 2, the data to be obtained include data of three dimensions: entity report data, search-term report data, and click-volume report data. The target time period is cut with the end times aligned, yielding multiple sub-time periods (P1 to P8), and the specific start and end times of each sub-time period are calculated. Each sub-time period contains data of one dimension (P6, P7, and P8), two dimensions (P5), or three dimensions (P1, P2, P3, and P4). Because report data involve calculation and statistics, synchronization must proceed step by step starting from the sub-time period with the earliest start time. A queue is therefore set up to store the sub-time periods in order, namely P8, P7, P6, P5, P4, P3, P2, P1, and when data are obtained, the data of every dimension are obtained in turn starting from P8.
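The Fig. 2 scheme can be sketched as follows, assuming integer month indices as time points. The spans (24, 15 and 12 months) and the 3-month interval are hypothetical values chosen only so that the output reproduces the figure's pattern of three, two and one dimensions; they are not taken from the text. The function name and dimension keys are likewise illustrative.

```python
def cut_end_aligned(end: int, spans: dict[str, int], interval: int):
    """Cut backwards from `end` in steps of `interval` (all values in months).

    `spans` maps each dimension to the months of history it needs; every
    dimension's window ends at `end` (end-time alignment), so dimension d
    covers [end - spans[d], end). Returns (label, start, stop, dimensions)
    tuples with P1 the most recent sub-period.
    """
    earliest = end - max(spans.values())
    periods = []
    stop, k = end, 1
    while stop > earliest:
        start = max(stop - interval, earliest)
        # A dimension is fetched in this sub-period if its window overlaps it.
        dims = sorted(d for d, span in spans.items() if end - span < stop)
        periods.append((f"P{k}", start, stop, dims))
        stop, k = start, k + 1
    return periods


# Hypothetical numbers chosen to reproduce the pattern in Fig. 2.
periods = cut_end_aligned(end=24, spans={"entity": 24, "clicks": 15, "search_term": 12}, interval=3)
queue_order = [label for label, *_ in reversed(periods)]  # P8, P7, ..., P1
```

The resulting `queue_order` is P8 through P1, the storage order given for Fig. 2. If a dimension's window began partway through a sub-period, a real request for that dimension would presumably be clipped to the window; the sketch omits that detail.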
This embodiment supports obtaining historical data with a large time span (in theory, the time span of the historical data is unconstrained), which reduces the operational complexity and error-proneness of manual processing. It also supports cutting both horizontally (by time) and vertically (by dimension), ensuring that each sub-time period contains as many dimensions of data as possible and thereby improving data acquisition efficiency.
According to an embodiment of the present application, a data acquisition device is further provided.
It should be noted that the data acquisition device according to the embodiments of the present application can be used to perform the data acquisition method according to the embodiments of the present application, and that method can likewise be performed by the data acquisition device according to the embodiments of the present application.
Fig. 3 is a schematic diagram of the data acquisition device according to an embodiment of the present application. As shown in Fig. 3, the device includes a receiving unit 20, a cutting unit 40, and an acquiring unit 60.
Receiving unit 20 is configured to receive a data acquisition request, where the data acquisition request is used to request the data to be obtained that were generated within a target time period.
Cutting unit 40 is configured to cut the target time period according to a target preset rule to obtain multiple sub-time periods.
Acquiring unit 60 is configured to obtain in sequence the multiple groups of data generated within the multiple sub-time periods, where the data to be obtained generated within one sub-time period form one group of data.
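The division into units 20, 40 and 60 can be illustrated with one small class per unit. This is an illustrative sketch only: the request format, the `fetch` callback, the class names, and the representation of the target preset rule as a fixed `timedelta` interval are all assumptions not stated in the text.

```python
from datetime import datetime, timedelta
from typing import Callable


class ReceivingUnit:
    """Unit 20: accepts a data acquisition request for a target time period."""
    def receive(self, request: dict) -> tuple[datetime, datetime]:
        # Assumed request shape: {"start": datetime, "end": datetime}.
        return request["start"], request["end"]


class CuttingUnit:
    """Unit 40: cuts the target time period into sub-periods."""
    def __init__(self, interval: timedelta) -> None:
        self.interval = interval  # stands in for the target preset rule

    def cut(self, start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
        pieces, stop = [], end
        while stop > start:  # cut backwards from the later time point
            lo = max(stop - self.interval, start)
            pieces.append((lo, stop))
            stop = lo
        return pieces[::-1]  # chronological order


class AcquiringUnit:
    """Unit 60: fetches one group of data per sub-period, in order."""
    def __init__(self, fetch: Callable[[datetime, datetime], object]) -> None:
        self.fetch = fetch  # hypothetical platform request function

    def acquire(self, sub_periods: list[tuple[datetime, datetime]]) -> list:
        return [self.fetch(lo, hi) for lo, hi in sub_periods]


class DataAcquisitionDevice:
    """Wires the three units together in the order described for Fig. 3."""
    def __init__(self, interval: timedelta,
                 fetch: Callable[[datetime, datetime], object]) -> None:
        self.receiving_unit = ReceivingUnit()
        self.cutting_unit = CuttingUnit(interval)
        self.acquiring_unit = AcquiringUnit(fetch)

    def handle(self, request: dict) -> list:
        start, end = self.receiving_unit.receive(request)
        return self.acquiring_unit.acquire(self.cutting_unit.cut(start, end))


device = DataAcquisitionDevice(timedelta(days=4), lambda lo, hi: (lo.day, hi.day))
groups = device.handle({"start": datetime(2015, 1, 1), "end": datetime(2015, 1, 11)})
```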
With the data acquisition device of this embodiment, receiving unit 20 receives a data acquisition request used to request the data to be obtained that were generated within the target time period; cutting unit 40 cuts the target time period according to the target preset rule to obtain multiple sub-time periods; and acquiring unit 60 obtains in sequence the multiple groups of data generated within the multiple sub-time periods, where the data to be obtained generated within one sub-time period form one group of data. This solves the technical problem in the related art that historical data with a large time span cannot be obtained effectively: because cutting unit 40 cuts the target time period into multiple sub-time periods according to the target preset rule and acquiring unit 60 obtains the data of each sub-time period in sequence, historical data with a large time span can be obtained effectively.
Preferably, the target time period runs from a first time point to a second time point, the first time point being earlier than the second time point, and cutting unit 40 includes a cutting module configured to cut the target time period using the second time point as the cutting start point and a preset time period as the cutting interval, to obtain the multiple sub-time periods.
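The cutting module's behavior (start at the second time point and step backwards by the preset time period) can be sketched with the standard `datetime` module. Treating sub-periods as half-open intervals and clipping the earliest one at the first time point are assumptions; the text does not say how a remainder shorter than the interval is handled. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def cut_time_period(first: datetime, second: datetime,
                    interval: timedelta) -> list[tuple[datetime, datetime]]:
    """Cut [first, second) backwards, starting at the second (later) time point,
    in steps of `interval`; the earliest sub-period is clipped at `first`.
    The sub-periods are returned in chronological order."""
    if first >= second:
        raise ValueError("the first time point must be earlier than the second")
    if interval <= timedelta(0):
        raise ValueError("the interval must be positive")
    pieces = []
    end = second
    while end > first:
        start = max(end - interval, first)
        pieces.append((start, end))
        end = start
    pieces.reverse()
    return pieces


pieces = cut_time_period(datetime(2015, 1, 1), datetime(2015, 1, 11), timedelta(days=4))
```

Cutting 1 to 11 January 2015 at a 4-day interval yields 1 to 3, 3 to 7 and 7 to 11 January: the pieces are aligned to the end time, and only the earliest one is shorter than the interval.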
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above is merely a preferred embodiment of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.
Claims (10)
1. A data acquisition method, characterized by comprising:
receiving a data acquisition request, wherein the data acquisition request is used to request data to be obtained that were generated within a target time period;
cutting the target time period according to a target preset rule to obtain multiple sub-time periods; and
obtaining in sequence multiple groups of data generated within the multiple sub-time periods, wherein the data to be obtained generated within one sub-time period form one group of data.
2. The method according to claim 1, characterized in that the target time period is the time period from a first time point to a second time point, the first time point being earlier than the second time point, and cutting the target time period according to the target preset rule to obtain multiple sub-time periods comprises:
cutting the target time period using the second time point as the cutting start point and a preset time period as the cutting interval, to obtain the multiple sub-time periods.
3. The method according to claim 1, characterized in that, before the target time period is cut according to the target preset rule, the method further comprises:
determining identification information of a target platform, wherein the target platform is the platform that provides the data to be obtained; and
obtaining the target preset rule according to a preset mapping relationship and the identification information of the target platform, wherein the preset mapping relationship is a pre-established mapping between the identification information of different platforms and the preset rules corresponding to the different platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
4. The method according to claim 3, characterized in that, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method further comprises:
obtaining the default restriction parameters of each of the different platforms to obtain multiple restriction parameters;
obtaining the preset rule corresponding to each of the different platforms according to the multiple restriction parameters; and
establishing a mapping between the identification information of the different platforms and the preset rules corresponding to the different platforms, to obtain the preset mapping relationship.
5. The method according to claim 3, characterized in that
determining the identification information of the target platform comprises: detecting whether the current identification information of the target platform is default identification information; and
obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform comprises: if it is detected that the current identification information of the target platform is the default identification information, obtaining the target preset rule according to the preset mapping relationship and the default identification information.
6. The method according to claim 5, characterized in that the identification information uniquely identifies the default restriction parameters of a platform, and, if it is detected that the current identification information of the target platform is not the default identification information, the method further comprises:
judging whether cutting the target time period according to the target preset rule yields the multiple groups of data generated within the multiple sub-time periods;
if the result is yes, not updating the target preset rule;
if the result is no, determining current restriction parameters of the target platform according to the current identification information;
obtaining a current preset rule corresponding to the target platform according to the current restriction parameters; and
updating the default identification information to the current identification information and updating the target preset rule to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule.
7. The method according to claim 1, characterized in that
before the multiple groups of data generated within the multiple sub-time periods are obtained in sequence, the method further comprises: saving the multiple sub-time periods to a preset queue in chronological order; and
obtaining in sequence the multiple groups of data generated within the multiple sub-time periods comprises: reading in turn each of the multiple sub-time periods saved in the preset queue and, each time a sub-time period is read, obtaining the group of data generated within that sub-time period.
8. The method according to claim 1, characterized in that the data to be obtained include data to be obtained of multiple dimensions, and obtaining in sequence the multiple groups of data generated within the multiple sub-time periods comprises:
classifying the data to be obtained by dimension to obtain multiple classes of data to be obtained; and
for each class of data to be obtained among the multiple classes, obtaining in sequence the multiple groups of data generated within the multiple sub-time periods.
9. A data acquisition device, characterized by comprising:
a receiving unit, configured to receive a data acquisition request, wherein the data acquisition request is used to request data to be obtained that were generated within a target time period;
a cutting unit, configured to cut the target time period according to a target preset rule to obtain multiple sub-time periods; and
an acquiring unit, configured to obtain in sequence multiple groups of data generated within the multiple sub-time periods, wherein the data to be obtained generated within one sub-time period form one group of data.
10. The device according to claim 9, characterized in that the target time period is the time period from a first time point to a second time point, the first time point being earlier than the second time point, and the cutting unit comprises:
a cutting module, configured to cut the target time period using the second time point as the cutting start point and a preset time period as the cutting interval, to obtain the multiple sub-time periods.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510728970.8A CN106649358A (en) | 2015-10-30 | 2015-10-30 | Data acquisition method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649358A | 2017-05-10 |
Family
ID=58809258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510728970.8A Pending CN106649358A (en) | 2015-10-30 | 2015-10-30 | Data acquisition method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649358A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019037379A1 (en) * | 2017-08-25 | 2019-02-28 | 北京汽车集团有限公司 | Method and apparatus for outputting composite material failure mode information |
CN109582543A (en) * | 2017-09-28 | 2019-04-05 | 北京国双科技有限公司 | Data retrogressive method and device |
CN110704507A (en) * | 2019-09-27 | 2020-01-17 | 京东城市(北京)数字科技有限公司 | Method and device for storing data and method and device for querying data |
CN111830913A (en) * | 2019-04-22 | 2020-10-27 | 北京国电智深控制技术有限公司 | Data acquisition method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1687950A (en) * | 2005-05-31 | 2005-10-26 | 威盛电子股份有限公司 | On line reserved processing system and method |
CN101785216A (en) * | 2007-08-20 | 2010-07-21 | 三星电子株式会社 | System and method for multiple contention access periods |
US20140006401A1 (en) * | 2012-06-30 | 2014-01-02 | Microsoft Corporation | Classification of data in main memory database systems |
CN103593453A (en) * | 2013-11-20 | 2014-02-19 | 北京国双科技有限公司 | Method and device for calculating user retention ratio |
CN104239557A (en) * | 2014-09-25 | 2014-12-24 | 北京国双科技有限公司 | Method and device for monitoring promoted accounts |
CN104834660A (en) * | 2014-02-12 | 2015-08-12 | Sap欧洲公司 | Interval based fuzzy database search |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |