CN106649358A - Data acquisition method and apparatus - Google Patents

Info

Publication number
CN106649358A
Authority
CN
China
Prior art keywords
target
data
time
platform
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510728970.8A
Other languages
Chinese (zh)
Inventor
商平锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510728970.8A
Publication of CN106649358A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data acquisition method and apparatus. The method comprises the steps of receiving a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment; segmenting the target time segment according to a target preset rule to obtain a plurality of sub-time segments; and obtaining multiple groups of data generated in the sub-time segments in sequence, wherein the to-be-acquired data generated in a sub-time segment is a group of data. Through the method and the apparatus, the technical problem that historical data with a relatively large time span cannot be effectively acquired in related technologies is solved.

Description

Data acquisition method and apparatus
Technical field
The present application relates to the Internet field, and in particular to a data acquisition method and apparatus.
Background
In the Internet field, historical data often needs to be synchronized from the media platforms of search engines. For example, in Internet advertising, real-time bid ranking of keywords requires synchronizing report data from the media platform of each major search engine every day. However, each media platform usually imposes its own constraints on the time range that may be requested and on the size of the requested report, and these constraints differ by report type. For example, one search engine media platform requires that the span between the start time and the end time of an entity report request not exceed one year, and that the span between the start time and the end time of a search term report not exceed one month.
In many cases, however, historical data covering a long time span or a large data volume must be synchronized from a media platform in one operation, for example when all of a customer's historical report data must be fully synchronized. The time span of the data may then be very large and exceed the maximum time span the media platform allows, or the volume of report data to be synchronized may exceed the maximum amount the media platform allows in one synchronization. In either case the media platform rejects the user's synchronization request.
In the related art, a media platform limits the time span and the data volume of historical data requests separately for each platform and each entity (account, promotion plan, promotion unit, creative, keyword, and so on). Users therefore usually synchronize report data for each platform and each entity only within the corresponding limits, for example in a daily synchronization mode that synchronizes the report data of the day before the current time. When a synchronization request exceeds the limits of a platform and entity, the synchronization time is usually adjusted with reference to the limiting parameters of that platform and entity. This approach can usually obtain only data within a limited time range. When the time span is large, the system cannot complete the task automatically and manual intervention is needed, that is, manual synchronization in batches by time range. When the requested data volume is too large, the request is rejected. Moreover, because the volume of data to be synchronized changes dynamically on the media side, it cannot be predicted in advance (a rough estimate can be made from historical data, but such an estimate is coarse and inaccurate).
No effective solution has yet been proposed for the problem that historical data with a relatively large time span cannot be effectively acquired in the related art.
Summary of the invention
The embodiments of the present application provide a data acquisition method and apparatus, so as to at least solve the technical problem that historical data with a relatively large time span cannot be effectively acquired in the related art.
According to one aspect of the embodiments of the present application, a data acquisition method is provided. The method includes: receiving a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment; segmenting the target time segment according to a target preset rule to obtain a plurality of sub-time segments; and obtaining, in sequence, multiple groups of data generated in the plurality of sub-time segments, wherein the to-be-acquired data generated in one sub-time segment is one group of data.
Further, the target time segment is the time period from a first time point to a second time point, the first time point being earlier than the second time point, and segmenting the target time segment according to the target preset rule to obtain a plurality of sub-time segments includes: segmenting the target time segment with the second time point as the segmentation starting point and a preset time period as the segmentation interval, to obtain the plurality of sub-time segments.
Further, before the target time segment is segmented according to the target preset rule, the method further includes: determining identification information of a target platform, wherein the target platform is the platform that provides the to-be-acquired data; and obtaining the target preset rule according to a preset mapping relationship and the identification information of the target platform, wherein the preset mapping relationship is a pre-established mapping relationship between the identification information of different platforms and the preset rules corresponding to those platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
Further, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method further includes: obtaining the preset limiting parameters of the different platforms respectively, to obtain a plurality of limiting parameters; obtaining the preset rules corresponding to the different platforms respectively according to the plurality of limiting parameters; and establishing the mapping relationship between the identification information of the different platforms and the preset rules corresponding to the different platforms, to obtain the preset mapping relationship.
Further, determining the identification information of the target platform includes: detecting whether the current identification information of the target platform is preset identification information. Obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform includes: if the current identification information of the target platform is detected to be the preset identification information, obtaining the target preset rule according to the preset mapping relationship and the preset identification information.
Further, the identification information is used to uniquely identify the preset limiting parameters of a platform. If the current identification information of the target platform is detected not to be the preset identification information, the method further includes: judging whether segmenting the target time segment according to the target preset rule still allows the multiple groups of data generated in the plurality of sub-time segments to be obtained; if so, not updating the target preset rule; if not, determining the current limiting parameters of the target platform according to the current identification information; obtaining the current preset rule corresponding to the target platform according to the current limiting parameters; and updating the preset identification information to the current identification information and the target preset rule to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule.
Further, before the multiple groups of data generated in the plurality of sub-time segments are obtained in sequence, the method further includes: saving the plurality of sub-time segments to a preset queue in chronological order. Obtaining the multiple groups of data generated in the plurality of sub-time segments in sequence includes: reading, one at a time, each of the sub-time segments saved in the preset queue, and, each time a sub-time segment is read, obtaining the group of data generated in that sub-time segment.
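The queue-based variant above can be sketched in Python as follows. This is only an illustration under stated assumptions: `collections.deque` stands in for the preset queue, and `enqueue_chronologically`, `drain`, and `fake_fetch` are hypothetical names that do not appear in the application.

```python
from collections import deque
from datetime import date

def enqueue_chronologically(sub_segments):
    """Save (start, end) sub-time segments to a queue, earliest first."""
    return deque(sorted(sub_segments))

def drain(queue, fetch):
    """Read one sub-time segment at a time and fetch its group of data."""
    groups = []
    while queue:
        start, end = queue.popleft()
        groups.append(fetch(start, end))
    return groups

# Hypothetical stand-in for a request to the media platform.
def fake_fetch(start, end):
    return f"group for {start} to {end}"

segments = [
    (date(2014, 1, 1), date(2015, 1, 1)),
    (date(2013, 1, 1), date(2014, 1, 1)),
]
queue = enqueue_chronologically(segments)
groups = drain(queue, fake_fetch)
```

Reading from the left end of the queue preserves the chronological order in which the sub-time segments were saved.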
Further, the to-be-acquired data includes to-be-acquired data of multiple dimensions, and obtaining the multiple groups of data generated in the plurality of sub-time segments in sequence includes: classifying the to-be-acquired data by dimension to obtain multiple classes of to-be-acquired data; and, for each class of to-be-acquired data, obtaining in sequence the multiple groups of data generated in the plurality of sub-time segments.
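One possible reading of the multi-dimensional variant is a nested loop: for each class (dimension) of data, walk the same list of sub-time segments. The sketch below assumes this reading; the dimension names and the `fetch` callback are hypothetical.

```python
from datetime import date

def acquire_by_dimension(dimensions, sub_segments, fetch):
    """For each dimension, fetch one group of data per sub-time segment."""
    results = {}
    for dim in dimensions:
        results[dim] = [fetch(dim, start, end) for start, end in sub_segments]
    return results

# Hypothetical stand-in for a platform request; returns its own arguments.
def fake_fetch(dimension, start, end):
    return (dimension, start, end)

dims = ["entity", "search_term"]  # illustrative dimension names
segs = [
    (date(2014, 1, 1), date(2014, 7, 1)),
    (date(2014, 7, 1), date(2015, 1, 1)),
]
results = acquire_by_dimension(dims, segs, fake_fetch)
```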
According to another aspect of the embodiments of the present application, a data acquisition apparatus is further provided. The apparatus includes: a receiving unit, configured to receive a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment; a segmentation unit, configured to segment the target time segment according to a target preset rule to obtain a plurality of sub-time segments; and an acquiring unit, configured to obtain, in sequence, multiple groups of data generated in the plurality of sub-time segments, wherein the to-be-acquired data generated in one sub-time segment is one group of data.
Further, the target time segment is the time period from a first time point to a second time point, the first time point being earlier than the second time point, and the segmentation unit includes: a segmentation module, configured to segment the target time segment with the second time point as the segmentation starting point and a preset time period as the segmentation interval, to obtain the plurality of sub-time segments.
In the embodiments of the present application, the following method is adopted: receiving a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment; segmenting the target time segment according to a target preset rule to obtain a plurality of sub-time segments; and obtaining, in sequence, multiple groups of data generated in the plurality of sub-time segments, wherein the to-be-acquired data generated in one sub-time segment is one group of data. This solves the technical problem that historical data with a relatively large time span cannot be effectively acquired in the related art: by segmenting the target time segment according to the target preset rule and obtaining the groups of data generated in the resulting sub-time segments in sequence, historical data with a relatively large time span is acquired effectively.
Description of the drawings
The accompanying drawings described herein provide a further understanding of the present application and form a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a data acquisition method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of time segment segmentation for acquiring multi-dimensional data according to an embodiment of the present application; and
Fig. 3 is a schematic diagram of a data acquisition apparatus according to an embodiment of the present application.
Detailed description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a specific order or sequence. Data used in this way may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to an embodiment of the present application, a method embodiment of a data acquisition method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of the data acquisition method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: receive a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment.
The to-be-acquired data may include a large amount of data generated at different times; for example, it may be a large amount of historical data from the media platform of a search engine. Suppose a user needs to obtain from the media platform of a search engine the entity reports for the two months preceding the current time (assume the current time is 2015.3.1); the target time segment is then from 2015.1.1 to 2015.3.1. The start and end time points of the target time segment can be set to different precisions according to the user's actual needs. If the user needs data accurate to the day, the time points can be accurate to the day, for example a target time segment from 2015.1.1 to 2015.5.1. If the user needs data accurate to the hour, the time points can be accurate to the hour, for example from 2015.1.1, 10:00 to 2015.1.3, 20:00. The present application does not specifically limit the precision of the start and end time points of the target time segment.
Step S104: segment the target time segment according to a target preset rule to obtain a plurality of sub-time segments.
The media platforms of most search engines limit the time span or the data volume of the to-be-acquired data. For example, a media platform may require that the span between the start time and the end time of an entity report request not exceed one year. To prevent a data acquisition request from being rejected by the media platform because its time span is too large, the target time segment can be segmented into a plurality of sub-time segments with smaller time spans, which ensures that the to-be-acquired data generated in each sub-time segment can be obtained successfully. The target preset rule is a segmentation rule, set in advance, that is followed when the target time segment is segmented. It is set for the platform that provides the to-be-acquired data. After the target time segment is segmented according to the target preset rule, the resulting sub-time segments should satisfy the following condition: the length of each sub-time segment is less than or equal to the time span threshold set by the platform.
For example, suppose the media platform of a search engine sets a time span threshold of one year for downloading entity reports, and the target time segment of the data the user requests to download is from 2013.1.1 to 2015.6.1. The segmentation rule (target preset rule) set for this media platform should then ensure that no resulting sub-time segment exceeds one year. For instance, 2013.1.1 to 2015.6.1 can be segmented into a first sub-time segment from 2013.1.1 to 2014.1.1, a second sub-time segment from 2014.1.1 to 2015.1.1, and a third sub-time segment from 2015.1.1 to 2015.6.1.
For the sub-time segments obtained by segmenting according to the target preset rule, the time span of each sub-time segment must not exceed the time span threshold of the media platform. The time spans of any two sub-time segments may be equal or may differ. The present application does not specifically limit the relationship between the time spans of the sub-time segments.
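The segmentation condition of step S104 can be illustrated with a short Python sketch. It assumes the threshold is expressed in days (365 days standing in for the one-year limit of the example); the function name `split_forward` is hypothetical and not taken from the application.

```python
from datetime import date, timedelta

def split_forward(start, end, max_span_days):
    """Split [start, end] into consecutive sub-time segments,
    each no longer than max_span_days."""
    if max_span_days <= 0:
        raise ValueError("max_span_days must be positive")
    step = timedelta(days=max_span_days)
    segments = []
    current = start
    while current < end:
        upper = min(current + step, end)  # never run past the end point
        segments.append((current, upper))
        current = upper
    return segments

# The 2013.1.1 to 2015.6.1 example with a one-year threshold.
segments = split_forward(date(2013, 1, 1), date(2015, 6, 1), 365)
```

With these inputs the sketch yields the three sub-time segments listed in the example, because neither 2013 nor 2014 is a leap year. A production rule would more likely step by calendar years or months than by a fixed number of days.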
Step S106: obtain, in sequence, the multiple groups of data generated in the plurality of sub-time segments, wherein the to-be-acquired data generated in one sub-time segment is one group of data.
Segmenting the target time segment into a plurality of sub-time segments is equivalent to splitting the to-be-acquired data into multiple groups by generation time, with each group corresponding to one sub-time segment. After the user sends a request to acquire data, the time span contained in that request is split into a plurality of sub-time segments, and one request is sent to the media platform for each sub-time segment. Because no individual request exceeds the time span threshold of the media platform, the data generated in each sub-time segment can be obtained.
For example, suppose the data the user requests (the to-be-acquired data) is the entity report data in the time segment from 2010.1.1 to 2012.4.1. First, according to the target preset rule, the target time segment 2010.1.1 to 2012.4.1 is segmented into a first sub-time segment from 2010.1.1 to 2012.1.1 and a second sub-time segment from 2012.1.1 to 2012.4.1. Next, requests are sent to the media platform in sequence: the first request obtains the entity report data generated in the first sub-time segment, and the second request obtains the entity report data generated in the second sub-time segment. Each request returns one group of data, and together they yield the entity report data from 2010.1.1 to 2012.4.1.
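The request-per-sub-segment flow of step S106 might look like the following Python sketch. The platform is simulated by `fake_platform_fetch`, and the 730-day limit is an assumed value chosen only so that the two sub-time segments of the example are accepted while the unsplit request is rejected; neither comes from the application.

```python
from datetime import date

ASSUMED_MAX_SPAN_DAYS = 730  # hypothetical platform limit for this sketch

def fake_platform_fetch(start, end):
    """Simulated platform: rejects requests whose span exceeds the limit."""
    if (end - start).days > ASSUMED_MAX_SPAN_DAYS:
        raise ValueError("requested time span exceeds platform limit")
    return f"entity report {start} to {end}"

def acquire_in_sequence(sub_segments, fetch):
    """Send one request per sub-time segment and collect one group each."""
    return [fetch(start, end) for start, end in sub_segments]

sub_segments = [
    (date(2010, 1, 1), date(2012, 1, 1)),
    (date(2012, 1, 1), date(2012, 4, 1)),
]
groups = acquire_in_sequence(sub_segments, fake_platform_fetch)
```

Calling `fake_platform_fetch(date(2010, 1, 1), date(2012, 4, 1))` directly raises `ValueError` in this simulation, which mirrors the rejection the segmentation is meant to avoid.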
The data acquisition method of this embodiment includes: receiving a data acquisition request, wherein the data acquisition request is used for requesting to acquire to-be-acquired data generated in a target time segment; segmenting the target time segment according to a target preset rule to obtain a plurality of sub-time segments; and obtaining, in sequence, multiple groups of data generated in the plurality of sub-time segments, wherein the to-be-acquired data generated in one sub-time segment is one group of data. It thereby solves the technical problem that historical data with a relatively large time span cannot be effectively acquired in the related art, and achieves the technical effect of effectively acquiring such data.
Preferably, the target time segment is the time period from a first time point to a second time point, the first time point being earlier than the second time point, and segmenting the target time segment according to the target preset rule to obtain a plurality of sub-time segments includes: segmenting the target time segment with the second time point as the segmentation starting point and a preset time period as the segmentation interval, to obtain the plurality of sub-time segments.
For example, suppose the to-be-acquired data is the visit data from 2014.1.1 to the current time (2015.1.1), and the time span threshold of the media platform is 2 months; the preset time period can then be set to 2 months. With 2015.1.1 as the segmentation starting point and 2 months as the segmentation interval, segmenting the target time segment yields the following sub-time segments: 2015.1.1 to 2014.11.1; 2014.11.1 to 2014.9.1; 2014.9.1 to 2014.7.1; 2014.7.1 to 2014.5.1; 2014.5.1 to 2014.3.1; and 2014.3.1 to 2014.1.1.
In addition, the target time segment may not divide evenly by the preset time period, that is, the time span of the target time segment may not be an integer multiple of the time span of the preset time period. In that case the last segmentation produces a sub-time segment whose time span is shorter than the preset time period. Because the time span of this sub-time segment is necessarily shorter than the preset time period, a request to the media platform for the data in this sub-time segment will not be rejected.
For example, suppose the to-be-acquired data is the visit data from 2014.2.1 to the current time (2015.1.1), and the time span threshold of the media platform is 2 months; the preset time period can then be set to 2 months. With 2015.1.1 as the segmentation starting point and 2 months as the segmentation interval, segmenting the target time segment yields the following sub-time segments: 2015.1.1 to 2014.11.1; 2014.11.1 to 2014.9.1; 2014.9.1 to 2014.7.1; 2014.7.1 to 2014.5.1; 2014.5.1 to 2014.3.1; and 2014.3.1 to 2014.2.1. The last sub-time segment, from 2014.3.1 to 2014.2.1, is shorter than the preset time period of 2 months, so its data can be obtained from the media platform successfully.
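The backward segmentation from the second time point, including the shorter final sub-time segment, can be sketched in Python as follows. The helper names `months_before` and `split_backward` are hypothetical, and the month arithmetic assumes that the day of month exists in every month visited (true for the first-of-month dates used in the examples).

```python
from datetime import date

def months_before(d, months):
    """Return the date `months` calendar months before d.
    Raises ValueError if the day does not exist in the target month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, d.day)

def split_backward(first, second, interval_months):
    """Segment [first, second] starting from `second`, stepping back by
    interval_months; the last segment is clipped at `first`."""
    if interval_months <= 0:
        raise ValueError("interval_months must be positive")
    segments = []
    upper = second
    while upper > first:
        lower = max(months_before(upper, interval_months), first)
        segments.append((upper, lower))  # listed from later to earlier
        upper = lower
    return segments

# The 2014.2.1 to 2015.1.1 example with a 2-month interval.
segments = split_backward(date(2014, 2, 1), date(2015, 1, 1), 2)
```

Each tuple is written as (later point, earlier point) to match the order in which the text lists the sub-time segments.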
Preferably, before the target time segment is segmented according to the target preset rule, the method further includes: determining identification information of a target platform, wherein the target platform is the platform that provides the to-be-acquired data; and obtaining the target preset rule according to a preset mapping relationship and the identification information of the target platform, wherein the preset mapping relationship is a pre-established mapping relationship between the identification information of different platforms and the preset rules corresponding to those platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
Different media platforms usually impose different requirements on the time span, data volume, and so on of the data a user obtains. For example, media platform A requires that the span between the start time and the end time of an entity report request not exceed one year, while media platform B requires that this span not exceed half a year. Different preset rules therefore need to be set for different media platforms. The mapping relationships between the identification information of the different platforms and their preset rules can be stored in a database in advance. After the target platform is determined, the preset rule mapped to the identification information of the target platform (the target preset rule) is looked up in the database, and the target time segment is segmented according to it. This embodiment can improve the efficiency of time segment segmentation and data acquisition.
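As a rough illustration of the lookup, the sketch below replaces the database with an in-memory dictionary. The platform identifiers, the rule fields, and the day counts (365 for one year, 182 as an approximation of half a year) are assumptions made for the example.

```python
# Hypothetical preset mapping: platform identification -> preset rule.
PRESET_MAPPING = {
    "platform_A": {"max_span_days": 365},  # about one year
    "platform_B": {"max_span_days": 182},  # about half a year
}

def get_target_rule(platform_id, mapping=PRESET_MAPPING):
    """Look up the preset rule mapped to a platform's identification."""
    try:
        return mapping[platform_id]
    except KeyError:
        raise LookupError(f"no preset rule stored for {platform_id!r}") from None

rule_a = get_target_rule("platform_A")
rule_b = get_target_rule("platform_B")
```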
It should be noted that a preset rule may be determined through repeated testing, so that it is the rule under which system processing performance is best. The choice of preset rule depends on the specific network operating environment and the type of business. After a preset rule is determined, its validity needs to be verified (that is, it must be judged whether the rule ensures that the to-be-acquired data is obtained successfully). Once it is verified as valid, the preset rule of each platform is bound to the identification information of that platform. As long as the identification information does not change, the verification logic is not triggered again, so the efficiency of data acquisition is not reduced.
Preferably, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method further includes: obtaining the preset limiting parameters of the different platforms respectively, to obtain a plurality of limiting parameters; obtaining the preset rules corresponding to the different platforms respectively according to the plurality of limiting parameters; and establishing the mapping relationship between the identification information of the different platforms and the preset rules corresponding to the different platforms, to obtain the preset mapping relationship.
In this embodiment, the preset rules can be set according to the preset limiting parameters of the different platforms. Because a limiting parameter limits the maximum time span or the maximum data volume of the data a user obtains in one request, the preset rule can be set with reference to the limiting parameter, so that requests do not exceed the limit and are not rejected by the media platform. For example, if platform A specifies that the time span of the data a user obtains in one request must not exceed 1 month, then the preset time period in the preset rule set for platform A must not exceed 1 month. After the preset rules corresponding to the different platforms are obtained, the mapping relationship between the identification information of the different platforms and the different preset rules (the preset mapping relationship) can be established. The preset mapping relationship can be stored in a database, and when the preset rule corresponding to a particular platform is needed, it is looked up directly in the database according to the preset mapping relationship.
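A minimal sketch of deriving preset rules from limiting parameters and building the mapping might look like this. The field names, the `preferred_interval_days` default, and the sample limits are all assumptions; the only property taken from the text is that the segmentation interval never exceeds the platform's limit.

```python
def derive_rule(limit_params, preferred_interval_days=90):
    """Choose a segmentation interval that never exceeds the platform limit."""
    interval = min(preferred_interval_days, limit_params["max_span_days"])
    return {"interval_days": interval}

def build_mapping(limits_by_platform):
    """Map each platform identification to its derived preset rule."""
    return {pid: derive_rule(params) for pid, params in limits_by_platform.items()}

# Hypothetical limiting parameters collected from two platforms.
limits = {
    "platform_A": {"max_span_days": 31},   # about one month
    "platform_B": {"max_span_days": 365},  # about one year
}
mapping = build_mapping(limits)
```

Here platform A ends up with a 31-day interval because its limit is tighter than the preferred interval, while platform B keeps the preferred 90 days.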
Optionally, determining the identification information of the target platform includes: detecting whether the current identification information of the target platform is preset identification information. Obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform includes: if the current identification information of the target platform is detected to be the preset identification information, obtaining the target preset rule according to the preset mapping relationship and the preset identification information.
For a media platform, the preset rule that has the preset mapping relationship with it is generally looked up in the database according to the identification information of the media platform. That is, only when the identification information of the media platform is the preset identification information stored in the database can the corresponding target preset rule be found accurately. Otherwise, the target preset rule that is retrieved may not be applicable to the media platform, because the identification information of a media platform may change, and a change in identification information can indicate a change in the limiting parameters. Therefore, if the current identification information of the target platform is detected to be the preset identification information, the target preset rule corresponding to the preset identification information is known to be applicable to the current media platform, and the target time segment can be segmented according to it.
Optionally, the identification information is used to uniquely identify the preset limiting parameters of a platform. If the current identification information of the target platform is detected not to be the preset identification information, the method further includes: judging whether segmenting the target time segment according to the target preset rule still allows the multiple groups of data generated in the plurality of sub-time segments to be obtained; if so, not updating the target preset rule; if not, determining the current limiting parameters of the target platform according to the current identification information; obtaining the current preset rule corresponding to the target platform according to the current limiting parameters; and updating the preset identification information to the current identification information and the target preset rule to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule.
The identification information uniquely identifies the restriction parameters preset by a platform, where the restriction parameters are the limits that different platforms place on report requests of different dimensions (for example, entity report data and click report data). A version identifier can serve as evidence of whether a platform's restriction parameters have changed. For example, a given media platform may require that: the format is CSV; the delimiter is a comma; the capacity is less than 100,000; the time span of an entity report does not exceed 1 year; and the search term report provides only data generated within the 30 days before the current time.
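The example restriction parameters above can be pictured as a small immutable record. The following is only a sketch: the class name `RestrictionParams` and its field names are hypothetical, 1 year is approximated as 365 days, and the source gives no unit for the capacity figure.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class RestrictionParams:
    """Hypothetical container for one platform's restriction parameters."""
    file_format: str
    delimiter: str
    max_capacity: int                    # the source gives no unit for this figure
    entity_report_max_span: timedelta    # longest span one entity report may cover
    search_term_lookback: timedelta      # how far back search term data is offered


before = RestrictionParams("csv", ",", 100_000,
                           timedelta(days=365), timedelta(days=30))

# The platform halves the allowed entity report span (1 year -> 0.5 years).
after = replace(before, entity_report_max_span=timedelta(days=365) / 2)

# Field-wise comparison reveals that the restriction parameters changed.
print(before == after)  # False
```

Because the record is frozen, a changed parameter always yields a new value, which makes the comparison a simple stand-in for checking whether a version identifier has changed.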
The restriction parameters that a media platform sets may be adjusted. For example, the time span threshold in the restriction parameters may be changed from 1 year to 0.5 years. Because the identification information uniquely identifies the restriction parameters preset by the platform, the identification information changes whenever the restriction parameters change. Therefore, if the current identification information of the target platform is detected not to be the preset identification information, the target preset rule needs to be verified to judge whether it still applies to the media platform under its current restriction parameters.
The target time period can still be cut according to the target preset rule in order to judge whether the groups of data generated in the multiple sub-periods can be obtained. When a platform's restriction parameters change, different changes can have different effects: after the change, the target preset rule may still apply to the platform under its current restriction parameters, or it may no longer apply. If the time span threshold decreases, the target preset rule may no longer apply; if the threshold increases, the rule very likely still applies. For example, suppose the cutting interval that the target preset rule applies to the target time period is 1 year, and the initially preset time span threshold (corresponding to the identification information stored in the database) is 2 years. If the threshold changes from 2 years to 0.5 years, the target preset rule no longer applies; if the threshold changes from 2 years to 2.5 years, the target preset rule still applies.
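The worked example above reduces to one comparison. As a minimal sketch, assuming that a rule is applicable exactly when its cutting interval does not exceed the platform's time span threshold (the function name `rule_still_applies` is hypothetical, and 1 year is approximated as 365 days):

```python
from datetime import timedelta

YEAR = timedelta(days=365)  # approximation used only for this illustration


def rule_still_applies(cut_interval: timedelta, span_threshold: timedelta) -> bool:
    """True if every sub-period produced by the rule fits within the threshold."""
    return cut_interval <= span_threshold


cut_interval = 1 * YEAR

# Threshold shrinks from 2 years to 0.5 years: the rule no longer applies.
print(rule_still_applies(cut_interval, 0.5 * YEAR))  # False

# Threshold grows from 2 years to 2.5 years: the rule still applies.
print(rule_still_applies(cut_interval, 2.5 * YEAR))  # True
```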
If the target preset rule is judged to be no longer applicable under the current conditions, the current restriction parameters of the target platform are determined from the current identification information, the current preset rule corresponding to the target platform is obtained from the current restriction parameters, the preset identification information is updated to the current identification information, and the target preset rule is updated to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule. This embodiment improves the accuracy with which the preset rule is obtained, and thereby helps ensure the accuracy of the cutting process and of data acquisition.
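The verify-then-update flow described above can be sketched as follows. All names here (`Rule`, `Entry`, `resolve_rule`, and the three injected callables) are hypothetical; the source does not say how the applicability check, the parameter lookup, or the rule derivation are implemented, so they are passed in as functions.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable


@dataclass
class Rule:
    cut_interval: timedelta


@dataclass
class Entry:
    identifier: str  # stored ("preset") identification information
    rule: Rule       # stored ("target") preset rule


def resolve_rule(entry: Entry, current_id: str,
                 still_works: Callable[[Rule], bool],
                 limit_for: Callable[[str], timedelta],
                 derive_rule: Callable[[timedelta], Rule]) -> Rule:
    """Return a usable rule, updating the stored mapping only when needed."""
    if current_id == entry.identifier:
        return entry.rule                # identifier unchanged: reuse the rule
    if still_works(entry.rule):
        return entry.rule                # rule still yields data: no update
    limit = limit_for(current_id)        # current restriction parameter
    entry.identifier = current_id        # update the stored identifier
    entry.rule = derive_rule(limit)      # update the stored rule
    return entry.rule


YEAR = timedelta(days=365)
limits = {"v2": YEAR / 2}                # assumed lookup: identifier -> threshold
entry = Entry("v1", Rule(YEAR))
rule = resolve_rule(entry, "v2",
                    still_works=lambda r: r.cut_interval <= limits["v2"],
                    limit_for=limits.__getitem__,
                    derive_rule=lambda limit: Rule(limit))
```

In this run the stored one-year rule fails the check against the half-year threshold, so the entry ends up holding identifier `v2` and a half-year cutting interval.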
Because the identification information of the target platform may have changed, that is, the current identification information of the target platform may differ from the stored preset identification information, the step of determining whether the two belong to the same target platform can be completed by manual identification. To improve the accuracy of the data acquisition method of this application, the current identification information of the target platform can be checked against the preset identification information once every preset period (for example, a month).
Preferably, before the groups of data generated within the multiple sub-periods are obtained in sequence, the method further includes: saving the multiple sub-periods to a preset queue in chronological order. Obtaining the groups of data generated within the multiple sub-periods in sequence then includes: reading each of the sub-periods saved in the preset queue in turn and, each time a sub-period is read, obtaining the group of data generated within that sub-period.
In this embodiment, the multiple sub-periods are saved to the preset queue in chronological order, so that the earliest sub-period is read first and the data generated within it is obtained first. This embodiment guarantees the order of the obtained data, which benefits subsequent statistics, analysis, and management of the data.
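The queue-based reading described above can be sketched with a double-ended queue. The function name `acquire_in_order` and the `fetch` callback are hypothetical stand-ins for whatever actually requests one sub-period's data from the platform.

```python
from collections import deque
from datetime import date


def acquire_in_order(sub_periods, fetch):
    """Save sub-periods to a queue in chronological order, then read them one by one."""
    queue = deque(sorted(sub_periods))     # earliest start date first
    groups = []
    while queue:
        start, end = queue.popleft()       # read the next sub-period
        groups.append(fetch(start, end))   # obtain its one group of data
    return groups


periods = [
    (date(2015, 7, 1), date(2015, 10, 1)),
    (date(2015, 1, 1), date(2015, 4, 1)),
    (date(2015, 4, 1), date(2015, 7, 1)),
]

# The stand-in fetch just reports which sub-period was requested.
order = acquire_in_order(periods, lambda start, end: start.isoformat())
print(order)  # ['2015-01-01', '2015-04-01', '2015-07-01']
```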
Optionally, the data to be obtained includes data of multiple dimensions, and obtaining the groups of data generated within the multiple sub-periods in sequence includes: classifying the data to be obtained by dimension to obtain multiple classes of data to be obtained; and, for each class of data to be obtained, obtaining in sequence the groups of data generated within the multiple sub-periods.
The data to be obtained may include data of multiple dimensions, for example entity report data, search term report data, and click report data. To make it easier to later run unified statistics and analysis over the different dimensions, the data of all dimensions can be obtained under the same preset rule. Note that a media platform may apply different restriction parameters to data of different dimensions. For example, the time span threshold for entity report data may be 1 year while the threshold for search term report data is 2 months. Therefore, the preset rule used to obtain data of different dimensions must satisfy the restriction for every dimension. In the example above, the preset period used as the cutting interval should not exceed 2 months (the time span threshold of the search term report data).
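Choosing one cutting interval that satisfies every dimension amounts to taking the smallest threshold. The sketch below assumes that reading of the text; the function name `common_cut_interval` and the dictionary keys are hypothetical, and 1 year and 2 months are approximated as 365 and 60 days.

```python
from datetime import timedelta


def common_cut_interval(thresholds):
    """Largest cutting interval that respects every dimension's span threshold."""
    if not thresholds:
        raise ValueError("at least one dimension is required")
    return min(thresholds.values())


thresholds = {
    "entity_report": timedelta(days=365),      # about 1 year
    "search_term_report": timedelta(days=60),  # about 2 months
}

interval = common_cut_interval(thresholds)
print(interval.days)  # 60
```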
Fig. 2 is a schematic diagram of cutting the time period when obtaining multi-dimensional data according to an embodiment of this application. As shown in Fig. 2, the data to be obtained includes data of three dimensions: entity report data, search term report data, and click report data. The target time period is cut with the end times aligned, which yields multiple sub-periods (P1 to P8), and the specific start and end time of each sub-period is calculated. Each sub-period contains data of one dimension (such as P6, P7, and P8), two dimensions (such as P5), or three dimensions (such as P1, P2, P3, and P4). Because report data involves calculation and statistics, synchronization must proceed step by step starting from the sub-period with the earliest start time. A queue is therefore set up to store the sub-periods in order, with storage order P8, P7, P6, P5, P4, P3, P2, P1. When data is obtained, the data of each dimension is obtained in turn, starting from P8.
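One way to read Fig. 2 is that each dimension's history reaches back a different distance from the shared end time, so older sub-periods carry fewer dimensions. The sketch below encodes that interpretation only; the figure itself is not reproduced here, and the function name `dimensions_per_period`, the `available_from` dates, and the monthly sub-periods are all assumptions made for illustration.

```python
from datetime import date


def dimensions_per_period(periods, available_from):
    """Pair each sub-period (oldest first) with the dimensions that cover it."""
    plan = []
    for start, end in sorted(periods):
        # A dimension counts only if its history reaches back to the period start.
        dims = [name for name, first_day in available_from.items()
                if first_day <= start]
        plan.append(((start, end), dims))
    return plan


periods = [(date(2015, m, 1), date(2015, m + 1, 1)) for m in range(1, 5)]
available_from = {
    "entity_report": date(2015, 1, 1),
    "click_report": date(2015, 2, 1),
    "search_term_report": date(2015, 4, 1),
}

plan = dimensions_per_period(periods, available_from)
for (start, end), dims in plan:
    print(start, end, dims)
```

Under these assumed dates the oldest sub-period carries one dimension and the most recent carries all three, which mirrors the pattern the text describes for P8 through P1.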
This embodiment supports obtaining historical data with a large time span (in theory, the time span of the historical data is unconstrained), which reduces the operational complexity and error-proneness of manual processing. It also supports horizontal (time) and vertical (dimension) cutting, ensuring that each sub-period contains data of as many dimensions as possible, which improves data acquisition efficiency.
An embodiment of this application further provides a data acquisition apparatus, described below.
It should be noted that the data acquisition apparatus according to the embodiments of this application can be used to perform the data acquisition method according to the embodiments of this application, and the data acquisition method according to the embodiments of this application can likewise be performed by the data acquisition apparatus according to the embodiments of this application.
Fig. 3 is a schematic diagram of the data acquisition apparatus according to an embodiment of this application. As shown in Fig. 3, the apparatus includes a receiving unit 20, a cutting unit 40, and an acquiring unit 60.
The receiving unit 20 is configured to receive a data acquisition request, where the data acquisition request is used to request the data to be obtained that is generated within a target time period.
The cutting unit 40 is configured to cut the target time period according to a target preset rule to obtain multiple sub-periods.
The acquiring unit 60 is configured to obtain, in sequence, the groups of data generated within the multiple sub-periods, where the data to be obtained that is generated within one sub-period is one group of data.
In the data acquisition apparatus of this embodiment, the receiving unit 20 receives a data acquisition request that asks for the data to be obtained that is generated within the target time period; the cutting unit 40 cuts the target time period according to the target preset rule to obtain multiple sub-periods; and the acquiring unit 60 obtains, in sequence, the groups of data generated within the multiple sub-periods, where the data generated within one sub-period is one group of data. This solves the technical problem in the related art that historical data with a large time span cannot be obtained effectively, and achieves the technical effect of effectively obtaining such data.
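The three units can be pictured as three small collaborating classes. This is only an illustrative sketch, not the structure of the claimed apparatus: the class and method names, the use of plain day numbers for time, and the injected `ten_day_rule` and `fetch` functions are all hypothetical.

```python
from typing import Callable, List, Tuple

Period = Tuple[int, int]  # (start, end) as plain day numbers, for brevity


class ReceivingUnit:
    """Receives a data acquisition request for a target time period."""

    def receive(self, start: int, end: int) -> Period:
        if start >= end:
            raise ValueError("target time period must have start < end")
        return (start, end)


class CuttingUnit:
    """Cuts the target time period with an injected target preset rule."""

    def __init__(self, rule: Callable[[Period], List[Period]]):
        self.rule = rule

    def cut(self, target: Period) -> List[Period]:
        return self.rule(target)


class AcquiringUnit:
    """Obtains one group of data per sub-period, in order."""

    def __init__(self, fetch: Callable[[Period], object]):
        self.fetch = fetch

    def acquire(self, sub_periods: List[Period]) -> list:
        return [self.fetch(p) for p in sub_periods]


def ten_day_rule(target: Period) -> List[Period]:
    """Example rule: consecutive sub-periods of at most ten days."""
    start, end = target
    return [(s, min(s + 10, end)) for s in range(start, end, 10)]


target = ReceivingUnit().receive(0, 25)
sub_periods = CuttingUnit(ten_day_rule).cut(target)
groups = AcquiringUnit(lambda p: f"rows for days {p[0]}..{p[1]}").acquire(sub_periods)
```

Injecting the rule and the fetch function keeps each unit independent, which matches the text's point that the division into units is a division by logical function.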
Preferably, the target time period is the period from a first time point to a second time point, with the first time point earlier than the second time point, and the cutting unit 40 includes a cutting module configured to cut the target time period starting from the second time point, with a preset period as the cutting interval, to obtain the multiple sub-periods.
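Cutting backward from the second time point can be sketched as below. The function name `cut_backward` is hypothetical, and two details are assumptions the source does not state: adjacent sub-periods share a boundary date, and any shorter remainder falls at the earliest end of the target time period.

```python
from datetime import date, timedelta


def cut_backward(first: date, second: date, interval: timedelta):
    """Cut [first, second] into sub-periods, stepping back from `second`."""
    if first >= second:
        raise ValueError("first time point must be earlier than second")
    if interval <= timedelta(0):
        raise ValueError("cutting interval must be positive")
    periods = []
    end = second
    while end > first:
        start = max(first, end - interval)  # never step past the first time point
        periods.append((start, end))
        end = start
    periods.reverse()                       # return in chronological order
    return periods


result = cut_backward(date(2015, 1, 1), date(2015, 1, 26), timedelta(days=10))
for start, end in result:
    print(start, end)
```

For this input the sub-periods are January 1 to 6, January 6 to 16, and January 16 to 26, so the five-day remainder lands on the oldest sub-period.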
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of this application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a portable hard drive, a magnetic disk, or an optical disc.
The above are only preferred embodiments of this application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and these improvements and refinements shall also be regarded as falling within the scope of protection of this application.

Claims (10)

1. A data acquisition method, characterized by comprising:
receiving a data acquisition request, wherein the data acquisition request is used to request the data to be obtained that is generated within a target time period;
cutting the target time period according to a target preset rule to obtain a plurality of sub-periods; and
obtaining, in sequence, the groups of data generated within the plurality of sub-periods, wherein the data to be obtained that is generated within one sub-period is one group of data.
2. The method according to claim 1, characterized in that the target time period is the period from a first time point to a second time point, the first time point being earlier than the second time point, and cutting the target time period according to the target preset rule to obtain the plurality of sub-periods comprises:
cutting the target time period with the second time point as the cutting start point and a preset period as the cutting interval, to obtain the plurality of sub-periods.
3. The method according to claim 1, characterized in that, before the target time period is cut according to the target preset rule, the method further comprises:
determining identification information of a target platform, wherein the target platform is the platform that provides the data to be obtained; and
obtaining the target preset rule according to a preset mapping relationship and the identification information of the target platform, wherein the preset mapping relationship is a pre-established mapping relationship between the identification information of different platforms and the preset rules corresponding to the different platforms, the preset rules corresponding to the different platforms include the target preset rule, and the different platforms include the target platform.
4. The method according to claim 3, characterized in that, before the target preset rule is obtained according to the preset mapping relationship and the identification information of the target platform, the method further comprises:
obtaining the restriction parameters preset by each of the different platforms, to obtain a plurality of restriction parameters;
obtaining the preset rules corresponding to the different platforms according to the plurality of restriction parameters; and
establishing the mapping relationship between the identification information of the different platforms and the preset rules corresponding to the different platforms, to obtain the preset mapping relationship.
5. The method according to claim 3, characterized in that
determining the identification information of the target platform comprises: detecting whether the current identification information of the target platform is preset identification information; and
obtaining the target preset rule according to the preset mapping relationship and the identification information of the target platform comprises: if the current identification information of the target platform is detected to be the preset identification information, obtaining the target preset rule according to the preset mapping relationship and the preset identification information.
6. The method according to claim 5, characterized in that the identification information uniquely identifies the restriction parameters preset by a platform, and, if the current identification information of the target platform is detected not to be the preset identification information, the method further comprises:
judging whether cutting the target time period according to the target preset rule allows the groups of data generated within the plurality of sub-periods to be obtained;
if the result is yes, not updating the target preset rule;
if the result is no, determining the current restriction parameters of the target platform according to the current identification information;
obtaining the current preset rule corresponding to the target platform according to the current restriction parameters; and
updating the preset identification information to the current identification information and updating the target preset rule to the current preset rule, so as to establish a mapping relationship between the current identification information of the target platform and the current preset rule.
7. The method according to claim 1, characterized in that
before the groups of data generated within the plurality of sub-periods are obtained in sequence, the method further comprises: saving the plurality of sub-periods to a preset queue in chronological order; and
obtaining, in sequence, the groups of data generated within the plurality of sub-periods comprises: reading in turn each of the plurality of sub-periods saved in the preset queue and, each time a sub-period is read, obtaining the group of data generated within that sub-period.
8. The method according to claim 1, characterized in that the data to be obtained includes data to be obtained of multiple dimensions, and obtaining, in sequence, the groups of data generated within the plurality of sub-periods comprises:
classifying the data to be obtained by dimension to obtain multiple classes of data to be obtained; and
for each class of data to be obtained among the multiple classes, obtaining in sequence the groups of data generated within the plurality of sub-periods.
9. A data acquisition apparatus, characterized by comprising:
a receiving unit, configured to receive a data acquisition request, wherein the data acquisition request is used to request the data to be obtained that is generated within a target time period;
a cutting unit, configured to cut the target time period according to a target preset rule to obtain a plurality of sub-periods; and
an acquiring unit, configured to obtain, in sequence, the groups of data generated within the plurality of sub-periods, wherein the data to be obtained that is generated within one sub-period is one group of data.
10. The apparatus according to claim 9, characterized in that the target time period is the period from a first time point to a second time point, the first time point being earlier than the second time point, and the cutting unit comprises:
a cutting module, configured to cut the target time period with the second time point as the cutting start point and a preset period as the cutting interval, to obtain the plurality of sub-periods.
CN201510728970.8A 2015-10-30 2015-10-30 Data acquisition method and apparatus Pending CN106649358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510728970.8A CN106649358A (en) 2015-10-30 2015-10-30 Data acquisition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510728970.8A CN106649358A (en) 2015-10-30 2015-10-30 Data acquisition method and apparatus

Publications (1)

Publication Number Publication Date
CN106649358A true CN106649358A (en) 2017-05-10

Family

ID=58809258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510728970.8A Pending CN106649358A (en) 2015-10-30 2015-10-30 Data acquisition method and apparatus

Country Status (1)

Country Link
CN (1) CN106649358A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019037379A1 (en) * 2017-08-25 2019-02-28 北京汽车集团有限公司 Method and apparatus for outputting composite material failure mode information
CN109582543A (en) * 2017-09-28 2019-04-05 北京国双科技有限公司 Data retrogressive method and device
CN110704507A (en) * 2019-09-27 2020-01-17 京东城市(北京)数字科技有限公司 Method and device for storing data and method and device for querying data
CN111830913A (en) * 2019-04-22 2020-10-27 北京国电智深控制技术有限公司 Data acquisition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1687950A (en) * 2005-05-31 2005-10-26 威盛电子股份有限公司 On line reserved processing system and method
CN101785216A (en) * 2007-08-20 2010-07-21 三星电子株式会社 System and method for multiple contention access periods
US20140006401A1 (en) * 2012-06-30 2014-01-02 Microsoft Corporation Classification of data in main memory database systems
CN103593453A (en) * 2013-11-20 2014-02-19 北京国双科技有限公司 Method and device for calculating user retention ratio
CN104239557A (en) * 2014-09-25 2014-12-24 北京国双科技有限公司 Method and device for monitoring promoted accounts
CN104834660A (en) * 2014-02-12 2015-08-12 Sap欧洲公司 Interval based fuzzy database search

Similar Documents

Publication Publication Date Title
CN106649358A (en) Data acquisition method and apparatus
CN104504077B (en) The statistical method and device of web page access data
CN105243169B (en) A kind of data query method and system
EP3324352A1 (en) Testing system
CN112382362B (en) Data analysis method and device for target drugs
CN103577660B (en) Gray scale experiment system and method
CN109461053B (en) Dynamic distribution method of multiple recommendation channels, electronic device and storage medium
CN104484558A (en) Method and system for automatically generating analysis reports of biological information projects
CN108022123B (en) Automatic adjustment method and device for business model
CN110209714A (en) Report form generation method, device, computer equipment and computer readable storage medium
CN113242159A (en) Application access relation determining method and device
CN108197207A (en) Batch data matches introduction method
CN111241217A (en) Data processing method, device and system
CN108243046B (en) Service quality assessment method and device based on data audit
CN112817832B (en) Method, device and equipment for acquiring health state of game server and storage medium
CN106780062A (en) Based on groups of users update method and system that social networks and big data are analyzed
CN110781340A (en) Offline evaluation method, system and device for recall strategy of recommendation system and storage medium
CN103812912B (en) A kind of method and device of maintenance organization structural information
CN104917812A (en) Service node selection method applied to group intelligence calculation
CN116090349A (en) Optical film production process optimization method, equipment and storage medium
CN115905373A (en) Data query and analysis method, device, equipment and storage medium
CN106708873A (en) Data integration method data integration device
CN112269879B (en) Method and equipment for analyzing middle station log based on k-means algorithm
CN108287909A (en) A kind of paper method for pushing and device
CN113312902A (en) Intelligent auditing and checking method and device for same text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170510