CN104298739A - Data processing method and device - Google Patents

Publication number
CN104298739A
Authority
CN
China
Prior art keywords
data
segment
pending
sample
index information
Prior art date
Application number
CN201410527974.5A
Other languages
Chinese (zh)
Other versions
CN104298739B (en)
Inventor
余正宁
崔文革
罗喜霜
Original Assignee
北京经纬恒润科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京经纬恒润科技有限公司
Priority to CN201410527974.5A
Publication of CN104298739A
Application granted
Publication of CN104298739B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34: Browsing; Visualisation therefor
    • G06F16/31: Indexing; Data structures therefor; Storage structures

Abstract

The invention discloses a data processing method and device. The method comprises: acquiring data to be processed and judging whether its data volume is greater than a preset threshold; if so, dividing the data to be processed into a plurality of data segments, the data volume of each data segment being no greater than the preset threshold; and selecting at least one sample data item from the data segments and using the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold. In this way, a massive amount of data to be processed is divided into a plurality of small data segments, and sample data selected from those segments form a data subset. Because the data volume of every data segment and of the data subset does not exceed the preset threshold, existing data viewing software can open each of them, and global browsing of the massive data is achieved by browsing the constructed data subset.

Description

Data processing method and device
Technical field
The present application relates to the technical field of data processing, and in particular to a data processing method and device.
Background
In engineering tests and simulations, the test or simulation data are generally saved so that they can be replayed and inspected later.
Some test or simulation scenarios run for a very long time, and the amount of data to be recorded is very large, commonly reaching the GB level. If all the data are recorded in a single, unsplit data file, that file becomes very large and may even exceed the size limit of the viewing software, so that existing data viewing software cannot open the file directly and browsing fails. The prior art therefore generally splits a large data file into multiple small data files while recording, storing the data of different sections separately; this reduces the size of each file and makes replay and inspection easier.
However, the prior art has at least the following problem: once a large data file has been split into multiple small data files, only the data in the currently selected small file can be viewed, and the mass of data cannot be browsed globally. Browsing is especially inconvenient when the split produces many small files.
Summary of the invention
In view of this, the present application provides a data processing method and device for browsing massive data globally.
To achieve this object, the embodiments of the present application provide the following technical solutions:
A data processing method, comprising:
Acquiring data to be processed, and judging whether the data volume of the data to be processed is greater than a preset threshold;
If the data volume of the data to be processed is greater than the preset threshold, dividing the data to be processed into a plurality of data segments, the data volume of each data segment being no greater than the preset threshold;
Selecting at least one sample data item from the plurality of data segments, and using the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
Preferably, dividing the data to be processed into a plurality of data segments comprises:
Acquiring first index information of the data to be processed;
Dividing the data to be processed into a plurality of data segments according to the first index information, the first index information being used to locate the data in each data segment.
Preferably, selecting at least one sample data item from the plurality of data segments comprises:
Determining at least one sample data item in the data to be processed according to a preset sampling interval, and determining the data segment containing each sample data item according to the first index information;
Selecting each sample data item from the data segment that contains it, and recording the correspondence between each sample data item and its data segment.
Preferably, the method further comprises:
Acquiring a selection operation on the sample data in the data subset, and determining the sample data selected by the selection operation;
According to the correspondence between each sample data item and its data segment, extracting and displaying the data in the data segment corresponding to the selected sample data.
Preferably, the method further comprises:
Acquiring second index information of each data segment;
Dividing each data segment into a plurality of sub-segments according to the second index information, the second index information being used to locate the data in each sub-segment;
Acquiring a second selection operation on the data in the sub-segments and, according to the second index information, extracting and displaying the data in the sub-segment corresponding to the data selected by the second selection operation.
The present application also provides a data processing device, comprising:
A first acquisition module, configured to acquire data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold;
A first division module, configured to divide the data to be processed into a plurality of data segments if the data volume of the data to be processed is greater than the preset threshold, the data volume of each data segment being no greater than the preset threshold;
A construction module, configured to select at least one sample data item from the plurality of data segments and use the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
Preferably, the first division module comprises:
An acquisition unit, configured to acquire first index information of the data to be processed;
A division unit, configured to divide the data to be processed into a plurality of data segments according to the first index information, the first index information being used to locate the data in each data segment.
Preferably, the construction module comprises:
A sampling unit, configured to determine at least one sample data item in the data to be processed according to a preset sampling interval, and to determine the data segment containing each sample data item according to the first index information;
A selection unit, configured to select each sample data item from the data segment that contains it, and to record the correspondence between each sample data item and its data segment.
Preferably, the device further comprises:
A determination module, configured to acquire a selection operation on the sample data in the data subset and determine the sample data selected by the selection operation;
An extraction module, configured to extract and display, according to the correspondence between each sample data item and its data segment, the data in the data segment corresponding to the selected sample data.
Preferably, the device further comprises:
A second acquisition module, configured to acquire second index information of each data segment;
A second division module, configured to divide each data segment into a plurality of sub-segments according to the second index information, the second index information being used to locate the data in each sub-segment;
A second extraction module, configured to acquire a second selection operation on the data in the sub-segments and, according to the second index information, extract and display the data in the sub-segment corresponding to the data selected by the second selection operation.
With the data processing method and device provided above, data to be processed are acquired and it is judged whether their data volume is greater than a preset threshold; if so, the data are divided into a plurality of data segments, each with a data volume no greater than the preset threshold; at least one sample data item is selected from the data segments, and the selected sample data are used to construct a data subset for global browsing whose data volume is no greater than the preset threshold. Massive data can thus be divided into a plurality of small data segments, from which sample data are selected to form a data subset. Since the data volume of each data segment and of the data subset does not exceed the preset threshold, existing data viewing software can browse and inspect them, and the massive data can be browsed globally by browsing the constructed data subset.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for that description are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the data processing method provided by embodiment one of the present application;
Fig. 2 is a flowchart of the data processing method provided by embodiment two of the present application;
Fig. 3 is a flowchart of the data processing method provided by embodiment three of the present application;
Fig. 4 is a flowchart of the data processing method provided by embodiment four of the present application;
Fig. 5 is a flowchart of the data processing method provided by embodiment five of the present application;
Fig. 6 is a structural diagram of a data processing device provided by the present application;
Fig. 7 is a structural diagram of another data processing device provided by the present application;
Fig. 8 is a structural diagram of a further data processing device provided by the present application;
Fig. 9 is a structural diagram of a further data processing device provided by the present application;
Fig. 10 is a structural diagram of a further data processing device provided by the present application.
Detailed description
To help those skilled in the art better understand the technical solutions of the present application, the solutions are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
The present application is described in further detail below with reference to the drawings:
Embodiment one:
Fig. 1 is a flowchart of the data processing method provided by embodiment one of the present application.
Referring to Fig. 1, the data processing method provided by this embodiment comprises:
Step S11: acquire data to be processed, and judge whether the data volume of the data to be processed is greater than a preset threshold.
In this embodiment, after the data to be processed have been acquired, their data information can also be obtained, including the data name, storage path, data type, data format and data volume. All of these can be obtained directly through functions provided by the operating system.
Because the data to be processed may be massive and hard for software to read once saved as a single data file, the data information is used to judge whether the data volume is greater than the preset threshold. The preset threshold here may be the maximum data volume that commonly used software can read.
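As a minimal sketch of step S11, the snippet below collects the data information through standard operating-system calls and compares the size with a threshold. The function names `describe_data` and `exceeds_threshold`, the dictionary layout and the byte-based threshold are illustrative assumptions, not part of the patent.

```python
import os
import tempfile


def describe_data(path):
    """Collect basic data information through OS-level calls (hypothetical helper)."""
    return {
        "name": os.path.basename(path),
        "path": os.path.abspath(path),
        "format": os.path.splitext(path)[1].lstrip(".").upper(),
        "size_bytes": os.path.getsize(path),
    }


def exceeds_threshold(info, threshold_bytes):
    """Step S11 decision: is the data volume greater than the preset threshold?"""
    return info["size_bytes"] > threshold_bytes


# Small demonstration with a throwaway 2048-byte file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("x" * 2048)
    demo_path = handle.name

info = describe_data(demo_path)
os.remove(demo_path)
print(info["format"], info["size_bytes"], exceeds_threshold(info, 1024))
```

In practice the threshold would be chosen from the limits of the viewing software in use; the 1024-byte value here only keeps the demonstration small.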
Step S12: if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into a plurality of data segments, the data volume of each data segment being no greater than the preset threshold.
In this embodiment, when the data volume of the data to be processed is greater than the preset threshold, that is, greater than the maximum data volume the software can read, the data can be divided into a plurality of data segments whose data volumes are each no greater than the preset threshold, so that every data segment can be read by the software.
Step S13: select at least one sample data item from the plurality of data segments, and use the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
In this embodiment, at least one sample data item is selected from the data segments obtained in step S12. One sample may be selected from each data segment, several samples may be selected from any one data segment, or the samples may be taken from only some of the data segments; that is, the number of samples need not equal the number of data segments. Preferably, as many samples as possible are selected and they are spread over as many data segments as possible. The selected samples are then used to construct the data subset for global browsing, whose data volume likewise does not exceed the preset threshold, so that the software can open the data subset and give an overview of the data to be processed as a whole.
With the data processing method provided by this embodiment, massive data to be processed can be divided into a plurality of small data segments, and sample data selected from those segments form a data subset. Since the data volume of each data segment and of the data subset does not exceed the preset threshold, existing data viewing software can browse and inspect them, and the massive data can be browsed globally by browsing the constructed data subset.
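Steps S12 and S13 can be sketched as follows, under the assumption that data volume is measured as a number of records and that the first record of a segment serves as its sample. Both choices, and the names `split_into_segments` and `build_overview`, are illustrative; the patent leaves the sampling rule open.

```python
def split_into_segments(records, threshold):
    """Step S12: consecutive segments, each holding at most `threshold` records."""
    if len(records) <= threshold:
        return [records]
    return [records[i:i + threshold] for i in range(0, len(records), threshold)]


def build_overview(segments, threshold):
    """Step S13: one sample per segment, thinned so the subset stays within the threshold."""
    samples = [(seg_no, seg[0]) for seg_no, seg in enumerate(segments) if seg]
    if len(samples) > threshold:
        stride = -(-len(samples) // threshold)  # ceiling division
        samples = samples[::stride]
    return samples


records = list(range(100))          # stand-in for the data to be processed
segments = split_into_segments(records, threshold=10)
overview = build_overview(segments, threshold=10)
print(len(segments), overview[:3])
```

Each entry of `overview` keeps the segment number next to the sample, which is the correspondence a later drill-down step would need.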
Embodiment two:
Fig. 2 is a flowchart of the data processing method provided by embodiment two of the present application.
Referring to Fig. 2, the data processing method provided by this embodiment comprises:
Step S21: acquire data to be processed, and judge whether the data volume of the data to be processed is greater than a preset threshold.
Step S22: if the data volume of the data to be processed is greater than the preset threshold, acquire first index information of the data to be processed.
In this embodiment, if the data volume of the data to be processed is greater than the preset threshold, the data are indexed to obtain the first index information. For example, if the data to be processed contain time information, that time information can be extracted and used as the first index information, which serves as the basis for the subsequent processing. If the data to be processed contain only data values and no time information, additional time information can be assigned automatically. The first index information is saved in an index file together with the data information.
Step S23: divide the data to be processed into a plurality of data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
In this embodiment, saying that the first index information is used to locate the data in each data segment means that it uniquely locates one item or one group of items in the data to be processed. It may be a data name, a data value, or a combination of the two. For example, in "the data at a test time of 0.1 s", the "test time" is first index information, because the time value 0.1 s is unique and cannot occur more than once. In contrast, "the data at a speed of 0.1 m/s" does not uniquely determine one item or group of items, because the speed varies and may reach 0.1 m/s several times; "speed" in this example is therefore not first index information.
Based on the chosen first index information, a correspondence between the first index information and the data segments of the data to be processed is established. The data to be processed can then be divided into a plurality of data segments according to this correspondence, and the correspondence can be saved in an index file together with the data information obtained earlier.
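The uniqueness requirement on the first index information can be illustrated with a short check. The helper name `is_valid_index` and the sample values are hypothetical; the point is only that a time column passes while a speed column that revisits 0.1 m/s does not.

```python
def is_valid_index(values):
    """True if every value occurs exactly once, so each value locates one record."""
    seen = set()
    for value in values:
        if value in seen:
            return False
        seen.add(value)
    return True


time_s = [0.0, 0.1, 0.2, 0.3]      # test time: strictly increasing, hence unique
speed_mps = [0.0, 0.1, 0.2, 0.1]   # speed: reaches 0.1 m/s twice

print(is_valid_index(time_s), is_valid_index(speed_mps))
```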
Take as an example a file test.txt of data to be processed with a size of 3.29 GB, whose format is illustrated by Table 1 below. When every line of the data holds one record of a variable, the file has more than 20 million lines and cannot be browsed with conventional data browsing tools.
The Time variable is unique in test.txt and can therefore serve as the first index information. To make lookups convenient, the values of the first index information (Time) must be mapped to line numbers of the data to be processed, so that the data can be divided into data segments by line number according to the lines corresponding to each Time value. For example, the index file for this data needs to store the following information:
Data name of the data to be processed: test.txt
Data volume of the data to be processed: 3.29GB
Location of the data to be processed: C: (example)
Data format of the data to be processed: TXT
First index information of the data to be processed:
Time (time)    Line (line numbers)
0              2~20001
100            20002~40001
200            40002~60001
……             ……
Table 1
Here the data segment corresponding to Time = 0 is lines 2 to 20001, the data segment corresponding to Time = 100 is lines 20002 to 40001, the data segment corresponding to Time = 200 is lines 40002 to 60001, and so on.
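The mapping in Table 1 can be reproduced with a short sketch. It assumes, from the table alone, that line 1 is a header, that each data segment spans 20000 lines, and that the Time key advances by 100 per segment; the function name `build_first_index` and its parameters are hypothetical.

```python
def build_first_index(total_lines, lines_per_segment=20000,
                      header_lines=1, time_step=100):
    """Return (time_key, first_line, last_line) tuples covering all data lines."""
    index = []
    start = header_lines + 1   # first data line (line 2 in Table 1)
    key = 0
    while start <= total_lines:
        end = min(start + lines_per_segment - 1, total_lines)
        index.append((key, start, end))
        start = end + 1
        key += time_step
    return index


for entry in build_first_index(60001):
    print(entry)
# (0, 2, 20001)
# (100, 20002, 40001)
# (200, 40002, 60001)
```

A real index would read the Time values from the file rather than assume a fixed step; the fixed step is used here only to match the table.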
Step S24: select at least one sample data item from the plurality of data segments, and use the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
With the data processing method provided by this embodiment, massive data to be processed can be divided into a plurality of small data segments according to the first index information, and sample data selected from those segments form a data subset. Since the data volume of each data segment and of the data subset does not exceed the preset threshold, existing data viewing software can browse and inspect them, and the massive data can be browsed globally by browsing the constructed data subset.
Embodiment three:
Fig. 3 is a flowchart of the data processing method provided by embodiment three of the present application.
Referring to Fig. 3, the data processing method provided by this embodiment comprises:
Step S31: acquire data to be processed, and judge whether the data volume of the data to be processed is greater than a preset threshold.
Step S32: if the data volume of the data to be processed is greater than the preset threshold, acquire first index information of the data to be processed.
In this embodiment, if the data volume of the data to be processed is greater than the preset threshold, the data are indexed to obtain the first index information. For example, if the data to be processed contain time information, that time information can be extracted and used as the first index information, which serves as the basis for the subsequent processing. If the data to be processed contain only data values and no time information, additional time information can be assigned automatically. The first index information is saved in an index file together with the data information.
Step S33: divide the data to be processed into a plurality of data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
In this embodiment, saying that the first index information is used to locate the data in each data segment means that it uniquely locates one item or one group of items in the data to be processed. It may be a data name, a data value, or a combination of the two. For example, in "the data at a test time of 0.1 s", the "test time" is first index information, because the time value 0.1 s is unique and cannot occur more than once. In contrast, "the data at a speed of 0.1 m/s" does not uniquely determine one item or group of items, because the speed varies and may reach 0.1 m/s several times; "speed" in this example is therefore not first index information.
Based on the chosen first index information, a correspondence between the first index information and the data segments of the data to be processed is established. The data to be processed can then be divided into a plurality of data segments according to this correspondence, and the correspondence can be saved in an index file together with the data information obtained earlier.
Step S34: determine at least one sample data item in the data to be processed according to a preset sampling interval, and determine the data segment containing each sample data item according to the first index information.
In this embodiment, the preset sampling interval may be determined from the first index information in the index file, or set by the user as required. The sampling interval may or may not match the division into data segments given by the first index information; this embodiment preferably uses the case where they match.
Once the sampling interval has been determined, the sample data to be extracted from the data to be processed follow from it. Because the first index information corresponds to the data segments, the data segment containing each sample data item can then be determined from the first index information.
Step S35: select each sample data item from the data segment that contains it, record the correspondence between each sample data item and its data segment, and use the selected sample data to construct a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
In this embodiment, after the first index information has been determined, the data to be processed can be sampled as required to obtain a data subset whose data volume is small enough for existing data viewing software to handle directly. Reading and displaying this data subset gives a preview of the overall trend of the full data.
For example, for the test.txt file above, one group of sample data can be extracted every 20000 lines according to the correspondence between the first index information Time, the data segments and the line numbers Line. The data subset then contains only about 1000 lines (20 million lines divided by 20000), which existing data browsing tools can browse with ease.
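The sampling in steps S34 and S35 can be sketched as a single streaming pass that never loads the whole file. The generator name `sample_lines`, the one-line header and the choice of the first line of each block as its sample are assumptions for illustration.

```python
import io


def sample_lines(lines, interval, header_lines=1):
    """Yield (line_number, text) for the first data line of every `interval`-line block."""
    for line_no, text in enumerate(lines, start=1):
        if line_no > header_lines and (line_no - header_lines - 1) % interval == 0:
            yield line_no, text.rstrip("\n")


# Tiny stand-in for test.txt: a header plus ten "Time Value" rows.
fake_file = io.StringIO("Time Value\n" + "".join(f"{i} {i * i}\n" for i in range(10)))
subset = list(sample_lines(fake_file, interval=4))
print(subset)  # [(2, '0 0'), (6, '4 16'), (10, '8 64')]

# Size of the overview for the example in the text.
expected_rows = 20_000_000 // 20_000
print(expected_rows)  # 1000
```

Keeping the line number with each sample records the correspondence between the sample and its data segment, since the segment can be looked up from the line number.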
With the data processing method provided by this embodiment, massive data to be processed can be divided into a plurality of small data segments according to the first index information, and sample data taken from those segments at the preset sampling interval form a data subset. Since the data volume of each data segment and of the data subset does not exceed the preset threshold, existing data viewing software can browse and inspect them, and the massive data can be browsed globally by browsing the constructed data subset.
Embodiment four:
The process flow diagram of the data processing method that Fig. 4 provides for the embodiment of the present application four.
With reference to shown in Fig. 4, the data processing method that the embodiment of the present application provides, comprising:
Step S41: obtain pending data, judges whether the data volume of described pending data is greater than predetermined threshold value.
Step S42: if the data volume of described pending data is greater than predetermined threshold value, be multiple data segment by described pending Data Placement, and the data volume in each data segment is all not more than described predetermined threshold value.
Step S43: select at least one sample data from described multiple data segment, utilize at least one sample data selected to build the data subset browsed for the overall situation, the data volume in described data subset is not more than described predetermined threshold value.
Step S44: obtain a first selection operation on the sample data in the data subset, and determine the sample data selected by the first selection operation.
In this embodiment, after using the data subset to browse the overall trend of the data to be processed, the user can select any sample data of interest in the data subset as required.
Step S45: according to the correspondence between each sample data item and the data segment in which it is located, extract and display the data in the data segment corresponding to the sample data selected by the first selection operation.
In this embodiment, once the sample data selected by the user's first selection operation has been determined, the data segment containing that sample data can be determined, and all the data of that segment can be extracted from the data to be processed for detailed viewing.
When the data to be processed includes first index information, the data segment containing the selected sample data can be determined from the first index information. When it does not, the data segment can be determined directly from the divided data segments. The present application places no limitation on this.
Continuing the above example, if the user chooses to view the sample data in row 20000 in detail, the divided data segments directly show that row 20000 lies in the data segment of rows 2 to 20001. Alternatively, the first index information corresponding to row 20000, Time = 0, shows that the corresponding data segment is rows 2 to 20001. All the data in the data segment of rows 2 to 20001 can then be extracted and displayed for the user to view in detail.
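The lookup in this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the segment table is hard-coded from the row ranges of the running example, and the names `SEGMENTS` and `segment_for_line` are hypothetical and do not appear in the application.

```python
from bisect import bisect_right

# Hypothetical segment table: (first_row, last_row) of each data segment,
# taken from the 20000-row segments of the running example.
SEGMENTS = [(2, 20001), (20002, 40001), (40002, 60001)]

def segment_for_line(line_no, segments=SEGMENTS):
    """Return the (first_row, last_row) segment that contains line_no."""
    starts = [first for first, _ in segments]
    # Last segment whose first row is <= line_no.
    pos = bisect_right(starts, line_no) - 1
    if pos < 0 or line_no > segments[pos][1]:
        raise ValueError(f"row {line_no} is not covered by any segment")
    return segments[pos]

print(segment_for_line(20000))
```

Because the segment start rows are sorted, a binary search finds the containing segment without scanning the whole table, which matters once the table has thousands of entries.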
With the data processing method provided by the above embodiment of the present application, the data to be processed is obtained, and it is judged whether its data volume is greater than a preset threshold. If the data volume is greater than the preset threshold, the data to be processed is divided into multiple data segments, the data volume of each data segment being no greater than the preset threshold. At least one sample data item is selected from the multiple data segments, and the selected sample data is used to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold. A first selection operation on the sample data in the data subset is obtained, and the sample data selected by the first selection operation is determined. According to the correspondence between each sample data item and the data segment in which it is located, the data in the data segment corresponding to the selected sample data is extracted and displayed. In this way, massive data to be processed can be divided into multiple small data segments, and sample data selected from those segments forms a data subset. Because neither a data segment nor the data subset exceeds the preset threshold, existing data viewing software can open and browse them, so browsing the constructed data subset achieves global browsing of the massive data. The user can then select sample data of interest from the data subset and view all the data of the data segment containing it. After global browsing, the user can thus quickly narrow the browsing range and browse the part of the data that is of interest in detail.
Embodiment five:
Fig. 5 is a flowchart of the data processing method provided by Embodiment five of the present application.
Referring to Fig. 5, the data processing method provided by this embodiment comprises:
Step S51: obtain the data to be processed, and judge whether the data volume of the data to be processed is greater than a preset threshold.
Step S52: if the data volume of the data to be processed is greater than the preset threshold, obtain the first index information of the data to be processed.
Step S53: divide the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
In this embodiment, saying that the first index information is used to locate the data in each data segment means that the first index information can uniquely locate one data item or one group of data in the data to be processed. The first index information may be a data name, a data value, or a combination of the two. For example, in "the data at a test time of 0.1 second", the "test time" is the first index information, because the time value "0.1 second" is unique and cannot occur more than once. By contrast, "the data at a speed of 0.1 m/s" cannot uniquely determine one data item or one group of data, because speed varies and can reach 0.1 m/s repeatedly. The "speed" in this example is therefore not first index information.
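The uniqueness requirement can be checked mechanically before a variable is chosen as first index information. The following Python sketch is an illustration only: the function name `can_serve_as_index` and the two sample columns are assumptions, not part of the application.

```python
def can_serve_as_index(values):
    """Return True if every value occurs exactly once,
    so that one value locates exactly one record."""
    seen = set()
    for value in values:
        if value in seen:
            return False  # a repeated value cannot locate data uniquely
        seen.add(value)
    return True

# Hypothetical columns: test time only increases, speed may repeat.
time_column = [0.0, 0.1, 0.2, 0.3]
speed_column = [0.0, 0.1, 0.3, 0.1]  # 0.1 m/s is reached twice

print(can_serve_as_index(time_column), can_serve_as_index(speed_column))
```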
Once the first index information has been chosen, a correspondence is set between the first index information and the data segments of the data to be processed. The data to be processed can then be divided into multiple data segments according to this correspondence, and the correspondence can be saved in an index file together with the data information obtained earlier.
Step S54: select at least one sample data item from the multiple data segments, and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
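One possible way to build such a subset is to sample at a fixed interval and record, for each sample, the segment it came from. The Python sketch below is a minimal illustration under assumptions: data volume is counted in rows, records are held in a list, the segments cover every record, and the helper names `sample_interval` and `build_subset` are hypothetical. The interval formula is an assumption of this sketch and is not stated in the application.

```python
def sample_interval(total_rows, threshold):
    """Smallest interval that keeps the sample count <= threshold
    (ceiling division; an assumption of this sketch)."""
    return -(-total_rows // threshold)

def build_subset(records, segments, interval, first_row=2):
    """Pick every interval-th record and remember its segment.

    records[i] is assumed to sit in row first_row + i, and segments is a
    sorted list of (first_row, last_row) pairs covering all records.
    Returns a list of (row, record, segment_position) tuples.
    """
    subset = []
    seg_pos = 0
    for offset in range(0, len(records), interval):
        row = first_row + offset
        while row > segments[seg_pos][1]:  # advance to the containing segment
            seg_pos += 1
        subset.append((row, records[offset], seg_pos))
    return subset

records = list(range(100, 112))            # 12 records in rows 2..13
segments = [(2, 5), (6, 9), (10, 13)]
interval = sample_interval(len(records), threshold=4)
subset = build_subset(records, segments, interval)
print(subset)
```

Storing the segment position next to each sample is what later allows a selected sample to be mapped back to its full data segment.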
Step S55: obtain the second index information of each data segment.
In this embodiment, to prevent a single index from holding so much information that processing speed suffers, a two-level index can be established: in addition to the first index information of the data to be processed, second index information is established for each data segment.
Step S56: divide each data segment into multiple sub-data segments according to the second index information, the second index information being used to locate the data in each sub-data segment.
In this embodiment, because the second index information belongs to each individual data segment, it can be used to divide each data segment further into multiple sub-data segments.
Take as an example the data to be processed test.txt, 3.29 GB in size, whose data format is shown in Table 2 below. With each row storing the data of one variable, the file has more than 20 million rows and cannot be browsed with any conventional data viewing tool.
Because the Time variable is unique in test.txt, it can serve as the first index information. To make lookups convenient, the values of the first index information, Time, need to be mapped to the row numbers of the data to be processed. The data can then be divided into different data segments by row number, according to the row numbers corresponding to the Time values. For example, the index file of this data needs to store the following information:
Data name of the data to be processed: test.txt
Data volume of the data to be processed: 3.29GB
Location of the data to be processed: C: (example)
Data format of the data to be processed: TXT
First index information of the data to be processed:
Time (time)    Line (row numbers)    Secondary index
0              2~20001               ?
100            20002~40001           1.index
200            40002~60001           2.index
……             ……                    ……
Table 2
Here, the data segment corresponding to first index information Time = 0 is rows 2 to 20001, the data segment corresponding to Time = 100 is rows 20002 to 40001, the data segment corresponding to Time = 200 is rows 40002 to 60001, and so on.
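A first-level index of this shape can be produced in a single pass over the Time column. The sketch below is illustrative only: it assumes a fixed number of rows per segment and that the data starts in row 2 (as in the example), and the name `build_index` is hypothetical.

```python
def build_index(times, rows_per_segment, first_row=2):
    """Map the Time value of each segment's first row to its row range.

    times[i] is assumed to be the Time value stored in row first_row + i.
    Returns a list of (time, first_row, last_row) tuples.
    """
    index = []
    for start in range(0, len(times), rows_per_segment):
        end = min(start + rows_per_segment, len(times)) - 1
        index.append((times[start], first_row + start, first_row + end))
    return index

# Small stand-in for the real file: 10 rows, 4 rows per segment.
index = build_index(list(range(10)), rows_per_segment=4)
print(index)
```

With 20000 rows per segment and Time advancing by 100 per segment, the same function would yield entries of the form shown in Table 2.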
Because the index span in Table 2 above is large, a "secondary index", i.e. the second index information, is added so that after the data has been coarsely located the load range can be narrowed again. For example, 1.index stores the second index information of the data segment of rows 2 to 20001, which corresponds to first index information Time from 0 to 100. Its content is shown in Table 3 below:
Time (time)    Line (row numbers)
0              2~2001
10             2002~4001
20             4002~6001
……             ……
Table 3
Here, the sub-data segment corresponding to second index information Time = 0 is rows 2 to 2001, the sub-data segment corresponding to Time = 10 is rows 2002 to 4001, the sub-data segment corresponding to Time = 20 is rows 4002 to 6001, and so on.
Comparing Table 2 and Table 3, data with Time between 5 and 15 falls, in the first index information, in the data segment for Time from 0 to 100, i.e. rows 2 to 20001 of the data to be processed. Because second index information exists, the position can be narrowed further using 1.index: in the second index information, data with Time between 5 and 15 falls in the sub-data segments for Time from 0 to 20, i.e. rows 2 to 4001 of the data to be processed. Locating the data in this way reduces the data range by a factor of 1000, and a conventional data viewing tool can display this amount of data.
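The two-level lookup for Time between 5 and 15 can be sketched as follows. This is an illustration under assumptions: both index tables are held in memory as lists of (Time, first row, last row), the second-level rows beyond Time = 20 are extrapolated from the "and so on" pattern, and the names `FIRST_INDEX`, `SECOND_INDEX` and `rows_for_time_range` are hypothetical.

```python
from bisect import bisect_right

# (Time of first row, first row, last row), following Table 2.
FIRST_INDEX = [(0, 2, 20001), (100, 20002, 40001), (200, 40002, 60001)]
# Second-level index of the first segment, following Table 3;
# entries after Time = 20 are extrapolated from the stated pattern.
SECOND_INDEX = [(t, 2 + 200 * t, 2001 + 200 * t) for t in range(0, 100, 10)]

def rows_for_time_range(index, t_lo, t_hi):
    """Return the (first_row, last_row) span of index entries
    that cover the Time interval [t_lo, t_hi]."""
    starts = [start for start, _, _ in index]
    lo = bisect_right(starts, t_lo) - 1   # entry containing t_lo
    hi = bisect_right(starts, t_hi) - 1   # entry containing t_hi
    if lo < 0 or t_hi < t_lo:
        raise ValueError("time range is not covered by the index")
    return index[lo][1], index[hi][2]

coarse = rows_for_time_range(FIRST_INDEX, 5, 15)
fine = rows_for_time_range(SECOND_INDEX, 5, 15)
print(coarse, fine)
```

The first call narrows the search to one data segment, and the second call, run against that segment's own index, narrows it to two sub-data segments.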
Step S57: obtain a second selection operation on the data in the sub-data segments, and, according to the second index information, extract and display the data in the sub-data segment corresponding to the data selected by the second selection operation.
In this embodiment, after viewing all the data in a data segment of interest, the user can select data of interest within that data segment as required.
In this embodiment, once the data selected by the user's second selection operation has been determined, the sub-data segment containing that data within its data segment can be determined, and all the data of that sub-data segment can be extracted for detailed viewing.
Continuing the above example, if the user chooses to view the sample data in row 20000 in detail, the first index information corresponding to row 20000, Time = 0, shows that the corresponding data segment is rows 2 to 20001, and all the data in the data segment of rows 2 to 20001 is extracted and displayed for the user to view in detail. If the user then chooses to view the data in row 2000, the second index information corresponding to row 2000, Time = 0, shows that the corresponding sub-data segment is rows 2 to 2001, and all the data in the sub-data segment of rows 2 to 2001 is extracted and displayed for the user to view in detail.
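Extracting a row range such as 2 to 2001 does not require loading the whole file into memory. The following Python sketch shows one way to do it with the standard library. It is an assumption-laden illustration: the file is taken to be UTF-8 text with one record per row, and the name `read_rows` is hypothetical.

```python
from itertools import islice

def read_rows(path, first_row, last_row):
    """Return rows first_row..last_row (1-based, inclusive) of a text file.

    The file is streamed row by row, so only the requested rows are kept
    in memory. Rows before first_row are still read and discarded.
    """
    with open(path, encoding="utf-8") as handle:
        return [row.rstrip("\n")
                for row in islice(handle, first_row - 1, last_row)]
```

A call such as `read_rows("test.txt", 2, 2001)` would return the first sub-data segment of the example. Since the rows ahead of the range are still scanned, an index that also stored byte offsets could seek directly to the segment, which is a possible refinement not described in the application.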
Moreover, after the user has used software to read all the data in the sub-data segment of rows 2 to 2001, where Time is 0, the data range is small enough that smaller ranges can be searched within the software itself for further display and viewing.
This embodiment does not limit the number of index levels. In addition to the first index information and second index information of the above embodiment, third index information, fourth index information and so on can also be set, so as to narrow the data range further.
With the data processing method provided by the above embodiment of the present application, the data to be processed is obtained, and it is judged whether its data volume is greater than a preset threshold. If the data volume is greater than the preset threshold, the data to be processed is divided into multiple data segments, the data volume of each data segment being no greater than the preset threshold. At least one sample data item is selected from the multiple data segments, and the selected sample data is used to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold. A first selection operation on the sample data in the data subset is obtained, and the sample data selected by the first selection operation is determined. According to the correspondence between each sample data item and the data segment in which it is located, the data in the data segment corresponding to the selected sample data is extracted and displayed. In this way, massive data to be processed can be divided into multiple small data segments, and sample data selected from those segments forms a data subset. Because neither a data segment nor the data subset exceeds the preset threshold, existing data viewing software can open and browse them, so browsing the constructed data subset achieves global browsing of the massive data. The user can then select sample data of interest from the data subset, view all the data of the data segment containing it according to the first index information, and view all the data of the sub-data segment containing selected data within that data segment according to the second index information. After global browsing, the user can thus quickly narrow the browsing range and browse the part of the data that is of interest in detail.
It can be understood that, in each of the foregoing embodiments, if the data volume of the data to be processed is judged to be no greater than the preset threshold, the data can be opened and viewed directly with a data viewing tool, with no need to divide it into data segments or carry out the subsequent processing.
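The threshold decision shared by all embodiments can be summarized in a short Python sketch. It is an illustration under assumptions: data volume is measured in rows, at least one row exists, segments are cut at exactly the threshold size rather than by index information, and the name `plan_browsing` is hypothetical.

```python
def plan_browsing(total_rows, threshold, first_row=1):
    """Decide whether the data can be opened directly or must be segmented.

    Returns ("direct", [whole_range]) when total_rows <= threshold, and
    ("segmented", [ranges...]) otherwise, each range holding at most
    threshold rows.
    """
    last_row = first_row + total_rows - 1
    if total_rows <= threshold:
        return "direct", [(first_row, last_row)]
    ranges = [(start, min(start + threshold - 1, last_row))
              for start in range(first_row, last_row + 1, threshold)]
    return "segmented", ranges

print(plan_browsing(5, 10))
print(plan_browsing(25, 10))
```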
For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in other orders or simultaneously.
The data processing method of the present invention is disclosed above. Correspondingly, the present invention also discloses a device applying the above data processing method, the device being used to achieve global browsing of massive data.
Fig. 6 is a schematic structural diagram of a data processing device provided by the present application.
Referring to Fig. 6, the data processing device provided by this embodiment comprises:
First acquisition module 1, configured to obtain the data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold.
First division module 2, configured to, if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold.
Building module 3, configured to select at least one sample data item from the multiple data segments and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
The data processing device provided by this embodiment can adopt the data processing method of the above method embodiments, which is not repeated here.
Fig. 7 is a schematic structural diagram of another data processing device provided by the present application.
Referring to Fig. 7, the data processing device provided by this embodiment comprises:
First acquisition module 1, configured to obtain the data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold.
First division module 2, configured to, if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold.
The first division module 2 specifically comprises: an acquiring unit 21, configured to obtain the first index information of the data to be processed if the data volume of the data to be processed is greater than the preset threshold; and a division unit 22, configured to divide the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
Building module 3, configured to select at least one sample data item from the multiple data segments and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
The data processing device provided by this embodiment can adopt the data processing method of the above method embodiments, which is not repeated here.
Fig. 8 is a schematic structural diagram of another data processing device provided by the present application.
Referring to Fig. 8, the data processing device provided by this embodiment comprises:
First acquisition module 1, configured to obtain the data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold.
First division module 2, configured to, if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold.
The first division module 2 specifically comprises: an acquiring unit 21, configured to obtain the first index information of the data to be processed if the data volume of the data to be processed is greater than the preset threshold; and a division unit 22, configured to divide the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
Building module 3, configured to select at least one sample data item from the multiple data segments and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
The building module 3 specifically comprises: a sampling unit 31, configured to determine at least one sample data item in the data to be processed according to a preset sampling interval, and to determine the data segment in which each sample data item is located according to the first index information; and a selection unit 32, configured to select the sample data from the data segment in which each sample data item is located, record the correspondence between each sample data item and its data segment, and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
The data processing device provided by this embodiment can adopt the data processing method of the above method embodiments, which is not repeated here.
Fig. 9 is a schematic structural diagram of another data processing device provided by the present application.
Referring to Fig. 9, the data processing device provided by this embodiment comprises:
First acquisition module 1, configured to obtain the data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold.
First division module 2, configured to, if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold.
Building module 3, configured to select at least one sample data item from the multiple data segments and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
Determination module 4, configured to obtain a selection operation on the sample data in the data subset and determine the sample data selected by the selection operation.
First extraction module 5, configured to extract and display, according to the correspondence between each sample data item and the data segment in which it is located, the data in the data segment corresponding to the sample data selected by the selection operation.
The data processing device provided by this embodiment can adopt the data processing method of the above method embodiments, which is not repeated here.
Fig. 10 is a schematic structural diagram of a data processing device provided by the present application.
Referring to Fig. 10, the data processing device provided by this embodiment comprises:
First acquisition module 1, configured to obtain the data to be processed and judge whether the data volume of the data to be processed is greater than a preset threshold.
First division module 2, configured to, if the data volume of the data to be processed is greater than the preset threshold, divide the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold.
The first division module 2 specifically comprises: an acquiring unit 21, configured to obtain the first index information of the data to be processed if the data volume of the data to be processed is greater than the preset threshold; and a division unit 22, configured to divide the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment, and the data volume of each data segment being no greater than the preset threshold.
Building module 3, configured to select at least one sample data item from the multiple data segments and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
The building module 3 specifically comprises: a sampling unit 31, configured to determine at least one sample data item in the data to be processed according to a preset sampling interval, and to determine the data segment in which each sample data item is located according to the first index information; and a selection unit 32, configured to select the sample data from the data segment in which each sample data item is located, record the correspondence between each sample data item and its data segment, and use the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
Determination module 4, configured to obtain a selection operation on the sample data in the data subset and determine the sample data selected by the selection operation.
First extraction module 5, configured to extract and display, according to the correspondence between each sample data item and the data segment in which it is located, the data in the data segment corresponding to the sample data selected by the selection operation.
Second acquisition module 6, configured to obtain the second index information of each data segment.
Second division module 7, configured to divide each data segment into multiple sub-data segments according to the second index information, the second index information being used to locate the data in each sub-data segment.
Second extraction module 8, configured to obtain a second selection operation on the data in the sub-data segments and, according to the second index information, extract and display the data in the sub-data segment corresponding to the data selected by the second selection operation.
The data processing device provided by this embodiment can adopt the data processing method of the above method embodiments, which is not repeated here.
With the data processing method and device provided by the present application, the data to be processed is obtained, and it is judged whether its data volume is greater than a preset threshold. If the data volume is greater than the preset threshold, the data to be processed is divided into multiple data segments, the data volume of each data segment being no greater than the preset threshold. At least one sample data item is selected from the multiple data segments, and the selected sample data is used to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold. In this way, massive data to be processed can be divided into multiple small data segments, and sample data selected from those segments forms a data subset. Because neither a data segment nor the data subset exceeds the preset threshold, existing data viewing software can open and browse them, so browsing the constructed data subset achieves global browsing of the massive data.
For convenience of description, the above device is described in terms of units divided by function. Of course, when the present application is implemented, the functions of the units can be realized in one or more pieces of software and/or hardware.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device or system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant details can be found in the description of the method embodiments. The device and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining data to be processed, and judging whether the data volume of the data to be processed is greater than a preset threshold;
if the data volume of the data to be processed is greater than the preset threshold, dividing the data to be processed into multiple data segments, the data volume of each data segment being no greater than the preset threshold;
selecting at least one sample data item from the multiple data segments, and using the selected sample data to build a data subset for global browsing, the data volume of the data subset being no greater than the preset threshold.
2. The data processing method according to claim 1, characterized in that dividing the data to be processed into multiple data segments comprises:
obtaining first index information of the data to be processed;
dividing the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment.
3. The data processing method according to claim 2, characterized in that selecting at least one sample data item from the multiple data segments comprises:
determining at least one sample data item in the data to be processed according to a preset sampling interval, and determining the data segment in which each sample data item is located according to the first index information;
selecting the sample data from the data segment in which each sample data item is located, and recording the correspondence between each sample data item and the data segment in which it is located.
4. The data processing method according to any one of claims 1 to 3, characterized by further comprising:
obtaining a first selection operation on the sample data in the data subset, and determining the sample data selected by the first selection operation;
according to the correspondence between each sample data item and the data segment in which it is located, extracting and displaying the data in the data segment corresponding to the sample data selected by the first selection operation.
5. The data processing method according to claim 4, characterized by further comprising:
obtaining second index information of each data segment;
dividing each data segment into multiple sub-data segments according to the second index information, the second index information being used to locate the data in each sub-data segment;
obtaining a second selection operation on the data in the sub-data segments, and, according to the second index information, extracting and displaying the data in the sub-data segment corresponding to the data selected by the second selection operation.
6. a data processing equipment, is characterized in that, comprising:
First acquisition module, for obtaining pending data, judges whether the data volume of described pending data is greater than predetermined threshold value;
First divides module, if be greater than predetermined threshold value for the data volume of described pending data, be multiple data segment by described pending Data Placement, and the data volume in each data segment is all not more than described predetermined threshold value;
Build module, for selecting at least one sample data from described multiple data segment, utilize at least one sample data selected to build the data subset browsed for the overall situation, the data volume in described data subset is not more than described predetermined threshold value.
7. The data processing device according to claim 6, characterized in that the first division module comprises:
An acquiring unit, configured to obtain first index information of the data to be processed;
A division unit, configured to divide the data to be processed into multiple data segments according to the first index information, the first index information being used to locate the data in each data segment.
8. The data processing device according to claim 7, characterized in that the construction module comprises:
A sampling unit, configured to determine at least one item of sample data in the data to be processed according to a predetermined sampling interval, and to determine, according to the first index information, the data segment in which each item of sample data is located;
A selection unit, configured to select the sample data from the data segment in which each item of sample data is located, and to record the correspondence between each item of sample data and the data segment in which it is located.
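The sampling and selection units of claim 8 could be sketched as below. The example assumes that the first index information is a sorted list of segment start offsets and that samples are identified by their position in the data; both are illustrative assumptions, as is the function name `sample_with_correspondence`.

```python
from bisect import bisect_right
from typing import Dict, List


def sample_with_correspondence(
    n_items: int, segment_starts: List[int], interval: int
) -> Dict[int, int]:
    """Map each sampled position to the index of the segment that contains it.

    `segment_starts` must be sorted ascending and begin at 0 (assumed form
    of the first index information).
    """
    correspondence: Dict[int, int] = {}
    for position in range(0, n_items, interval):
        # The containing segment is the last one starting at or before position.
        correspondence[position] = bisect_right(segment_starts, position) - 1
    return correspondence


mapping = sample_with_correspondence(10, segment_starts=[0, 4, 8], interval=3)
print(mapping)  # {0: 0, 3: 0, 6: 1, 9: 2}
```

Using a binary search over the start offsets keeps the lookup cost logarithmic in the number of segments for each sample.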
9. The data processing device according to any one of claims 6-8, characterized by further comprising:
A determination module, configured to obtain a selection operation on the sample data in the data subset and determine the sample data selected by the selection operation;
A first extraction module, configured to extract and display, according to the correspondence between each item of sample data and the data segment in which it is located, the data in the data segment corresponding to the sample data selected by the selection operation.
10. The data processing device according to claim 9, characterized by further comprising:
A second acquisition module, configured to obtain second index information of each data segment;
A second division module, configured to divide each data segment into multiple data sub-segments according to the second index information, the second index information being used to locate the data in each data sub-segment;
A second extraction module, configured to obtain a second selection operation on the data in the data sub-segments and, according to the second index information, extract and display the data in the data sub-segment corresponding to the data selected by the second selection operation.
CN201410527974.5A 2014-10-09 2014-10-09 Data processing method and device Active CN104298739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410527974.5A CN104298739B (en) 2014-10-09 2014-10-09 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410527974.5A CN104298739B (en) 2014-10-09 2014-10-09 Data processing method and device

Publications (2)

Publication Number Publication Date
CN104298739A true CN104298739A (en) 2015-01-21
CN104298739B CN104298739B (en) 2018-05-25

Family

ID=52318464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410527974.5A Active CN104298739B (en) 2014-10-09 2014-10-09 Data processing method and device

Country Status (1)

Country Link
CN (1) CN104298739B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881475A (en) * 2015-06-02 2015-09-02 北京京东尚科信息技术有限公司 Method and system for randomly sampling big data
CN105302853A (en) * 2015-09-17 2016-02-03 浪潮(北京)电子信息产业有限公司 Data reconstruction method and apparatus
CN105678048A (en) * 2015-12-11 2016-06-15 重庆川仪自动化股份有限公司 Data stability processing method and apparatus applied to instrument development system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106816A1 (en) * 2002-09-27 2006-05-18 Lionel Oisel Method of grouping images from a video sequence
CN101308501A (en) * 2008-06-30 2008-11-19 腾讯科技(深圳)有限公司 Method, system and device for generating video frequency abstract
CN103200463A (en) * 2013-03-27 2013-07-10 天脉聚源(北京)传媒科技有限公司 Method and device for generating video summary
CN103377294A (en) * 2013-07-07 2013-10-30 浙江大学 Color distribution analysis based video summary content extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
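Fu Lei: "A design and implementation method for keyframe-based video summarization on mobile platforms", China Master's Theses Full-text Database *
Jiang Fan et al.: "Scene segmentation indexing and summary generation for news video", Chinese Journal of Computers *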

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881475A (en) * 2015-06-02 2015-09-02 北京京东尚科信息技术有限公司 Method and system for randomly sampling big data
CN105302853A (en) * 2015-09-17 2016-02-03 浪潮(北京)电子信息产业有限公司 Data reconstruction method and apparatus
CN105678048A (en) * 2015-12-11 2016-06-15 重庆川仪自动化股份有限公司 Data stability processing method and apparatus applied to instrument development system
CN105678048B (en) * 2015-12-11 2018-10-12 重庆川仪自动化股份有限公司 A kind of data stability processing method and processing device applied to instrument development system

Also Published As

Publication number Publication date
CN104298739B (en) 2018-05-25

Similar Documents

Publication Publication Date Title
US10409822B2 (en) Systems and methods for presenting ranked search results
US9438850B2 (en) Determining importance of scenes based upon closed captioning data
Sun et al. Ranking domain-specific highlights by analyzing edited videos
US10552754B2 (en) Systems and methods for recognizing ambiguity in metadata
Zubiaga et al. Towards real-time summarization of scheduled events from twitter streams
US8826320B1 (en) System and method for voting on popular video intervals
EP2732383B1 (en) Methods and systems of providing visual content editing functions
US9728230B2 (en) Techniques to bias video thumbnail selection using frequently viewed segments
US20140032557A1 (en) Internal Linking Co-Convergence Using Clustering With Hierarchy
Grewal et al. The university rankings game: Modeling the competition among universities for ranking
CN104123332B (en) The display methods and device of search result
US8724957B2 (en) Method for selecting parts of an audiovisual program and device therefor
JP2017522657A (en) User relationship data Search based on combination of user relationship data
US20150161174A1 (en) Content-based image ranking
US8599203B2 (en) Systems and methods for presenting visualizations of media access patterns
US20170169018A1 (en) Method and Electronic Device for Recommending Media Data
US20160034471A1 (en) Entity detection and extraction for entity cards
KR100849420B1 (en) Image-based searching system and method therefor
US20160180402A1 (en) Method for recommending products based on a user profile derived from metadata of multimedia content
CN103064956B (en) For searching for the method for digital content, calculating system and computer-readable medium
CN105138541B (en) The method and apparatus of audio-frequency fingerprint matching inquiry
US10789634B2 (en) Personalized recommendation method and system, and computer-readable record medium
US8843483B2 (en) Method and system for interactive search result filter
CN104143005B (en) A kind of related search system and method
CN102982153B (en) A kind of information retrieval method and device thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 "change of name, title or address"

Address after: 4 / F, building 1, No.14 Jiuxianqiao Road, Chaoyang District, Beijing 100020

Patentee after: Beijing Jingwei Hengrun Technology Co., Ltd

Address before: 8 / F, block B, No. 11, Anxiang Beili, Chaoyang District, Beijing 100101

Patentee before: Beijing Jingwei HiRain Technologies Co.,Ltd.