CN107346312A - Big data processing method and system - Google Patents
- Publication number
- CN107346312A CN201610294824.3A
- Authority
- CN
- China
- Prior art keywords
- data
- original document
- files
- file
- memory database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a big data processing method, comprising: obtaining an original file that contains large data files of different types; splitting the original file into multiple sub-data files of different categories according to the types of the large data files; and allocating a corresponding server to each sub-data file according to its category, and processing the multiple sub-data files simultaneously on the different servers. The invention also discloses a big data processing system.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a big data processing method and system.
Background
Many application scenarios involve the following data processing procedure: a sender stores data files of several different types in a folder in a certain format, compresses the folder, and sends it to a recipient; after receiving the compressed file, the recipient parses its contents and applies logical processing.
In this procedure, if the data files are not very large and the recipient has no strict requirement on processing time, a single server or a single thread can handle the work. The system still runs normally; the recipient may simply take longer to process the file data. In practice, however, large-volume data processing is common, for example: school education staff reporting student data level by level to the education bureau, the processing of large website logs, and data synchronization between two large-scale systems. In these cases the transmitted file data is very large or the number of files is very high, and the recipient also has strict requirements on processing time; for example, the recipient may require that the file data sent by the sender be fully processed within 1 minute (or less). A processing system that relies on a single server or a single thread cannot meet this requirement.
In addition, the sender often transmits file data to the recipient on a schedule, for example once every 5 minutes, and the maximum data transfer delay the recipient can tolerate is limited. If the recipient cannot finish processing the data within the scheduled interval, a vicious circle forms: the data from the previous cycle has not been fully processed when new data arrives, so the recipient's data delay keeps growing until the system eventually crashes.
To address these problems, the prior art clusters big data with the K-means (K-MEANS) algorithm. That processing, however, usually assumes that the number of data items n is fixed. When n varies, every change in n (for example, n increasing by 1, which means one new data record to process) requires re-executing the entire algorithm. This greatly increases the operations performed by the whole system and makes it likely that the data cannot be fully processed within the specified time, causing long delays for the recipient.
In summary, the prior art offers no effective solution for reducing system operations as far as possible, processing large-volume data within the specified time, and reducing data processing delay.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a big data processing method and system that can process large-volume data quickly and effectively, so as to solve the processing delays and system crashes caused by failing to process large-volume data within the specified time.
To achieve the above purpose, the technical solution of the embodiments of the present invention is realized as follows:
An embodiment of the present invention provides a big data processing method, the method comprising:
obtaining an original file containing large data files of different types;
splitting the original file into multiple sub-data files of different categories according to the types of the large data files;
allocating a corresponding server to each of the multiple sub-data files according to its category, and processing the multiple sub-data files simultaneously on the different servers.
In the above scheme, obtaining the original file containing large data files of different types includes:
creating a table whose layout functions are implemented through secondary development;
establishing an association between the display logic of the table and an in-memory database;
identifying an operation instruction for the table and, according to the operation instruction and the association, obtaining the original file containing large data files of different types from the in-memory database and presenting it in table form;
wherein the in-memory database is used to store large data files of different types.
In the above scheme, after the association between the display logic of the table and the in-memory database is established, the method further includes:
building an index on the data in the in-memory database according to the row numbers of the table, and reading the corresponding data from the in-memory database according to that index.
In the above scheme, while the original file is being split into multiple sub-data files of different categories, the method further includes:
collecting the SQL statements corresponding to the split operation on the original file;
parsing the data tables in the SQL statements and the fields and field values in those data tables;
automatically generating code according to the data tables and their fields and field values, compiling the generated code into a dynamic link library file or an executable program file, and executing it to split the original file containing the large data files.
In the above scheme, when the split operation is carried out, if the requested split quantity exceeds a preset limit, the user's historical data is queried;
based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
An embodiment of the present invention also provides a big data processing system, the system comprising an acquiring unit, a splitting unit, and a processing unit, wherein:
the acquiring unit is configured to obtain an original file containing large data files of different types;
the splitting unit is configured to split the original file into multiple sub-data files of different categories according to the types of the large data files;
the processing unit is configured to allocate a corresponding server to each of the multiple sub-data files according to its category, and to process the multiple sub-data files simultaneously on the different servers.
In the above scheme, the acquiring unit includes:
a table creating unit, configured to create a table whose layout functions are implemented through secondary development;
an association establishing unit, configured to establish an association between the display logic of the table and an in-memory database;
a first processing unit, configured to identify an operation instruction for the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in table form;
wherein the in-memory database is used to store large data files of different types.
In the above scheme, the acquiring unit also includes an index establishing unit, configured to, after the association establishing unit has established the association between the display logic of the table and the in-memory database, build an index on the data in the in-memory database according to the row numbers of the table, and read the corresponding data from the in-memory database according to that index.
In the above scheme, for the case where the splitting unit splits the original file into multiple sub-data files of different categories, the system also includes:
a collecting unit, configured to collect the SQL statements corresponding to the split operation on the original file;
a parsing unit, configured to parse the data tables in the SQL statements and the fields and field values in those data tables;
a second processing unit, configured to automatically generate code according to the data tables and their fields and field values, compile the generated code into a dynamic link library file or an executable program file, and execute it to split the original file containing the large data files.
In the above scheme, when the split operation is carried out, if the requested split quantity exceeds a preset limit, the user's historical data is queried;
based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
The big data processing method and system provided by the embodiments of the present invention obtain an original file containing large data files of different types; split the original file into multiple sub-data files of different categories according to the types of the large data files; allocate a corresponding server to each sub-data file according to its category; and process the multiple sub-data files simultaneously on the different servers. In this way, large-volume data can be processed quickly and efficiently within the specified time, reducing data processing delay.
In addition, based on the table, the embodiments of the present invention allow large-volume data to be consulted and analyzed, while also supporting functions such as real-time global sorting and large-volume data presentation. Furthermore, by analyzing the fields of SQL statements, the embodiments split big data automatically, which ensures both the efficiency of splitting and the validity of the big data; and by looking up the corresponding category among the clustered categories based on the user's historical data, a dynamic quota is obtained according to a predetermined mapping rule, which saves hardware resource overhead.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the big data processing method of an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a specific implementation of the big data processing method of an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the big data processing system of an embodiment of the present invention.
Detailed description
To provide a fuller understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.
As shown in Fig. 1, the implementation flow of the big data processing method in an embodiment of the present invention comprises the following steps:
Step 101: Obtain an original file containing large data files of different types.
Step 101 specifically includes:
S1011: Create a table whose layout functions are implemented through secondary development.
Here, the layout functions include but are not limited to: real-time global sorting, row number display, row freezing, automatic line wrapping of column headings, and blank column filtering.
The header sorting function of the secondarily developed table is bound to the Order by operation that sorts the result set of the in-memory database, so that clicking a table header performs a global sort.
How to carry out secondary development on a table belongs to the prior art and is not described further here.
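The binding between a header click and the database's Order by operation can be illustrated with a minimal Python sketch. It assumes SQLite's in-memory mode as the in-memory database; the table name `files`, its columns, and the function `on_header_click` are hypothetical and do not come from the patent.

```python
import sqlite3

# Hypothetical schema: the patent does not name tables or columns.
ALLOWED_COLUMNS = {"name", "size_kb"}


def make_database() -> sqlite3.Connection:
    """Create an in-memory database holding a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT, size_kb INTEGER)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [("b.txt", 30), ("a.doc", 10), ("c.jpg", 20)],
    )
    return conn


def on_header_click(conn, column, descending=False):
    """Return every row, globally sorted by the clicked column."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    direction = "DESC" if descending else "ASC"
    query = f"SELECT name, size_kb FROM files ORDER BY {column} {direction}"
    return conn.execute(query).fetchall()


conn = make_database()
rows = on_header_click(conn, "size_kb", descending=True)
```

Because the sort runs in the database over the full result set, the order is global rather than limited to the rows currently on screen.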
S1012: Establish an association between the display logic of the table and the in-memory database.
After the association between the display logic of the table and the in-memory database is established, the method further includes:
building an index on the data in the in-memory database according to the row numbers of the table, and reading the corresponding data from the in-memory database according to that index.
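One way to picture the row-number index is sketched below in Python, again assuming SQLite in memory. The `records` table, the `row_no` column, and both function names are illustrative assumptions, not part of the patent.

```python
import sqlite3


def build_row_index(conn):
    """Give each record a row number and index that column for fast lookup."""
    conn.execute("CREATE TABLE records (row_no INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(i, f"record-{i}") for i in range(1, 1001)],
    )
    conn.execute("CREATE INDEX idx_records_row_no ON records (row_no)")


def read_rows(conn, first_row, last_row):
    """Fetch only the rows that the table currently displays."""
    cursor = conn.execute(
        "SELECT row_no, payload FROM records "
        "WHERE row_no BETWEEN ? AND ? ORDER BY row_no",
        (first_row, last_row),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_row_index(conn)
visible = read_rows(conn, 501, 503)
```

With the index in place, reading a window of rows does not require scanning the whole data set.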
S1013: Identify an operation instruction for the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in table form.
The in-memory database is used to store large data files of different types.
Here, the identified operation instruction is any of the table layout instructions entered by the user.
After large-volume data is obtained from the in-memory database according to the operation instruction and the association, the data is first cached to an intermediate file and then presented in the table from that intermediate file. In this way, the system's memory usage is reduced while large-volume data is presented and laid out in the table, which makes presenting and operating on large-volume data feasible.
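The intermediate-file caching step can be sketched as follows. This is only an illustration under the assumption that the intermediate file is a CSV file and that the table shows one page at a time; the patent specifies neither, and the function names are hypothetical.

```python
import csv
import itertools
import tempfile


def cache_to_intermediate_file(rows):
    """Stream rows to a temporary CSV file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", newline="", suffix=".csv", delete=False, encoding="utf-8"
    ) as handle:
        csv.writer(handle).writerows(rows)
        return handle.name


def read_page(path, page, page_size):
    """Read one page of rows without loading the whole file into memory."""
    start = page * page_size
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        return list(itertools.islice(reader, start, start + page_size))


# The generator is consumed row by row, so the full data set never sits in memory.
path = cache_to_intermediate_file((i, f"value-{i}") for i in range(10_000))
page = read_page(path, page=2, page_size=3)
```

The caller is responsible for deleting the temporary file once the table no longer needs it.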
The embodiment of the present invention implements layout functions through secondary development of a table and binds the secondarily developed table to the in-memory database. When a user consults or analyzes large-volume data through the table, the table also supports functions such as real-time global sorting and large-volume data presentation.
Step 102: Split the original file into multiple sub-data files of different categories according to the types of the large data files.
Splitting the original file according to the types of the large data files may specifically be: splitting according to the naming rules of the different types of large data files.
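As a minimal sketch of splitting by naming rule, the Python below groups entries by file extension. Treating the extension as the naming rule is an assumption (it matches the .doc, .jpg, .txt examples used later in the description), and `split_by_type` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_type(file_names):
    """Group the entries of an original file by their extension."""
    groups = defaultdict(list)
    for name in file_names:
        # Lower-case the suffix so that .JPG and .jpg land in one category.
        suffix = PurePosixPath(name).suffix.lower() or "<none>"
        groups[suffix].append(name)
    return dict(groups)


original_file = ["report.doc", "photo.JPG", "notes.txt", "scan.jpg", "README"]
sub_files = split_by_type(original_file)
```

Each key of `sub_files` corresponds to one category, and each value to the contents of one sub-data file.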
Here, while the original file is being split into multiple sub-data files of different categories, the method further includes:
collecting the SQL statements corresponding to the split operation on the original file;
parsing the data tables in the SQL statements and the fields and field values in those data tables;
automatically generating code according to the data tables and their fields and field values, compiling the generated code into a dynamic link library (dll) file or an executable program (exe) file, and executing it to split the original file containing the large data files.
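The collect, parse, generate, and compile sequence can be sketched in Python. This is a heavily simplified stand-in: it handles only simple `INSERT` statements with a regular expression, and it uses Python's built-in `compile` and `exec` in place of producing a dll or exe file. All function names and the sample statement are assumptions for illustration.

```python
import re

INSERT_PATTERN = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def parse_insert(sql):
    """Extract the table name, fields, and field values from one INSERT."""
    match = INSERT_PATTERN.search(sql)
    if match is None:
        raise ValueError("only simple INSERT statements are supported")
    table, fields, values = match.groups()
    field_list = [f.strip() for f in fields.split(",")]
    # Naive split: values that contain commas are not handled in this sketch.
    value_list = [v.strip().strip("'") for v in values.split(",")]
    if len(field_list) != len(value_list):
        raise ValueError("field and value counts differ")
    return table, dict(zip(field_list, value_list))


def generate_splitter(table, record):
    """Generate source code for the parsed record, compile it, and load it."""
    source = (
        "def split_record():\n"
        f"    return {table!r}, {record!r}\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["split_record"]


table, record = parse_insert(
    "INSERT INTO doc_files (name, size_kb) VALUES ('report.doc', 120)"
)
splitter = generate_splitter(table, record)
```

Calling `splitter()` returns the target table and the record to route there. A production system would need a real SQL parser rather than a regular expression.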
Here, when the split operation is carried out, if the requested split quantity exceeds the preset limit, the user's historical data is queried;
based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
The categories obtained by clustering may be those the system obtains after clustering a predetermined quantity of big data.
The clustering operation may be performed on a predetermined number of data objects. For big data, a representative number of data objects may be selected depending on the data scale.
Each user has one group of data objects, and the data object corresponding to a user may include one or more data items; clustering can therefore be performed on data objects that include one or more data items.
The preset limit may be set manually or set by the system itself. In the latter case, the preset limit can be calculated according to certain rules; for example, it can be determined with a density-based outlier removal model.
The setting of the preset limit is described further below and comprises the following steps:
S2321: Sort the split quantities in the user's history information.
For example, they can be sorted in descending or ascending order; the sorted values are d1, d2, ..., dn, where n is an integer greater than 1.
S2322: Judge every point $d_i$ that satisfies the following formula to be an outlier:
$|d_i - d_{i-k}| > C, \quad i = k+1, \ldots, n \qquad (1)$
In formula (1), i denotes the i-th order; d1, d2, ..., dn are the split quantities sorted in descending or ascending order; C is a preset threshold; and k is a preset distance. That is, if the difference between the amount of the i-th order and the amount of the order k positions before it exceeds the preset threshold, the point $d_i$ is considered an outlier.
S2323: Reject the outliers.
S2324: Set the maximum value of the group remaining after the outliers are rejected as the preset limit.
For example, the order amounts sorted in ascending order are d1 = 100, d2 = 110, d3 = 123, d4 = 195, d5 = 229, d6 = 1410, d7 = 2100; C is set to 300 and k is set to 3.
Formula (1) then gives:
|d4 - d1| = 195 - 100 = 95 < 300;
|d5 - d2| = 229 - 110 = 119 < 300;
|d6 - d3| = 1410 - 123 = 1287 > 300;
|d7 - d4| = 2100 - 195 = 1905 > 300.
Therefore, the points d6 and d7 are judged to be outliers. After d6 and d7 are rejected from the group, the remaining points are d1 to d5; the maximum of d1 to d5 is 229, so 229 is set as the preset limit.
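Steps S2321 to S2324 can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and ascending order is assumed, as in the worked example.

```python
def preset_limit(quantities, threshold, distance):
    """Return the largest quantity left after removing outliers (S2321-S2324)."""
    ordered = sorted(quantities)  # S2321: sort in ascending order
    outliers = set()
    # S2322: 0-based index i stands for d_(i+1); compare with the value k places back.
    for i in range(distance, len(ordered)):
        if abs(ordered[i] - ordered[i - distance]) > threshold:
            outliers.add(i)
    # S2323: reject the outliers; S2324: take the maximum of what remains.
    kept = [v for i, v in enumerate(ordered) if i not in outliers]
    return max(kept)


limit = preset_limit([100, 110, 123, 195, 229, 1410, 2100], threshold=300, distance=3)
print(limit)  # 229
```

With C = 300 and k = 3, the sketch rejects 1410 and 2100 and returns 229, matching the example. The first k values are never tested by formula (1), so they are always kept.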
Here, the split quantity may relate to a split category within the split information; that is, the split quantity may be the quantity of a certain class of commodity among many pieces of information.
The split information may include split information under different scenarios. Where the split information is order information under different scenarios, the preset limit may correspondingly include preset limits under different scenarios, each of which can be calculated using S2321 to S2324 above.
In the embodiments of the present invention, clustering can be completed in advance on a predetermined quantity of big data. When a new user sends a data request, the big data including the newly received data does not need to be re-clustered; instead, the corresponding category is simply looked up, based on the user's historical data, among the categories obtained by clustering, and a dynamic quota is then obtained according to a predetermined mapping rule. This saves hardware resource overhead.
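The idea of looking up a category without re-clustering can be sketched as a nearest-centroid assignment. The patent does not say how the lookup is performed, so nearest-centroid matching on the mean of the history is an assumption, as are the centroid values, the category names, and the quota mapping below.

```python
# Hypothetical output of an earlier clustering run: category -> centroid.
CENTROIDS = {"small": 100.0, "medium": 1000.0, "large": 10000.0}
# Hypothetical predetermined mapping rule: category -> dynamic quota.
QUOTA_BY_CATEGORY = {"small": 200, "medium": 2000, "large": 20000}


def lookup_category(history):
    """Assign a user to the nearest existing cluster; nothing is re-clustered."""
    mean = sum(history) / len(history)
    return min(CENTROIDS, key=lambda name: abs(CENTROIDS[name] - mean))


def dynamic_quota(history):
    """Map the user's category to a quota through the predetermined rule."""
    return QUOTA_BY_CATEGORY[lookup_category(history)]


quota = dynamic_quota([800, 1200, 950])
```

The lookup cost depends only on the number of existing categories, not on the amount of data that was clustered earlier, which is where the saving over a full K-means re-run comes from.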
Step 103: Allocate a corresponding server to each of the multiple sub-data files according to its category, and process the multiple sub-data files simultaneously on the different servers.
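Step 103 can be pictured with the following Python sketch, in which worker threads stand in for separate servers. The server names, the category-to-server mapping, and the placeholder processing function are all assumptions; a real deployment would dispatch work over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical assignment of one server per category.
SERVER_BY_CATEGORY = {".doc": "server-1", ".jpg": "server-2", ".txt": "server-3"}


def process_on_server(category, names):
    """Stand-in for the work a real server would do on one sub-data file."""
    server = SERVER_BY_CATEGORY[category]
    return server, len(names)


def process_all(sub_files):
    """Process every sub-data file at the same time, one worker per category."""
    # Assumes sub_files is non-empty, since max_workers must be positive.
    with ThreadPoolExecutor(max_workers=len(sub_files)) as pool:
        futures = {
            category: pool.submit(process_on_server, category, names)
            for category, names in sub_files.items()
        }
        return {category: future.result() for category, future in futures.items()}


results = process_all({".doc": ["a.doc"], ".jpg": ["b.jpg", "c.jpg"], ".txt": ["d.txt"]})
```

Because each category maps to exactly one worker, no two workers touch the same sub-data file.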
The specific implementation of the big data processing method of the embodiment of the present invention is described in detail below.
As shown in Fig. 2, the specific implementation flow of the big data processing method in an embodiment of the present invention comprises the following steps:
Step 201: Obtain an original file containing large data files of different types.
Step 202: Judge whether the large data files in the original file fall into more than five categories. If there are five categories or fewer, perform step 203; if there are more than five, jump to step 207 and end this processing flow.
Step 203: Split the original file into multiple sub-data files according to the different types of the large data files.
Step 204: Count the number of first-category data items A, second-category data items B, third-category data items C, fourth-category data items D, and fifth-category data items E, and calculate the encoded value N according to the following formula:
$N = A \times 10^1 + B \times 10^2 + C \times 10^3 + D \times 10^4 + E \times 10^5$
(number of first-category data items A, e.g. .doc files): $\times 10^1$;
(number of second-category data items B, e.g. .jpg files): $\times 10^2$;
(number of third-category data items C, e.g. .txt files): $\times 10^3$;
(number of fourth-category data items D, e.g. .pdf files): $\times 10^4$;
(number of fifth-category data items E, e.g. .exe files): $\times 10^5$.
If there are fewer than five data categories, the count of each missing category is taken as 0.
Step 205: Send the encoded value N to the server.
Step 206: The server performs data reorganization on N.
That is, in step 103, the server's processing of the multiple sub-data files includes data reorganization according to the following formulas:
$N / 10^1$ gives A (take the integer part and keep only the units digit);
$N / 10^2$ gives B (take the integer part and keep only the units digit);
$N / 10^3$ gives C (take the integer part and keep only the units digit);
$N / 10^4$ gives D (take the integer part and keep only the units digit);
$N / 10^5$ gives E (take the integer part and keep only the units digit).
Step 207: End.
The above scheme uses a single encoded value N to convey the large-volume data to the server, which avoids sending multiple data items to the server together and the congestion and confusion this would cause on the server channel. Through a concurrency control strategy, the embodiment of the present invention can deploy multiple servers to split and process large data files at the same time, which greatly improves the processing capacity of the system and ensures that the system processes large-volume data quickly and efficiently within the specified time. Moreover, this concurrency strategy, which assigns files to different servers for splitting and processing according to file naming rules, ensures that only one server splits the original file and that each sub-data file produced by the split is likewise handled by one server, thereby avoiding resource contention.
For example, suppose the first category contains A .doc files, say 1; the second category contains B .jpg files, say 6; the third category contains C .txt files, say 8; the fourth category contains D .pdf files, say 4; and the fifth category contains E .exe files, say 5.
Then the encoded value N is calculated with the formula $N = A \times 10^1 + B \times 10^2 + C \times 10^3 + D \times 10^4 + E \times 10^5$, that is, $N = 1 \times 10^1 + 6 \times 10^2 + 8 \times 10^3 + 4 \times 10^4 + 5 \times 10^5 = 548610$. The encoded value N = 548610 is sent to the server, which then processes it as follows:
$548610 / 10^1 = 54861$ (take the integer part and keep only the units digit), i.e. A = 1;
$548610 / 10^2 = 5486.1$ (take the integer part and keep only the units digit), i.e. B = 6;
$548610 / 10^3 = 548.61$ (take the integer part and keep only the units digit), i.e. C = 8;
$548610 / 10^4 = 54.861$ (take the integer part and keep only the units digit), i.e. D = 4;
$548610 / 10^5 = 5.4861$ (take the integer part and keep only the units digit), i.e. E = 5.
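The encoding of step 204 and the reorganization of step 206 can be checked with a short Python round trip. The function names are hypothetical. Two points are assumptions of this sketch rather than statements from the patent: integer (floor) division is used because it reproduces the worked example (548.61 becomes 548, whose units digit is 8, whereas rounding to the nearest integer would give 549), and each count is restricted to a single digit, since a larger count would spill into the next category's position and could not be recovered.

```python
def encode_counts(counts):
    """Pack up to five per-category counts into N = sum(count_i * 10**i)."""
    if len(counts) > 5:
        raise ValueError("more than five categories: the flow ends at step 207")
    # Missing categories count as 0, as stated in step 204.
    padded = list(counts) + [0] * (5 - len(counts))
    # Assumption of this sketch: counts above 9 would not decode uniquely.
    if any(not 0 <= c <= 9 for c in padded):
        raise ValueError("each count must be a single digit to decode uniquely")
    return sum(c * 10 ** (i + 1) for i, c in enumerate(padded))


def decode_counts(n):
    """Recover A..E: integer part of N / 10**i, then keep the units digit."""
    return [(n // 10 ** i) % 10 for i in range(1, 6)]


n = encode_counts([1, 6, 8, 4, 5])
print(n)                 # 548610
print(decode_counts(n))  # [1, 6, 8, 4, 5]
```

The round trip returns the original counts A = 1, B = 6, C = 8, D = 4, E = 5 from the single value 548610.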
To implement the above method, an embodiment of the present invention also provides a big data processing system. As shown in Fig. 3, the system includes an acquiring unit 31, a splitting unit 32, and a processing unit 33, wherein:
the acquiring unit 31 is configured to obtain an original file containing large data files of different types;
the splitting unit 32 is configured to split the original file into multiple sub-data files of different categories according to the types of the large data files;
the processing unit 33 is configured to allocate a corresponding server to each of the multiple sub-data files according to its category, and to process the multiple sub-data files simultaneously on the different servers.
Here, the acquiring unit 31 includes:
a table creating unit 311, configured to create a table whose layout functions are implemented through secondary development;
an association establishing unit 312, configured to establish an association between the display logic of the table and the in-memory database;
a first processing unit 313, configured to identify an operation instruction for the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in table form;
wherein the in-memory database is used to store large data files of different types.
The acquiring unit 31 also includes an index establishing unit 314, configured to, after the association establishing unit 312 has established the association between the display logic of the table and the in-memory database, build an index on the data in the in-memory database according to the row numbers of the table, and read the corresponding data from the in-memory database according to that index.
For the case where the splitting unit 32 splits the original file into multiple sub-data files of different categories, the system also includes:
a collecting unit 321, configured to collect the SQL statements corresponding to the split operation on the original file;
a parsing unit 322, configured to parse the data tables in the SQL statements and the fields and field values in those data tables;
a second processing unit 323, configured to automatically generate code according to the data tables and their fields and field values, compile the generated code into a dll file or an exe file, and execute it to split the original file containing the large data files.
When the split operation is carried out, if the requested split quantity exceeds the preset limit, the user's historical data is queried;
based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
In practical applications, the acquiring unit 31, the splitting unit 32, and the processing unit 33 may each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like located on a terminal or a server.
The embodiment of the present invention obtains an original file containing large data files of different types; splits the original file into multiple sub-data files of different categories according to the types of the large data files; allocates a corresponding server to each sub-data file according to its category; and processes the multiple sub-data files simultaneously on the different servers. In this way, large-volume data can be processed quickly and efficiently within the specified time, reducing data processing delay.
In addition, based on the table, the embodiment of the present invention allows large-volume data to be consulted and analyzed, while also supporting functions such as real-time global sorting and large-volume data presentation. Furthermore, by analyzing the fields of SQL statements, the embodiment splits big data automatically, which ensures both the efficiency of splitting and the validity of the big data; and by looking up the corresponding category among the clustered categories based on the user's historical data, a dynamic quota is obtained according to a predetermined mapping rule, which saves hardware resource overhead.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A big data processing method, characterized in that the method comprises:
obtaining an original file containing large data files of different types;
splitting the original file into multiple sub-data files of different categories according to the types of the large data files;
allocating a corresponding server to each of the multiple sub-data files according to its category, and processing the multiple sub-data files simultaneously on the different servers.
2. The method according to claim 1, characterized in that obtaining the original file containing large data files of different types comprises:
creating a table whose layout functions are implemented through secondary development;
establishing an association between the display logic of the table and an in-memory database;
identifying an operation instruction for the table and, according to the operation instruction and the association, obtaining the original file containing large data files of different types from the in-memory database and presenting it in table form;
wherein the in-memory database is used to store large data files of different types.
3. The method according to claim 2, characterized in that, after the association between the display logic of the table and the in-memory database is established, the method further comprises:
building an index on the data in the in-memory database according to the row numbers of the table, and reading the corresponding data from the in-memory database according to that index.
4. The method according to claim 1, 2 or 3, characterized in that, while the original file is being split into multiple sub-data files of different categories, the method further comprises:
collecting the SQL statements corresponding to the split operation on the original file;
parsing the data tables in the SQL statements and the fields and field values in those data tables;
automatically generating code according to the data tables and their fields and field values, compiling the generated code into a dynamic link library file or an executable program file, and executing it to split the original file containing the large data files.
5. The method according to claim 1, characterized in that, when the splitting operation is performed, if the number of requested splits exceeds a preset limit, the historical data of the user is queried;
and the corresponding category is looked up among the categories obtained by clustering the historical data.
6. A big data processing system, characterized in that the system comprises: an acquiring unit, a splitting unit, and a processing unit; wherein,
the acquiring unit is configured to obtain an original file containing large data files of different types;
the splitting unit is configured to split the original file into multiple sub-data files of different categories according to the type of the large data files;
the processing unit is configured to allocate a corresponding server to each of the multiple sub-data files according to its category, and to process the multiple sub-data files simultaneously on the different servers.
7. The system according to claim 6, characterized in that the acquiring unit comprises:
a table creating unit, configured to create a table that implements a typesetting function through secondary development;
an association establishing unit, configured to establish an association between the display logic of the table and an in-memory database;
a first processing unit, configured to identify an operation instruction on the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in tabular form;
wherein the in-memory database is used to store large data files of different types.
8. The system according to claim 7, characterized in that the acquiring unit further comprises an index establishing unit, configured to, after the association establishing unit establishes the association between the display logic of the table and the in-memory database, establish an index on the data in the in-memory database according to the row numbers of the table, and read the corresponding data in the in-memory database according to the established index.
9. The system according to claim 6, 7, or 8, characterized in that, when the splitting unit splits the original file into multiple sub-data files of different categories, the system further comprises:
a collecting unit, configured to collect the SQL statement corresponding to the splitting operation on the original file;
a parsing unit, configured to parse the data table in the SQL statement and the fields and field values in the data table;
a second processing unit, configured to automatically generate code according to the data table and the fields and field values in the data table, compile the generated code to produce a dynamic link library file or an executable program file, and execute it to split the original file containing the large data files.
10. The system according to claim 6, characterized in that, when the splitting operation is performed, if the number of requested splits exceeds a preset limit, the historical data of the user is queried;
and the corresponding category is looked up among the categories obtained by clustering the historical data.
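The split-and-dispatch flow of claims 1 and 6 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the patent: records are modelled as dictionaries with a "type" key, the SERVERS mapping and its server names are hypothetical, and a local thread pool stands in for processing on separate servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical category-to-server mapping (illustrative names only).
SERVERS = {"log": "server-a", "image": "server-b", "table": "server-c"}


def split_by_type(original_file):
    """Split the original file into sub-data files, one list of records per type."""
    sub_files = defaultdict(list)
    for record in original_file:
        sub_files[record["type"]].append(record)
    return dict(sub_files)


def process_on_server(server, category, records):
    """Stand-in for remote processing: report which server handled which category."""
    return {"server": server, "category": category, "count": len(records)}


def process_original_file(original_file, servers=SERVERS):
    """Split by type, assign each category its server, and process all categories concurrently."""
    sub_files = split_by_type(original_file)
    # One worker per category so that all sub-data files are handled at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_files))) as pool:
        futures = [
            pool.submit(process_on_server, servers[category], category, records)
            for category, records in sub_files.items()
        ]
        # Sort by category so the result order is deterministic.
        return sorted((f.result() for f in futures), key=lambda r: r["category"])
```

In this sketch a category without an entry in the mapping raises a KeyError; a real system would need its own policy for unassigned categories, which the claims do not specify.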
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610294824.3A CN107346312A (en) | 2016-05-05 | 2016-05-05 | A kind of big data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610294824.3A CN107346312A (en) | 2016-05-05 | 2016-05-05 | A kind of big data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107346312A (en) | 2017-11-14 |
Family
ID=60254272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610294824.3A Pending CN107346312A (en) | A kind of big data processing method and system | 2016-05-05 | 2016-05-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107346312A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107918662A (en) * | 2017-11-22 | 2018-04-17 | 泰康保险集团股份有限公司 | Document method for splitting and device |
CN108509478A (en) * | 2017-11-23 | 2018-09-07 | 平安科技(深圳)有限公司 | Fractionation call method, electronic device and the storage medium of regulation engine file |
CN109856230A (en) * | 2019-01-30 | 2019-06-07 | 山东博戎伝创信息科技有限公司 | Organic compound residue analysis method and device and intelligent monitoring system thereof |
CN110580246A (en) * | 2019-07-30 | 2019-12-17 | 平安科技(深圳)有限公司 | Method and device for migrating data, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060179153A1 (en) * | 2004-03-22 | 2006-08-10 | Nam-Yul Lee | Streaming based contents distribution network system and methods for splitting, merging and retrieving files |
CN101582064A (en) * | 2008-05-15 | 2009-11-18 | 阿里巴巴集团控股有限公司 | Method and system for processing enormous data |
CN102054043A (en) * | 2010-12-30 | 2011-05-11 | 畅捷通软件有限公司 | Method and device for generating big data |
CN104636372A (en) * | 2013-11-11 | 2015-05-20 | 中兴通讯股份有限公司 | Method and device for achieving large data volume processing based on form |
CN104951446A (en) * | 2014-03-25 | 2015-09-30 | 阿里巴巴集团控股有限公司 | Big data processing method and platform |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107918662A (en) * | 2017-11-22 | 2018-04-17 | 泰康保险集团股份有限公司 | Document method for splitting and device |
CN107918662B (en) * | 2017-11-22 | 2020-11-10 | 泰康保险集团股份有限公司 | Document splitting method and device |
CN108509478A (en) * | 2017-11-23 | 2018-09-07 | 平安科技(深圳)有限公司 | Fractionation call method, electronic device and the storage medium of regulation engine file |
CN109856230A (en) * | 2019-01-30 | 2019-06-07 | 山东博戎伝创信息科技有限公司 | Organic compound residue analysis method and device and intelligent monitoring system thereof |
CN109856230B (en) * | 2019-01-30 | 2021-09-21 | 山东博戎伝创信息科技有限公司 | Organic compound residue analysis method and device and intelligent monitoring system thereof |
CN110580246A (en) * | 2019-07-30 | 2019-12-17 | 平安科技(深圳)有限公司 | Method and device for migrating data, computer equipment and storage medium |
CN110580246B (en) * | 2019-07-30 | 2023-10-20 | 平安科技(深圳)有限公司 | Method, device, computer equipment and storage medium for migrating data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766371B (en) | Text information classification method and device | |
CN103778148B (en) | Life cycle management method and equipment for data file of Hadoop distributed file system | |
CN104364781B (en) | System and method for calculating classification ratio | |
WO2017097231A1 (en) | Topic processing method and device | |
CN107346312A (en) | A kind of big data processing method and system | |
CN103279478A (en) | Method for extracting features based on distributed mutual information documents | |
CN109684616A (en) | Dynamic statement formula assembles the method and system made a report on | |
EP3617910A1 (en) | Method and apparatus for displaying textual information | |
CN111191111A (en) | Content recommendation method, device and storage medium | |
CN110737630A (en) | Method and device for processing electronic archive file, computer equipment and storage medium | |
CN110069573A (en) | Product data integration method, apparatus, computer equipment and storage medium | |
KR102107474B1 (en) | Social issue deduction system and method using crawling | |
CN110275938B (en) | Knowledge extraction method and system based on unstructured document | |
US20200019601A1 (en) | Method & system for labeling and organizing data for summarizing and referencing content via a communication network | |
KR20170043365A (en) | Important precedents extraction and sorting method using Big Data | |
CN106528566A (en) | Log file output method, server and client | |
Khemani et al. | A review on reddit news headlines with nltk tool | |
CN110874366A (en) | Data processing and query method and device | |
CN117171650A (en) | Document data processing method, system and medium based on web crawler technology | |
CN111580991A (en) | Computer data processing method and system | |
US10163005B2 (en) | Document structure analysis device with image processing | |
CN116595106A (en) | User grouping method, device and storage medium | |
CN111026972A (en) | Subscription data pushing method, device, equipment and storage medium in Internet of things | |
CN103186672B (en) | file ordering method and device thereof | |
CN111159213A (en) | Data query method, device, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171114 |