CN107346312A - Big data processing method and system - Google Patents

Info

Publication number
CN107346312A
Authority
CN
China
Prior art keywords
data
original document
files
file
memory database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610294824.3A
Other languages
Chinese (zh)
Inventor
岑春祥
王升元
苏文平
郄威
孟利青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Inner Mongolia Co Ltd
Original Assignee
China Mobile Group Inner Mongolia Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Inner Mongolia Co Ltd
Priority to CN201610294824.3A
Publication of CN107346312A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data processing method, comprising: obtaining an original file containing large data files of different types; splitting the original file into multiple sub-data files of different categories according to the types of the large data files; and assigning a corresponding server to each sub-data file according to its category and processing the sub-data files simultaneously on the different servers. The invention also discloses a big data processing system.

Description

Big data processing method and system
Technical field
The present invention relates to the technical field of information processing, and in particular to a big data processing method and system.
Background
In many application scenarios the following data processing flow is common: a sender stores data files of several different types in a folder in a certain format, compresses the folder, and sends it to a recipient; after receiving the compressed file, the recipient parses its contents and applies logical processing.
In this flow, if the data files are not very large and the recipient has no strict requirement on processing time, a single server or a single thread is sufficient. The system still runs normally; the recipient may simply take longer to process the file data. In practice, however, large-volume data processing needs are common, for example: school staff reporting student data level by level to the education bureau, processing the logs of a large website, and synchronizing data between two large systems. In these cases the transmitted file data is very large or the files are very numerous, and the recipient also has strict requirements on processing time, for example that the file data sent by the sender be fully processed within 1 minute (or less). A processing system that relies on a single server or a single thread cannot meet this requirement.
In addition, in many cases the sender transmits file data to the recipient on a schedule, for example once every 5 minutes, and the maximum transmission delay the recipient can tolerate is bounded. If the recipient cannot finish processing the data within the scheduled interval, a vicious circle forms: new data arrives before the data from the previous cycle has been processed, the recipient's delay keeps growing, and the system eventually crashes.
To address these problems, the prior art clusters big data with the K-means (K-MEANS) algorithm. That processing, however, generally assumes that the number of data items n is fixed. When n varies, every change in n (for example, n increasing by 1 because one new data record must be processed) requires re-running the entire algorithm. This greatly increases the system's workload, so the data may well not be processed within the specified time, which causes the recipient significant delay.
In summary, the prior art offers no effective solution for minimizing the system's operations, processing large volumes of data within a specified time, and reducing data processing delay.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a big data processing method and system that can process large volumes of data quickly and effectively, so as to solve the processing delay and system crashes caused by failing to process large volumes of data within a specified time.
To achieve the above objective, the technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a big data processing method, the method comprising:
Obtaining an original file containing large data files of different types;
Splitting the original file into multiple sub-data files of different categories according to the types of the large data files;
Assigning a corresponding server to each of the sub-data files according to its category, and processing the sub-data files simultaneously on the different servers.
In the above scheme, obtaining the original file containing large data files of different types comprises:
Creating a table that implements layout functions through secondary development;
Establishing an association between the display logic of the table and an in-memory database;
Recognizing an operation instruction on the table and, according to the operation instruction and the association, obtaining the original file containing large data files of different types from the in-memory database and presenting it in tabular form;
Wherein the in-memory database is used to store large data files of different types.
In the above scheme, after the association between the display logic of the table and the in-memory database is established, the method further comprises:
Building an index on the data in the in-memory database according to the row numbers of the table, and reading the corresponding data from the in-memory database according to the index.
In the above scheme, when the original file is split into multiple sub-data files of different categories, the method further comprises:
Collecting the SQL statement corresponding to the split operation on the original file;
Parsing the data table in the SQL statement and the fields and field values in the data table;
Automatically generating code according to the data table and its fields and field values, compiling the generated code into a dynamic link library file or an executable program file, and executing it to split the original file containing the large data files.
In the above scheme, when a split operation is performed, if the quantity requested to be split exceeds a preset limit, the user's historical data is queried;
Based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
An embodiment of the present invention also provides a big data processing system, the system comprising an acquiring unit, a splitting unit and a processing unit, wherein:
The acquiring unit is configured to obtain an original file containing large data files of different types;
The splitting unit is configured to split the original file into multiple sub-data files of different categories according to the types of the large data files;
The processing unit is configured to assign a corresponding server to each of the sub-data files according to its category, and to process the sub-data files simultaneously on the different servers.
In the above scheme, the acquiring unit comprises:
A table creating unit, configured to create a table that implements layout functions through secondary development;
An association establishing unit, configured to establish an association between the display logic of the table and an in-memory database;
A first processing unit, configured to recognize an operation instruction on the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in tabular form;
Wherein the in-memory database is used to store large data files of different types.
In the above scheme, the acquiring unit further comprises an index establishing unit, configured to, after the association establishing unit has established the association between the display logic of the table and the in-memory database, build an index on the data in the in-memory database according to the row numbers of the table and read the corresponding data from the in-memory database according to the index.
In the above scheme, when the splitting unit splits the original file into multiple sub-data files of different categories, the system further comprises:
A collecting unit, configured to collect the SQL statement corresponding to the split operation on the original file;
A parsing unit, configured to parse the data table in the SQL statement and the fields and field values in the data table;
A second processing unit, configured to automatically generate code according to the data table and its fields and field values, compile the generated code into a dynamic link library file or an executable program file, and execute it to split the original file containing the large data files.
In the above scheme, when a split operation is performed, if the quantity requested to be split exceeds a preset limit, the user's historical data is queried;
Based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
The big data processing method and system provided by the embodiments of the present invention obtain an original file containing large data files of different types; split the original file into multiple sub-data files of different categories according to the types of the large data files; assign a corresponding server to each sub-data file according to its category; and process the sub-data files simultaneously on the different servers. In this way, large volumes of data can be processed quickly and efficiently within a specified time, reducing data processing delay.
In addition, the embodiments of the present invention allow large volumes of data to be consulted, analyzed and otherwise operated on through a table, which also supports functions such as global real-time sorting and large-volume data presentation. Furthermore, by analyzing the fields of SQL statements, the embodiments split big data automatically, which ensures both the efficiency of the split and the validity of the data. Finally, based on the user's historical data, the corresponding category is looked up among the clustered categories and a dynamic quota is obtained according to a predetermined mapping rule, which saves hardware resources.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the big data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a specific implementation of the big data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the big data processing system according to an embodiment of the present invention.
Detailed description
For a fuller understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.
As shown in Fig. 1, the flow of the big data processing method in an embodiment of the present invention comprises the following steps:
Step 101: Obtain an original file containing large data files of different types;
Step 101 specifically includes:
S1011: Create a table that implements layout functions through secondary development;
Here, the layout functions include, but are not limited to: global real-time sorting, row-number display, row freezing, automatic wrapping of column headings, and blank-column filtering.
The column-header sort function of the secondarily developed table is bound to the ORDER BY operation that sorts the result set of the in-memory database, so that clicking a table header performs a global sort.
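The binding between a header click and an ORDER BY query can be sketched as follows. This is a minimal illustration, assuming an SQLite in-memory database stands in for the in-memory database of the embodiment; the table name `files`, its columns, and the function `on_header_click` are hypothetical and do not come from the patent.

```python
import sqlite3

# Hypothetical schema for illustration; the patent names no tables or columns.
COLUMNS = ("id", "name", "size_kb")


def make_db():
    """Create an in-memory SQLite database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id INTEGER, name TEXT, size_kb INTEGER)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(1, "b.doc", 30), (2, "a.jpg", 120), (3, "c.txt", 5)],
    )
    return conn


def on_header_click(conn, column, descending=False):
    """Return every row sorted by the clicked column.

    The sort runs in the database over the full result set, so it is a
    global sort rather than a sort of the rows currently on screen.
    """
    # Column names cannot be bound as SQL parameters, so check a whitelist.
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    direction = "DESC" if descending else "ASC"
    query = f"SELECT id, name, size_kb FROM files ORDER BY {column} {direction}"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in on_header_click(make_db(), "size_kb"):
        print(row)
```

Delegating the sort to the database keeps the table widget thin: it only needs to re-query and redraw when a header is clicked.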
How to carry out secondary development on the table belongs to the prior art and is not described here.
S1012: Establish an association between the display logic of the table and the in-memory database;
Here, after the association between the display logic of the table and the in-memory database is established, the method further includes:
Building an index on the data in the in-memory database according to the row numbers of the table, and reading the corresponding data from the in-memory database according to the index.
S1013: Recognize an operation instruction on the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in tabular form.
The in-memory database is used to store large data files of different types.
Here, the recognized operation instructions are the various table layout instructions entered by the user.
After the large-volume data is obtained from the in-memory database according to the operation instruction on the table and the association, it is first cached in an intermediate file and then presented in the table from that intermediate file. Presenting and laying out large-volume data through the table in this way reduces the system's memory usage while still allowing the data to be presented and manipulated.
The embodiment of the present invention implements the layout functions through secondary development of the table and binds the secondarily developed table to the in-memory database. When a user needs to consult, analyze or otherwise operate on large-volume data through the table, the table also supports functions such as global real-time sorting and large-volume data presentation.
Step 102: Split the original file into multiple sub-data files of different categories according to the types of the large data files;
Specifically, splitting the original file according to the types of the large data files may mean splitting it according to the naming rules of the different types of large data files.
Here, when the original file is split into multiple sub-data files of different categories according to the types of the large data files, the method further includes:
Collecting the SQL statement corresponding to the split operation on the original file;
Parsing the data table in the SQL statement and the fields and field values in the data table;
Automatically generating code according to the data table and its fields and field values, compiling the generated code into a dynamic link library (dll) file or an executable program (exe) file, and executing it to split the original file containing the large data files.
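The parsing step can be illustrated with a small sketch. The patent does not say which SQL statements are collected or how they are parsed, so the following assumes a simple single-row `INSERT` statement and uses a regular expression; the function name `parse_insert` and the sample table `split_log` are hypothetical. The code generation and compilation into a dll or exe file are not covered here.

```python
import re

# Matches: INSERT INTO <table> (<fields>) VALUES (<values>)
# Assumption: one row, no parentheses or commas inside quoted values.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(\w+)\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def parse_insert(sql):
    """Extract the table name and a field -> value mapping from an INSERT."""
    match = INSERT_RE.search(sql)
    if match is None:
        raise ValueError("unsupported SQL statement")
    table = match.group(1)
    fields = [f.strip() for f in match.group(2).split(",")]
    # Values are returned as strings with surrounding single quotes removed.
    values = [v.strip().strip("'") for v in match.group(3).split(",")]
    if len(fields) != len(values):
        raise ValueError("field count does not match value count")
    return table, dict(zip(fields, values))


if __name__ == "__main__":
    statement = "INSERT INTO split_log (file_type, file_count) VALUES ('doc', 3)"
    print(parse_insert(statement))
```

A production parser would need a real SQL grammar; the regular expression is only enough to show what "data table, fields and field values" refers to.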
Here, when a split operation is performed, if the quantity requested to be split exceeds a preset limit, the user's historical data is queried;
Based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
Here, the categories after clustering may be the categories obtained after the system clusters a predetermined quantity of big data.
The clustering operation may select a predetermined number of data objects to cluster. For big data, a representative number of data objects may be selected depending on the scale of the data.
Each user has a group of data objects, and the data object corresponding to one user may include one or more data items; clustering may therefore be performed on data objects that include one or more data items.
Here, the preset limit may be set manually or set by the system itself.
In the latter case, the preset limit may be calculated according to a certain rule. For example, it may be determined by a density-based outlier removal model.
The setting of the preset limit is further described below and includes the following steps:
S2321: Sort the split quantities in the user's history information;
For example, they may be sorted in descending or ascending order, and the sorted values are denoted d1, d2, ..., dn, where n is an integer greater than 1.
S2322: Judge each point d_i that satisfies the following formula to be an outlier;
$|d_i - d_{i-k}| > C, \quad i = k+1, \ldots, n \qquad (1)$
In formula (1), i denotes the i-th order; d1, d2, ..., dn are the split quantities sorted in descending or ascending order; C is a preset threshold; and k is a preset distance. If the amount of the i-th order differs from the amount of the order k positions before it by more than the preset threshold, the point d_i is considered an outlier.
S2323: Reject the outliers;
S2324: Set the maximum value in the group remaining after the outliers are rejected as the preset limit.
For example, suppose the order amounts sorted in ascending order are d1=100, d2=110, d3=123, d4=195, d5=229, d6=1410, d7=2100, with C set to 300 and k set to 3.
Formula (1) then gives:
|d4-d1| = 195-100 = 95 < 300;
|d5-d2| = 229-110 = 119 < 300;
|d6-d3| = 1410-123 = 1287 > 300;
|d7-d4| = 2100-195 = 1905 > 300;
Therefore the points d6 and d7 are judged to be outliers. After d6 and d7 are rejected from the group, the remaining points are d1 to d5; the maximum of d1 to d5 is 229, so 229 is set as the preset limit.
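Steps S2321 to S2324 can be sketched in a few lines. This is a minimal illustration, assuming ascending order and k >= 1; the function name `preset_limit` is hypothetical.

```python
def preset_limit(quantities, c, k):
    """Compute the preset limit following S2321 to S2324.

    quantities: historical split quantities (any order)
    c: preset threshold C in formula (1)
    k: preset distance k in formula (1), assumed to be >= 1
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    d = sorted(quantities)  # S2321: ascending order
    # S2322: with 0-based indexing, i = k .. n-1 corresponds to
    # i = k+1 .. n in formula (1).
    outliers = {i for i in range(k, len(d)) if abs(d[i] - d[i - k]) > c}
    # S2323: reject the outliers.
    kept = [value for i, value in enumerate(d) if i not in outliers]
    if not kept:
        raise ValueError("no data to compute a limit from")
    # S2324: the largest remaining value is the preset limit.
    return max(kept)


if __name__ == "__main__":
    history = [100, 110, 123, 195, 229, 1410, 2100]
    print(preset_limit(history, c=300, k=3))  # 229, as in the worked example
```

Note that the first k values are never tested by formula (1), so they are always kept.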
Here, the split may cover the split types in the split information; that is, the split quantity may be the quantity of a certain type of item across many pieces of information.
Where the split information includes split information under different scenarios, or order information under different scenarios, the preset limit may correspondingly include preset limits for the different scenarios. The preset limit for each scenario can be calculated using S2321 to S2324 above.
The clustering in the embodiments of the present invention may be completed in advance on a predetermined quantity of big data. When a data request from a new user is received, there is no need to re-cluster the big data including the newly received data; instead, the corresponding category only needs to be looked up among the clustered categories based on the user's historical data, and a dynamic quota is then obtained according to a predetermined mapping rule. This saves hardware resources.
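The lookup that replaces re-clustering might look like the sketch below. The patent does not specify how the category is found or what the mapping rule is, so two assumptions are made here: the category is the cluster whose centroid is nearest (in squared Euclidean distance) to a feature vector built from the user's history, and the mapping rule is a plain dictionary from category index to quota. The names `CENTROIDS`, `QUOTA_BY_CATEGORY`, `nearest_category` and `dynamic_quota` are all hypothetical.

```python
# Hypothetical centroids produced by an earlier, offline clustering run.
CENTROIDS = [(100.0, 2.0), (1500.0, 10.0), (20000.0, 40.0)]

# Hypothetical mapping rule: category index -> dynamic quota.
QUOTA_BY_CATEGORY = {0: 300, 1: 3000, 2: 50000}


def nearest_category(centroids, point):
    """Return the index of the centroid closest to point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(centroids)),
        key=lambda j: squared_distance(centroids[j], point),
    )


def dynamic_quota(user_history_features):
    """Look up the user's category and map it to a quota, without re-clustering."""
    category = nearest_category(CENTROIDS, user_history_features)
    return QUOTA_BY_CATEGORY[category]


if __name__ == "__main__":
    print(dynamic_quota((1200.0, 8.0)))
```

The cost of a lookup grows with the number of clusters, not with the number of data records, which is where the saving over re-running K-means comes from.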
Step 103: Assign a corresponding server to each of the sub-data files according to its category, and process the sub-data files simultaneously on the different servers.
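Steps 102 and 103 together can be sketched as follows. This is only an illustration under stated assumptions: the file extension is taken as the naming rule that determines the category, and worker threads from `concurrent.futures` stand in for the separate servers of the embodiment. The functions `split_by_type`, `process_group` and `process_all` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath


def split_by_type(filenames):
    """Group file names by extension (the assumed naming rule)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).suffix.lower()].append(name)
    return dict(groups)


def process_group(category, names):
    """Placeholder for the per-category work one server would do."""
    return category, len(names)


def process_all(filenames):
    """Split by category, then handle every category concurrently.

    One worker per category mirrors the one-server-per-sub-data-file
    assignment, so no two workers touch the same group.
    """
    groups = split_by_type(filenames)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [
            pool.submit(process_group, category, names)
            for category, names in groups.items()
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    print(process_all(["a.doc", "b.JPG", "c.doc", "d.txt"]))
```

In a real deployment the `pool.submit` call would be replaced by whatever mechanism dispatches work to a remote server; the grouping logic stays the same.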
A specific implementation of the big data processing method of the embodiments of the present invention is described in detail below.
As shown in Fig. 2, the specific implementation flow of the big data processing method in an embodiment of the present invention comprises the following steps:
Step 201: Obtain an original file containing large data files of different types;
Step 202: Judge whether the large data files in the original file fall into more than five categories; if there are five categories or fewer, perform step 203; if there are more than five, jump to step 207 and end the processing flow;
Step 203: Split the original file into multiple sub-data files according to the different types of the large data files;
Step 204: Count the number of first-category files as A, second-category files as B, third-category files as C, fourth-category files as D and fifth-category files as E, and calculate the encoded value N according to the following formula;
$N = A \times 10^{1} + B \times 10^{2} + C \times 10^{3} + D \times 10^{4} + E \times 10^{5}$
A: number of first-category files (e.g. .doc files), weighted by $10^{1}$
B: number of second-category files (e.g. .jpg files), weighted by $10^{2}$
C: number of third-category files (e.g. .txt files), weighted by $10^{3}$
D: number of fourth-category files (e.g. .pdf files), weighted by $10^{4}$
E: number of fifth-category files (e.g. .exe files), weighted by $10^{5}$
If there are fewer than five categories, the count for each missing category is taken as 0.
Step 205: Send the encoded value N to the server;
Step 206: The server performs data reassembly on N;
That is, in step 103, the server's processing of the sub-data files includes the following: the server reassembles the data according to the formulas below:
$A = \lfloor N / 10^{1} \rfloor \bmod 10$ (divide N by $10^{1}$, discard the fractional part, and keep only the ones digit);
$B = \lfloor N / 10^{2} \rfloor \bmod 10$ (divide N by $10^{2}$, discard the fractional part, and keep only the ones digit);
$C = \lfloor N / 10^{3} \rfloor \bmod 10$ (divide N by $10^{3}$, discard the fractional part, and keep only the ones digit);
$D = \lfloor N / 10^{4} \rfloor \bmod 10$ (divide N by $10^{4}$, discard the fractional part, and keep only the ones digit);
$E = \lfloor N / 10^{5} \rfloor \bmod 10$ (divide N by $10^{5}$, discard the fractional part, and keep only the ones digit);
Step 207: End.
The above scheme uses a single encoded value N to convey the large-volume data to the server, which avoids sending multiple data items to the server together and the congestion and confusion this would cause on the server channel. Through concurrency control, the embodiments of the present invention can deploy multiple servers to split and process large data files at the same time, which greatly improves the system's processing capacity and ensures that large volumes of data are processed quickly and efficiently within the specified time. Moreover, this concurrency strategy, which assigns different servers to split and process files according to file naming rules, ensures that only one server splits the original file and that each resulting sub-data file is handled by exactly one server, thereby avoiding resource contention.
For example, suppose the first category, .doc files, has A = 1 file; the second category, .jpg files, has B = 6 files; the third category, .txt files, has C = 8 files; the fourth category, .pdf files, has D = 4 files; and the fifth category, .exe files, has E = 5 files.
Then the encoded value is calculated with the formula $N = A \times 10^{1} + B \times 10^{2} + C \times 10^{3} + D \times 10^{4} + E \times 10^{5}$, that is, $N = 1 \times 10^{1} + 6 \times 10^{2} + 8 \times 10^{3} + 4 \times 10^{4} + 5 \times 10^{5} = 548610$. The encoded value N = 548610 is sent to the server, which then processes it as follows:
$548610 / 10^{1} = 54861$; keeping only the ones digit gives A = 1;
$548610 / 10^{2} = 5486.1$; discarding the fractional part and keeping only the ones digit gives B = 6;
$548610 / 10^{3} = 548.61$; discarding the fractional part and keeping only the ones digit gives C = 8;
$548610 / 10^{4} = 54.861$; discarding the fractional part and keeping only the ones digit gives D = 4;
$548610 / 10^{5} = 5.4861$; discarding the fractional part and keeping only the ones digit gives E = 5.
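The encoding of step 204 and the reassembly of step 206 can be checked with a short sketch. It assumes that every count is a single digit (0 to 9), which the worked example satisfies but the patent does not state explicitly; with larger counts the digits would overlap and the decoding would no longer be unique. The function names `encode_counts` and `decode_counts` are hypothetical.

```python
MAX_CATEGORIES = 5


def encode_counts(counts):
    """Pack up to five per-category counts into one value N (step 204).

    Missing categories count as 0. Each count must be a single digit,
    otherwise neighbouring digits of N would overlap.
    """
    if len(counts) > MAX_CATEGORIES:
        raise ValueError("more than five categories")
    if any(not 0 <= c <= 9 for c in counts):
        raise ValueError("each count must be between 0 and 9")
    padded = list(counts) + [0] * (MAX_CATEGORIES - len(counts))
    # Category i (0-based) is weighted by 10 ** (i + 1).
    return sum(c * 10 ** (i + 1) for i, c in enumerate(padded))


def decode_counts(n):
    """Recover [A, B, C, D, E] from N (step 206).

    Integer division discards the fractional part; % 10 keeps the ones digit.
    """
    return [(n // 10 ** (i + 1)) % 10 for i in range(MAX_CATEGORIES)]


if __name__ == "__main__":
    n = encode_counts([1, 6, 8, 4, 5])
    print(n)                 # 548610
    print(decode_counts(n))  # [1, 6, 8, 4, 5]
```

Because the lowest weight is $10^{1}$, the ones digit of N is always 0 and carries no information.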
To implement the above method, an embodiment of the present invention also provides a big data processing system. As shown in Fig. 3, the system includes an acquiring unit 31, a splitting unit 32 and a processing unit 33, wherein:
The acquiring unit 31 is configured to obtain an original file containing large data files of different types;
The splitting unit 32 is configured to split the original file into multiple sub-data files of different categories according to the types of the large data files;
The processing unit 33 is configured to assign a corresponding server to each of the sub-data files according to its category, and to process the sub-data files simultaneously on the different servers.
Here, the acquiring unit 31 includes:
A table creating unit 311, configured to create a table that implements layout functions through secondary development;
An association establishing unit 312, configured to establish an association between the display logic of the table and an in-memory database;
A first processing unit 313, configured to recognize an operation instruction on the table and, according to the operation instruction and the association, obtain the original file containing large data files of different types from the in-memory database and present it in tabular form;
The in-memory database is used to store large data files of different types.
The acquiring unit 31 further includes an index establishing unit 314, configured to, after the association establishing unit 312 has established the association between the display logic of the table and the in-memory database, build an index on the data in the in-memory database according to the row numbers of the table and read the corresponding data from the in-memory database according to the index.
When the splitting unit 32 splits the original file into multiple sub-data files of different categories, the system further includes:
A collecting unit 321, configured to collect the SQL statement corresponding to the split operation on the original file;
A parsing unit 322, configured to parse the data table in the SQL statement and the fields and field values in the data table;
A second processing unit 323, configured to automatically generate code according to the data table and its fields and field values, compile the generated code into a dll file or an exe file, and execute it to split the original file containing the large data files.
When a split operation is performed, if the quantity requested to be split exceeds a preset limit, the user's historical data is queried;
Based on the historical data, the corresponding category is looked up among the categories obtained by clustering.
In practice, the acquiring unit 31, the splitting unit 32 and the processing unit 33 may each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA) or the like located on a terminal server.
The embodiments of the present invention obtain an original file containing large data files of different types; split the original file into multiple sub-data files of different categories according to the types of the large data files; assign a corresponding server to each sub-data file according to its category; and process the sub-data files simultaneously on the different servers. In this way, large volumes of data can be processed quickly and efficiently within a specified time, reducing data processing delay.
In addition, the embodiments of the present invention allow large volumes of data to be consulted, analyzed and otherwise operated on through a table, which also supports functions such as global real-time sorting and large-volume data presentation. Furthermore, by analyzing the fields of SQL statements, the embodiments split big data automatically, which ensures both the efficiency of the split and the validity of the data. Finally, based on the user's historical data, the corresponding category is looked up among the clustered categories and a dynamic quota is obtained according to a predetermined mapping rule, which saves hardware resources.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A big data processing method, characterized in that the method comprises:
Obtaining an original file containing large data files of different types;
Splitting the original file into multiple sub-data files of different categories according to the types of the large data files;
Assigning a corresponding server to each of the sub-data files according to its category, and processing the sub-data files simultaneously on the different servers.
2. The method according to claim 1, characterized in that obtaining the original file containing big data files of different types comprises:
creating a form that implements a typesetting function through secondary development;
establishing an association between the display logic of the form and an in-memory database;
identifying an operation instruction on the form, obtaining, according to the operation instruction and the association, the original file containing big data files of different types from the in-memory database, and presenting it in tabular form;
wherein the in-memory database is used to store big data files of different types.
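As a rough illustration of claim 2, the sketch below binds the column headers of a form to the columns of an in-memory database and turns a form operation into a query. It assumes SQLite's `:memory:` mode as the in-memory database and supports only a "sort" instruction; the table, headers, sample rows, and function names are all hypothetical.

```python
import sqlite3

# Association between form column headers (display logic) and database columns.
# All names here are hypothetical.
ASSOCIATION = {"File": "name", "Type": "kind", "Size (MB)": "size_mb"}


def open_memory_db():
    """Create an in-memory database holding a few example file records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_files (name TEXT, kind TEXT, size_mb INTEGER)")
    conn.executemany(
        "INSERT INTO big_files VALUES (?, ?, ?)",
        [("cdr_01", "voice", 300), ("sms_01", "sms", 120), ("net_01", "data", 950)],
    )
    return conn


def handle_instruction(conn, instruction, header):
    """Turn a form operation (only 'sort' here) into a query via the association."""
    if instruction != "sort":
        raise ValueError(f"unsupported instruction: {instruction}")
    column = ASSOCIATION[header]  # lookup in a fixed whitelist, so safe to interpolate
    return conn.execute(
        f"SELECT name, kind, size_mb FROM big_files ORDER BY {column}"
    ).fetchall()


def render(rows):
    """Present the rows as a plain-text table under the form's headers."""
    lines = [" | ".join(ASSOCIATION)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)
```

Calling `render(handle_instruction(open_memory_db(), "sort", "Size (MB)"))` returns the records ordered by size under the form's headers.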
3. The method according to claim 2, characterized in that, after the association between the display logic of the form and the in-memory database is established, the method further comprises:
establishing an index on the data in the in-memory database according to the row numbers of the form, and reading the corresponding data from the in-memory database according to the established index.
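One way to picture the row-number index of claim 3 is a list whose position i holds the key of the record displayed on form row i+1, so that only the rows currently shown need to be read. This is a sketch under that assumption; the store layout and both function names are hypothetical.

```python
def build_row_index(store, sort_key):
    """Return a list whose position i holds the key of the record on form row i+1."""
    return sorted(store, key=lambda key: store[key][sort_key])


def read_rows(store, row_index, first_row, last_row):
    """Fetch only the records for form rows first_row..last_row (1-based, inclusive)."""
    return [store[key] for key in row_index[first_row - 1:last_row]]
```

With such an index, scrolling the form to rows 2 through 3 touches two records in the store instead of rescanning all of them.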
4. The method according to claim 1, 2 or 3, characterized in that, when the original file is split into a plurality of sub-data files of different categories, the method further comprises:
collecting the SQL statement corresponding to the split operation on the original file;
parsing the data table in the SQL statement and the fields and field values in the data table;
automatically generating code according to the data table and the fields and field values in the data table, compiling the generated code to produce a dynamic link library file or an executable program file, and executing it to split the original file containing the big data files.
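The parse, generate, compile, and execute sequence of claim 4 can be sketched in Python. Note the substitution: the claim produces a dynamic link library or an executable, whereas this sketch compiles generated Python source to bytecode with the built-in `compile`. It also assumes one narrow statement shape, `SELECT * FROM <table> WHERE <field> = '<value>'`; the pattern and function names are hypothetical.

```python
import re

# Only one simple statement shape is recognized in this sketch.
PATTERN = re.compile(
    r"SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*'([^']*)'",
    re.IGNORECASE,
)


def parse(sql):
    """Extract (table, field, value) from the supported statement shape."""
    match = PATTERN.fullmatch(sql.strip())
    if match is None:
        raise ValueError(f"unsupported statement: {sql}")
    return match.group(1), match.group(2), match.group(3)


def generate_source(table, field, value):
    """Emit Python source for a function that keeps rows matching the field value."""
    return (
        f"def split_{table}(rows):\n"
        f"    return [r for r in rows if r.get({field!r}) == {value!r}]\n"
    )


def build_splitter(sql):
    """Parse the statement, generate code, compile it, and return the new function."""
    table, field, value = parse(sql)
    namespace = {}
    code = compile(generate_source(table, field, value), f"<generated:{table}>", "exec")
    exec(code, namespace)
    return namespace[f"split_{table}"]
```

A splitter built from `SELECT * FROM cdr WHERE kind = 'voice'` keeps only the rows whose `kind` field equals `voice`.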
5. The method according to claim 1, characterized in that, when the split operation is performed, if the requested number of splits exceeds a preset limit, the historical data of the user is queried;
and the corresponding category is looked up, based on the historical data, among the categories obtained by clustering.
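The check in claim 5 might look like the following sketch. The clustering itself is assumed to have been done beforehand, so only the resulting centroids are used; the preset limit, the centroid values, and the category-to-amount mapping are invented for illustration, and the mapping step follows the "predetermined mapping rule" mentioned in the description rather than the claim.

```python
PRESET_LIMIT = 8  # hypothetical default cap on the number of splits

# Hypothetical cluster centroids (mean past split counts) and mapping rule.
CENTROIDS = {"light": 4.0, "medium": 16.0, "heavy": 64.0}
AMOUNT_BY_CATEGORY = {"light": 8, "medium": 32, "heavy": 128}


def nearest_category(history):
    """Find the cluster whose centroid is closest to the user's mean past usage."""
    mean = sum(history) / len(history)
    return min(CENTROIDS, key=lambda cat: abs(CENTROIDS[cat] - mean))


def allowed_splits(requested, history):
    """Grant the request, capped by the preset limit or the user's dynamic amount."""
    if requested <= PRESET_LIMIT or not history:
        return min(requested, PRESET_LIMIT)
    return min(requested, AMOUNT_BY_CATEGORY[nearest_category(history)])
```

A user whose history averages about 17 splits falls into the `medium` cluster, so a request for 50 splits is granted 32 rather than being cut to the preset limit of 8.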
6. A big data processing system, characterized in that the system comprises an acquiring unit, a splitting unit and a processing unit; wherein
the acquiring unit is configured to obtain an original file containing big data files of different types;
the splitting unit is configured to split the original file, according to the types of the big data files, into a plurality of sub-data files of different categories;
the processing unit is configured to assign the plurality of sub-data files to corresponding servers according to their categories, and to process the plurality of sub-data files simultaneously on the different servers.
7. The system according to claim 6, characterized in that the acquiring unit comprises:
a form creating unit, configured to create a form that implements a typesetting function through secondary development;
an association establishing unit, configured to establish an association between the display logic of the form and an in-memory database;
a first processing unit, configured to identify an operation instruction on the form, obtain, according to the operation instruction and the association, the original file containing big data files of different types from the in-memory database, and present it in tabular form;
wherein the in-memory database is used to store big data files of different types.
8. The system according to claim 7, characterized in that the acquiring unit further comprises an index establishing unit, configured to, after the association establishing unit establishes the association between the display logic of the form and the in-memory database, establish an index on the data in the in-memory database according to the row numbers of the form, and read the corresponding data from the in-memory database according to the established index.
9. The system according to claim 6, 7 or 8, characterized in that, when the splitting unit splits the original file into a plurality of sub-data files of different categories, the system further comprises:
a collecting unit, configured to collect the SQL statement corresponding to the split operation on the original file;
a parsing unit, configured to parse the data table in the SQL statement and the fields and field values in the data table;
a second processing unit, configured to automatically generate code according to the data table and the fields and field values in the data table, compile the generated code to produce a dynamic link library file or an executable program file, and execute it to split the original file containing the big data files.
10. The system according to claim 6, characterized in that, when the split operation is performed, if the requested number of splits exceeds a preset limit, the historical data of the user is queried;
and the corresponding category is looked up, based on the historical data, among the categories obtained by clustering.
CN201610294824.3A 2016-05-05 2016-05-05 A kind of big data processing method and system Pending CN107346312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610294824.3A CN107346312A (en) 2016-05-05 2016-05-05 A kind of big data processing method and system

Publications (1)

Publication Number Publication Date
CN107346312A (en) 2017-11-14

Family

ID=60254272

Country Status (1)

Country Link
CN (1) CN107346312A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060179153A1 (en) * 2004-03-22 2006-08-10 Nam-Yul Lee Streaming based contents distribution network system and methods for splitting, merging and retrieving files
CN101582064A (en) * 2008-05-15 2009-11-18 阿里巴巴集团控股有限公司 Method and system for processing enormous data
CN102054043A (en) * 2010-12-30 2011-05-11 畅捷通软件有限公司 Method and device for generating big data
CN104636372A (en) * 2013-11-11 2015-05-20 中兴通讯股份有限公司 Method and device for achieving large data volume processing based on form
CN104951446A (en) * 2014-03-25 2015-09-30 阿里巴巴集团控股有限公司 Big data processing method and platform

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918662A (en) * 2017-11-22 2018-04-17 泰康保险集团股份有限公司 Document method for splitting and device
CN107918662B (en) * 2017-11-22 2020-11-10 泰康保险集团股份有限公司 Document splitting method and device
CN108509478A (en) * 2017-11-23 2018-09-07 平安科技(深圳)有限公司 Fractionation call method, electronic device and the storage medium of regulation engine file
CN109856230A (en) * 2019-01-30 2019-06-07 山东博戎伝创信息科技有限公司 Organic compound residue analysis method and device and intelligent monitoring system thereof
CN109856230B (en) * 2019-01-30 2021-09-21 山东博戎伝创信息科技有限公司 Organic compound residue analysis method and device and intelligent monitoring system thereof
CN110580246A (en) * 2019-07-30 2019-12-17 平安科技(深圳)有限公司 Method and device for migrating data, computer equipment and storage medium
CN110580246B (en) * 2019-07-30 2023-10-20 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for migrating data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171114