CN107967362A - Hadoop-based custom query method and system for structured files - Google Patents
- Publication number
- CN107967362A CN201711404242.7A
- Authority
- CN
- China
- Prior art keywords
- server
- file
- request
- hadoop cluster
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a Hadoop-based custom query method for structured files. Without changing the existing storage resources, the MapReduce function provided by a Hadoop cluster server is used to quickly query the large number of log-type files stored on a file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves substantially, which addresses the low efficiency of querying log-type data.
Description
Technical field
This application relates to the field of data query, and in particular to a Hadoop-based custom query method and system for structured files.
Background
In the IT systems of commercial banks, transactions generate a huge volume of historical data. Historical account transaction records are needed for customer inquiries, account reconciliation and similar purposes, so most of them are stored in relatively expensive distributed databases. Other historical data, such as log-type data, is accessed very rarely and is used only by operations and maintenance staff for troubleshooting, analysis and tracing. Such data is therefore often kept on low-end storage, or even on inexpensive tape. Historical data kept on tape can currently be queried only by "restoring the tape, restoring the database, then querying the database", and because the data volume is relatively large, such queries take a relatively long time.
As commercial banks continue to improve their services, the query-efficiency requirements for log-type data keep rising, and the number of queries keeps growing. Storing such data in a distributed database would again be costly, so how to balance query efficiency against storage cost for such data has become a technical problem that urgently needs to be solved.
Summary of the invention
This application provides a Hadoop-based custom query system for structured files. To achieve the above purpose, this application provides the following technical solutions:
A Hadoop-based custom query method for structured files, applied to a Hadoop cluster server, the method comprising:
receiving a MapReduce request that an application server initiates in response to a user-initiated query request, where the request contains the file name, range and condition to be queried, and the application server checks the validity of the file name, range and condition;
starting multiple Map tasks that read multiple log-type files from a file server into the Hadoop cluster server, where the locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request;
comparing the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, outputting to Reduce, where Reduce saves the information that meets the condition to a result distributed file system file; the result distributed file system file is returned to the user, and its file name is derived from the query request.
The MapReduce request that the application server initiates in response to the user-initiated query request specifically comprises:
the application server initiating a MapReduce request directly to the Hadoop cluster server in response to the user-initiated query request; or the application server, in response to the user-initiated query request, querying whether a database server holds relevant summary information and, if it does, initiating a MapReduce request to the Hadoop cluster server.
A Hadoop cluster server, comprising:
a receiving module, configured to receive the MapReduce request that the application server initiates in response to a user-initiated query request, and to verify the validity of the request;
a query module, configured to start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, where the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request, and the locations of the log-type files are calculated by the application server from the file name, range and condition in the request;
a comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, output to Reduce, where Reduce saves the information that meets the condition to a result distributed file system file; the result distributed file system file is returned to the user, and its file name is derived from the query request.
The receiving module is specifically configured to receive the MapReduce request that the application server initiates directly to the Hadoop cluster server in response to the user-initiated query request, or the MapReduce request that the application server initiates to the Hadoop cluster server after it has queried, in response to the user-initiated query request, whether the database server holds relevant summary information and found that it does.
A Hadoop-based custom query system for structured files, comprising a client, an application server, a file storage server and a Hadoop cluster server, where the file storage server is the original file storage server and holds a large number of log-type text files. Specifically:
the client is configured to receive a query request initiated by a user, where the request contains the file name, range and condition to be queried;
the application server is configured to receive the query request from the client and check the validity of the file name, range and condition; after the validity check passes, to send a MapReduce request to the Hadoop cluster server according to the query request; and to access the Hadoop cluster server by periodic polling and, once the Hadoop cluster server has finished its analysis, read the result distributed file system file, return it to the client, and mark the query task as completed;
the Hadoop cluster server is configured to receive the MapReduce request from the application server, start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request; if the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to the result distributed file system file, whose file name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request.
The system further comprises a database server, which holds preset information about the log-type files stored on the file server; the preset information is summary information about the log-type files.
The application server is specifically configured to receive the query request from the client and query the database server according to that request. If the database server holds relevant summary information, the application server reads it and uses it to form the Map logic parameters of the MapReduce request sent to the Hadoop cluster server; it then accesses the Hadoop cluster server by periodic polling and, once the Hadoop cluster server has finished its analysis, reads the result distributed file system file, returns it to the client, and marks the query task as completed. If the result returned by the database server shows that no relevant summary information is held, the application server returns an error code to the client.
The Hadoop cluster server is specifically configured to receive the MapReduce request from the application server, start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request; if the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to the result distributed file system file, whose file name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request.
Compared with the prior art, the advantage of the invention is as follows. The database server holds summary information for all files on the file server. Instead of sending a MapReduce request straight to the Hadoop cluster server to search the file server, the application server first uses the summary information stored in the database server to determine quickly whether the file server holds the file the client wants to query, and proceeds to the file query only if it does.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a Hadoop-based custom query method for structured files;
Fig. 2 is a structural diagram of a Hadoop cluster server;
Fig. 3 is a schematic diagram of a Hadoop-based custom query system for structured files;
Fig. 4 is a schematic diagram of another Hadoop-based custom query system for structured files.
Detailed description
This application provides a Hadoop-based custom query system for structured files. The system can be applied wherever large amounts of stored historical data need to be queried, for example in banking systems and communication systems.
MapReduce is introduced first. MapReduce is a software framework for processing large data sets in parallel. Its roots are the map and reduce functions of functional programming. A job consists of two operations, each of which may have many instances (many Maps and many Reduces). The Map function takes a set of data and converts it into a list of key/value pairs, one pair for each element of the input domain. The Reduce function takes the list produced by the Map function and reduces it by key, producing one key/value pair for each key.
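The map and reduce roles described above can be illustrated with a minimal sketch in plain Python. This is not the Hadoop API; the function names and the sample log lines are hypothetical and serve only to show the key/value flow.

```python
from collections import defaultdict

def map_fn(line):
    """Map: turn one input element into one key/value pair."""
    level = line.split(" ", 1)[0]  # assumed: the first token is the log level
    return (level, 1)

def reduce_fn(pairs):
    """Reduce: collapse the key/value list into one key/value pair per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["ERROR disk full", "INFO started", "ERROR timeout"]
pairs = [map_fn(line) for line in lines]   # one pair per input element
counts = reduce_fn(pairs)                  # one pair per distinct key
```

In a real cluster many Map instances would each handle part of the input, and the framework would group pairs by key before handing them to Reduce.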
The implementation of the Hadoop-based custom query method for structured files is described in detail below.
Fig. 1 shows a Hadoop-based custom query method for structured files. The method is applied to a Hadoop cluster server and proceeds as follows:
S101: Receive the MapReduce request that the application server initiates in response to a user-initiated query request. The request contains information such as the file name, range and condition to be queried, and the application server checks the validity of the file name, range and condition.
S102: Start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server.
The locations of the log-type files are calculated by the application server from the file name, range, condition and similar information in the request. The number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, the number of clusters and similar factors, that is:
Y = f(x, y, z, t)
where t is the number of clusters, x is the file size, y is the file range and z is the query condition.
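The text gives only the general form Y = f(x, y, z, t) and does not define f. The sketch below is one hypothetical heuristic, not the patented calculation: it sizes the job by data volume and caps it by cluster capacity. The split size, the slots per node, the treatment of t as a node count and the way the condition scales the count are all assumptions made for illustration.

```python
import math

def estimate_map_count(file_size_bytes, range_days, condition_terms, cluster_nodes,
                       split_bytes=128 * 1024 * 1024, slots_per_node=4):
    """Hypothetical f(x, y, z, t); the source does not specify the real formula."""
    # x scaled by y: assume one file of roughly equal size per day in the range.
    total_bytes = file_size_bytes * range_days
    by_size = math.ceil(total_bytes / split_bytes)
    # z: assume more condition terms cost more CPU per record, so use more tasks.
    by_size *= max(1, condition_terms)
    # t: the cluster bounds how many Map tasks can usefully run at once.
    capacity = cluster_nodes * slots_per_node
    return max(1, min(by_size, capacity))
```

For instance, two 256 MiB daily files with one condition term on a ten-node cluster give four Map tasks under these assumptions, while a year of 1 GiB files on five nodes is capped at twenty.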
S103: Compare the log-type files read into the Hadoop cluster server against the user-initiated query request. If the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to a result distributed file system (Hadoop Distributed File System, HDFS) file. The result HDFS file is returned to the user, and its file name is derived from the query request.
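Steps S102 and S103 can be sketched locally as follows. This is a simulation under stated assumptions, not Hadoop code: a thread pool stands in for parallel Map tasks, a temporary directory stands in for HDFS, and the names `map_task`, `reduce_task`, `run_query` and the query identifier are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def map_task(lines, predicate):
    """Map: keep only the lines that satisfy the query condition."""
    return [line for line in lines if predicate(line)]

def reduce_task(map_outputs, result_dir, query_id):
    """Reduce: save all matching lines to one result file named after the query."""
    path = os.path.join(result_dir, f"{query_id}.txt")
    with open(path, "w", encoding="utf-8") as out:
        for matches in map_outputs:
            for line in matches:
                out.write(line + "\n")
    return path

def run_query(log_files, predicate, result_dir, query_id):
    # One Map task per log file, run in parallel; map() preserves input order.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(lambda lines: map_task(lines, predicate), log_files))
    return reduce_task(outputs, result_dir, query_id)

# Usage: two small in-memory "log files"; keep lines whose first field is "a".
result_dir = tempfile.mkdtemp()
result_path = run_query([["a|1", "b|2"], ["a|3"]],
                        lambda line: line.split("|")[0] == "a",
                        result_dir, "query_SCV_02_2014")
```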
In this embodiment, the file storage server is the original file storage server. In other words, without changing the storage resources, the MapReduce function provided by the Hadoop cluster server is used to quickly query the large number of log-type files stored on the file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves substantially, which addresses the low efficiency of querying log-type data.
In step S101, the application server either initiates the MapReduce request directly to the Hadoop cluster server in response to the user-initiated query request, or first queries whether the database server holds relevant summary information and initiates the MapReduce request to the Hadoop cluster server only if it does.
In the second case, the database server holds summary information for all files on the file server. Instead of sending a MapReduce request straight to the Hadoop cluster server to search the file server, the application server uses the summary information stored in the database server to determine quickly whether the file server holds the file the client wants to query, and proceeds to the file query only if it does.
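The summary check that precedes job submission might look like the sketch below. The dictionary standing in for the database server, the `submit_job` callback and the error code string are hypothetical; the source only says that an error is returned when no summary exists.

```python
def handle_query(request_key, summary_db, submit_job):
    """Consult the summary store first; submit a job only when a summary exists."""
    summary = summary_db.get(request_key)
    if summary is None:
        # No summary means the file server does not hold the requested file,
        # so the cluster is never contacted.
        return {"status": "error", "code": "NO_SUMMARY"}
    job_id = submit_job(request_key, summary)
    return {"status": "submitted", "job_id": job_id}
```

The point of the design is that a cheap lookup filters out requests that cannot succeed before any cluster or network resources are spent on them.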
Fig. 2 shows a Hadoop cluster server, which includes:
A receiving module, configured to receive the MapReduce request that the application server initiates in response to a user-initiated query request, and to verify the validity of the request.
A query module, configured to start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server. The number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request, and the locations of the log-type files are calculated by the application server from the file name, range and condition in the request.
A comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request. If the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to a result distributed file system (Hadoop Distributed File System, HDFS) file. The result HDFS file is returned to the user, and its file name is derived from the query request.
In this embodiment, the file storage server is the original file storage server. In other words, without changing the storage resources, the MapReduce function provided by the Hadoop cluster server is used to quickly query the large number of log-type files stored on the file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves substantially, which addresses the low efficiency of querying log-type data.
The application server either initiates the MapReduce request directly to the Hadoop cluster server in response to the user-initiated query request, or first queries whether the database server holds relevant summary information and initiates the MapReduce request to the Hadoop cluster server only if it does.
In the second case, the database server holds summary information for all files on the file server. Instead of sending a MapReduce request straight to the Hadoop cluster server to search the file server, the application server uses the summary information stored in the database server to determine quickly whether file server E holds the file the client wants to query, and proceeds to the file query only if it does.
Fig. 3 is a schematic diagram of a Hadoop-based custom query system for structured files.
Client A may use either a C/S or a B/S architecture. The user can initiate a query request through a browser (such as IE), or by logging in to server B over the SSH protocol and invoking a shell script.
File storage server E is the original file storage server and holds a large number of log-type text files, for example log-type text files exported from a database and classified by accounting date and province/city code. The file storage server also runs a data download service, such as FTP or SFTP, so that Hadoop cluster server D can read the relevant files by path.
Hadoop cluster server D is a distributed computing platform that provides the MapReduce function; MapReduce is a method of processing data in parallel.
The Hadoop-based custom query system for structured files is as follows:
Client A, configured to receive a query request initiated by a user; the request contains the file name, range and condition to be queried.
Application server B, configured to receive the query request from client A and check the validity of the file name, range and condition. After the validity check passes, it sends a MapReduce request to Hadoop cluster server D according to the query request and accesses Hadoop cluster server D by periodic polling. Once Hadoop cluster server D has finished its analysis, application server B reads the result distributed file system (Hadoop Distributed File System, HDFS) file, returns it to client A, and marks the task as completed.
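The periodic polling performed by application server B can be sketched as a simple loop. The callbacks `is_finished` and `read_result`, the interval and the polling budget are assumptions; the source does not say how completion is detected or how long the server waits.

```python
import time

def poll_until_done(is_finished, read_result, interval_s=5.0, max_polls=120,
                    sleep=time.sleep):
    """Check the job at fixed intervals and read the result once it is finished."""
    for _ in range(max_polls):
        if is_finished():
            return read_result()
        sleep(interval_s)  # injectable so the loop can be exercised without waiting
    raise TimeoutError("job did not finish within the polling budget")
```

Bounding the number of polls is a design choice added here so that a stuck job cannot block the application server indefinitely.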
Hadoop cluster server D, configured to receive the MapReduce request from application server B, start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request. If the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to the result HDFS file, whose file name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request.
In this embodiment, file storage server E is the original file storage server. In other words, without changing the storage resources, the MapReduce function provided by Hadoop cluster server D is used to quickly query the large number of log-type files stored on file storage server E. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves substantially, which addresses the low efficiency of querying log-type data.
Fig. 4 is a schematic diagram of another Hadoop-based custom query system for structured files.
Database server C holds preset information about the log-type files stored on file server E. The preset information is summary information about the log-type files and can be configured by the user as needed. For example, it may include the database table name of the file before export, the export time point, its province/city code (subfile code), the field names, the separator, the field lengths and/or the newline character.
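A summary record of the kind listed above could be modelled as follows. The class name, attribute names, defaults and the shape of the lookup key are hypothetical; only the list of items (table name, export time, province/city code, field names, separator, field lengths, newline) comes from the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FileSummary:
    table_name: str          # database table name before export, e.g. "SCV"
    subfile_code: str        # province/city code, e.g. "02"
    export_date: str         # export time point, here assumed to be YYYYMMDD
    field_names: List[str]   # column names of the original table, in file order
    field_lengths: List[int] # assumed: one length per field
    separator: str = "|"     # assumed default; the real value is stored per file
    newline: str = "\n"

    def key(self) -> str:
        """Hypothetical lookup key used to find this record for a query."""
        return f"{self.table_name}|{self.subfile_code}|{self.export_date}"

summary = FileSummary("SCV", "02", "20140101", ["ID", "CHNO", "HAC"], [4, 6, 6])
```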
The Hadoop-based custom query system for structured files is as follows:
Client A, configured to receive a query request initiated by a user; the request contains the file name, range and condition to be queried.
Application server B, configured to receive the query request from client A and check the validity of the file name, range and condition. After the validity check passes, it queries database server C according to the query request. If database server C holds relevant summary information, application server B reads it and uses it to form the Map logic parameters of the MapReduce request sent to Hadoop cluster server D, and then accesses Hadoop cluster server D by periodic polling. Once Hadoop cluster server D has finished its analysis, application server B reads the result distributed file system (Hadoop Distributed File System, HDFS) file, returns it to client A, and marks the task as completed. If the result returned by database server C shows that no relevant summary information is held, application server B returns an error code to client A.
Hadoop cluster server D, configured to receive the MapReduce request from application server B and start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server. Each line is split into columns using the separator kept in database server C, and each resulting string is matched to a field of the original table according to the information kept in database server C. The log-type files read into the Hadoop cluster server are then compared against the user-initiated query request. If the condition is met, the output goes to Reduce, and the Reduce stage saves the information that meets the condition to the result HDFS file, whose file name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request.
For example, the user sends the query request "SCV|02|20140101|20141231|CHNO=XX0001 or HAC=XX0009" through client A. It means: query the export of the data table named SCV, with province/city code (subfile code) 02, for the whole of 2014, and return the data rows whose CHNO or HAC equals the given value. The request can be made by selecting the table name, subfile code, start time and end time in IE on client A and filling in the condition. This information is passed to application server B as a structured JSON string.
After application server B receives the query request from client A, it parses the structured information and then queries database server C to check whether information for SCV|02|20140101|20141231 has been saved. If not, an error is returned. If so, it reads the data structure information of the SCV table for 2014 (the data structure may change over time). Application server B then uses that information to assemble the Map logic parameters of the MapReduce request to the Hadoop cluster server, chiefly the file paths to fetch (such as fileServer/SCV/02/20140101, fileServer/SCV/02/20140102 and so on).
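Assuming one exported file per day under a root/table/subfile/date layout, as the example paths suggest, the list of file paths for a request could be generated as follows. The function name and the `root` default are hypothetical.

```python
from datetime import date, timedelta

def daily_paths(table, subfile, start, end, root="fileServer"):
    """List one path per calendar day from start to end, inclusive."""
    paths = []
    current = start
    while current <= end:
        paths.append(f"{root}/{table}/{subfile}/{current:%Y%m%d}")
        current += timedelta(days=1)
    return paths

paths_2014 = daily_paths("SCV", "02", date(2014, 1, 1), date(2014, 12, 31))
```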
Hadoop cluster server D queries file storage server E according to the received MapReduce request. After obtaining the files, it reads them line by line and parses each line using the separator kept in database server C. For example, the SCV table "SCV|02|20140101|" has three fields, "ID, CHNO, HAC", and each of its lines looks like "0001|XX0001|XX0009". Splitting such a line by the separator yields the second and third fields ("XX0001" and "XX0009"). These are compared with the requested values, and the input logic (or / and) decides whether the line meets the requirement. If it does, the line is output to Reduce.
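The per-line work of a Map task in this example can be sketched as below: split by the stored separator, match the pieces to field names, then combine equality tests with "or" or "and". The function names and the representation of the condition as (field, value) pairs are assumptions.

```python
def parse_line(line, field_names, separator="|"):
    """Split one log line and map each piece to its field of the original table."""
    values = line.rstrip("\n").split(separator)
    return dict(zip(field_names, values))

def matches(row, terms, mode="or"):
    """terms is a list of (field, expected) pairs; mode is 'or' or 'and'."""
    hits = [row.get(field) == expected for field, expected in terms]
    return any(hits) if mode == "or" else all(hits)

row = parse_line("0001|XX0001|XX0009", ["ID", "CHNO", "HAC"])
terms = [("CHNO", "XX0001"), ("HAC", "XX9999")]
```

With these inputs the row satisfies the "or" form of the condition because CHNO matches, but not the "and" form because HAC does not.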
The Reduce stage saves this information under a directory named after the request serial number, such as "hdfs/20150418001". Application server B can fetch the relevant information from HDFS by request serial number and delete it from HDFS afterwards.
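The fetch-then-delete step can be sketched with a local directory standing in for HDFS. A real deployment would go through an HDFS client rather than `os` and `shutil`; the function name and the result file name `part-00000` are hypothetical.

```python
import os
import shutil
import tempfile

def fetch_and_delete(result_root, serial):
    """Read every result file under the request's directory, then remove it."""
    job_dir = os.path.join(result_root, serial)
    rows = []
    for name in sorted(os.listdir(job_dir)):
        with open(os.path.join(job_dir, name), encoding="utf-8") as fh:
            rows.extend(fh.read().splitlines())
    shutil.rmtree(job_dir)  # results are deleted once they have been fetched
    return rows

# Usage with a temporary directory in place of the "hdfs" root.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "20150418001"))
with open(os.path.join(root, "20150418001", "part-00000"), "w", encoding="utf-8") as fh:
    fh.write("0001|XX0001|XX0009\n")
rows = fetch_and_delete(root, "20150418001")
```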
In this embodiment, database server C holds summary information for all files on file server E. Instead of sending a MapReduce request straight to Hadoop cluster server D to search file server E, application server B uses the summary information stored in database server C to determine quickly whether file server E holds the file that client A wants to query, and proceeds to the file query only if it does. Compared with having application server B send a MapReduce request directly to Hadoop cluster server D, this approach first determines from the summary information whether file server E stores the file to be queried. The next step of the query is carried out only when the file is known to be stored, and is skipped when it is not, which saves network resources and also spares the client from waiting for a query that cannot succeed.
If the functions described in the method of this embodiment are implemented as software functional units and sold or used as an independent product, they can be stored in a storage medium readable by a computing device. On this understanding, the part of the embodiments of the present invention that contributes over the prior art, or a part of the technical solution, can be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar can be understood by cross-reference.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. A Hadoop-based custom query method for structured files, characterized in that the method is applied to a Hadoop cluster server and comprises:
receiving a MapReduce request that an application server initiates in response to a user-initiated query request, where the request contains the file name, range and condition to be queried, and the application server checks the validity of the file name, range and condition;
starting multiple Map tasks that read multiple log-type files from a file server into the Hadoop cluster server, where the locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters in the request;
comparing the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, outputting to Reduce, where Reduce saves the information that meets the condition to a result distributed file system file; the result distributed file system file is returned to the user, and its file name is derived from the query request.
2. The method according to claim 1, characterized in that the MapReduce request that the application server initiates in response to the user-initiated query request specifically comprises:
the application server initiating a MapReduce request directly to the Hadoop cluster server in response to the user-initiated query request; or the application server, in response to the user-initiated query request, querying whether a database server holds relevant summary information and, if it does, initiating a MapReduce request to the Hadoop cluster server.
3. A Hadoop cluster server, characterized in that the server comprises:
A receiving module, configured to receive a MapReduce request initiated by an application server according to a user-initiated query request, and to verify the validity of the request;
A query module, configured to start multiple Map tasks to read multiple log-type files from a file server into the Hadoop cluster server, wherein the number of Map tasks is derived by the application server through a combined calculation based on log file size, the range and condition in the request, and the number of clusters, and the locations of the log-type files are calculated by the application server from the file name, range, and condition in the request;
A comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and to output to Reduce the information that satisfies the condition, wherein Reduce saves the information satisfying the condition to a result distributed file system file, the result distributed file system file is returned to the user, and its file name is derived from the query request.
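The claims state that the number of Map tasks comes from a combined calculation over log file size, the requested range and condition, and the number of clusters, but they do not give a formula. The Python sketch below shows one plausible heuristic only. Everything in it is an assumption: the function name `estimate_map_count`, the `selected_fraction` parameter (the share of log data selected by the range and condition), the 128 MiB split size, and the per-node slot count.

```python
import math


def estimate_map_count(
    total_log_bytes: int,
    selected_fraction: float,
    node_count: int,
    split_bytes: int = 128 * 1024 * 1024,   # assumed split size (128 MiB)
    slots_per_node: int = 4,                 # assumed Map slots per cluster node
) -> int:
    """Hypothetical heuristic: one Map task per split, capped by cluster capacity."""
    if total_log_bytes < 0 or node_count < 1:
        raise ValueError("sizes must be non-negative and node_count positive")
    if not 0.0 <= selected_fraction <= 1.0:
        raise ValueError("selected_fraction must be within [0, 1]")

    # Bytes that the range and condition actually require the job to scan.
    relevant_bytes = total_log_bytes * selected_fraction
    by_size = math.ceil(relevant_bytes / split_bytes)

    # Never request more parallel Map tasks than the cluster can run at once.
    capacity = node_count * slots_per_node
    return max(1, min(by_size, capacity))
```

Capping by capacity trades longer individual Map tasks for fewer scheduling waves; a real deployment might instead allow several waves and tune the split size.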
4. The server according to claim 3, characterized in that
The receiving module is specifically configured to receive the MapReduce request that the application server initiates directly to the Hadoop cluster server according to the user-initiated query request; or, the application server queries, according to the user-initiated query request, whether relevant summary information is stored in a database server and, if the relevant summary information is stored, initiates the MapReduce request to the Hadoop cluster server.
5. A Hadoop-based custom search system for structured files, characterized in that the system comprises: a client, an application server, a file storage server, and a Hadoop cluster server, wherein the file storage server is the storage server for the original files and stores a large number of log-type text files, the system specifically comprising:
The client, configured to receive a query request initiated by a user, the request containing the file name, range, and condition to be queried;
The application server, configured to receive the query request initiated by the client; verify the validity of the file name, range, and condition; after the validity check passes, send a MapReduce request to the Hadoop cluster server according to the query request initiated by the client; access the Hadoop cluster server by periodic polling; and, once the Hadoop cluster server has finished its analysis, read the result distributed file system file, return the result distributed file system file to the client, and mark the query task as completed;
The Hadoop cluster server, configured to receive the MapReduce request initiated by the application server; start multiple Map tasks to read multiple log-type files from the file server into the Hadoop cluster server; compare the log-type files read into the Hadoop cluster server against the user-initiated query request and output to Reduce the information that satisfies the condition, wherein the Reduce stage saves the information satisfying the condition to the result distributed file system file and the file name is derived from the query request; wherein the locations of the log-type files are calculated by the application server from the file name, range, and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation based on log file size, the range and condition in the request, and the number of clusters.
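The application-server behaviour in claim 5 (validate, submit, then poll until the cluster has finished) can be sketched in Python as follows. The function names, the string-based range comparison, the polling interval, and the attempt limit are all assumptions for illustration; the claim itself specifies neither timing nor a timeout.

```python
import time
from typing import Callable, Optional


def validate_request(file_name: str, start: str, end: str, condition: str) -> bool:
    """Hypothetical validity check: non-empty fields and an ordered range."""
    return bool(file_name.strip()) and bool(condition.strip()) and start <= end


def poll_for_result(
    is_finished: Callable[[], bool],
    read_result: Callable[[], str],
    interval_s: float = 5.0,        # assumed polling interval
    max_attempts: int = 120,        # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the cluster periodically; return the result once the job is done.

    Returns None if the job has not finished after max_attempts polls.
    """
    for attempt in range(max_attempts):
        if is_finished():
            # Job finished: read the result file so it can go back to the client.
            return read_result()
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return None
```

Passing `is_finished`, `read_result`, and `sleep` as callables keeps the loop independent of any particular cluster client and makes it easy to exercise without waiting.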
6. The system according to claim 5, characterized in that the system further comprises: a database server, wherein the database server stores preset information about the log-type files stored in the file server, the preset information being summary information of the log-type files;
The application server is specifically configured to receive the query request initiated by the client and query the database server according to that request; when the query finds that relevant summary information is stored in the database server, read the relevant summary information, form from it the Map logic parameters of the MapReduce request initiated to the Hadoop cluster server, access the Hadoop cluster server by periodic polling and, once the Hadoop cluster server has finished its analysis, read the result distributed file system file, return the result distributed file system file to the client, and mark the query task as completed; and, when the result returned by the database server shows that no relevant summary information is stored in the database server, return an error code to the client;
The Hadoop cluster server is specifically configured to receive the MapReduce request initiated by the application server; start multiple Map tasks to read multiple log-type files from the file server into the Hadoop cluster server; compare the log-type files read into the Hadoop cluster server against the user-initiated query request and output to Reduce the information that satisfies the condition, wherein the Reduce stage saves the information satisfying the condition to the result distributed file system file and the file name is derived from the query request; wherein the locations of the log-type files are calculated by the application server from the file name, range, and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation based on log file size, the range and condition in the request, and the number of clusters.
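The summary-information lookup of claim 6 can be sketched in Python with an in-memory dictionary standing in for the database server. The record layout (`LogSummary`), the parameter object (`MapParams`), the overlap test on day strings, and the `ERROR_NO_SUMMARY` value are hypothetical; the claim only states that the summary information is used to form the Map logic parameters and that an error code is returned when no summary exists.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

ERROR_NO_SUMMARY = -1  # hypothetical error code returned to the client


@dataclass(frozen=True)
class LogSummary:
    """Assumed summary record kept in the database for one stored log file."""
    path: str
    first_day: str    # earliest day covered, e.g. "20171201"
    last_day: str     # latest day covered
    size_bytes: int


@dataclass(frozen=True)
class MapParams:
    """Assumed shape of the Map logic parameters in the MapReduce request."""
    input_paths: List[str]
    total_bytes: int
    condition: str


def build_map_params(
    summaries: Dict[str, List[LogSummary]],
    file_name: str,
    start: str,
    end: str,
    condition: str,
) -> Union[MapParams, int]:
    """Return Map parameters from matching summaries, or an error code if none."""
    # Keep only the files whose covered days overlap the requested range.
    matching = [
        s for s in summaries.get(file_name, [])
        if s.first_day <= end and s.last_day >= start
    ]
    if not matching:
        return ERROR_NO_SUMMARY
    return MapParams(
        input_paths=[s.path for s in matching],
        total_bytes=sum(s.size_bytes for s in matching),
        condition=condition,
    )
```

Returning the total size alongside the paths gives the caller the input it needs for any later Map-count calculation without a second database query.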
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711404242.7A CN107967362A (en) | 2017-12-22 | 2017-12-22 | The self-defined search method of structured file and system based on hadoop |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711404242.7A CN107967362A (en) | 2017-12-22 | 2017-12-22 | The self-defined search method of structured file and system based on hadoop |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107967362A (en) | 2018-04-27 |
Family ID: 61994715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711404242.7A Pending CN107967362A (en) | 2017-12-22 | 2017-12-22 | The self-defined search method of structured file and system based on hadoop |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107967362A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572727A (en) * | 2013-10-22 | 2015-04-29 | 阿里巴巴集团控股有限公司 | Data querying method and device |
CN105389314A (en) * | 2014-09-04 | 2016-03-09 | 中芯国际集成电路制造(上海)有限公司 | Log file query system and query method |
CN104462351A (en) * | 2014-12-05 | 2015-03-25 | 河海大学 | Data query model and method for MapReduce pattern |
Non-Patent Citations (3)
Title |
---|
何克晶等: "《大数据前沿技术与应用》", 31 March 2017, 华南理工大学出版社 * |
杨锋英等: ""基于Hadoop的在线网络日志分析系统研究"", 《计算机应用与软件》 * |
陈梦杰等: "基于Hadoop的大数据查询系统简述", 《计算机与数字工程》 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111090803A (en) * | 2019-11-22 | 2020-05-01 | 贝壳技术有限公司 | Data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102722481B (en) | The processing method of a kind of user's favorites data and searching method | |
CN110431545A (en) | Inquiry is executed for structural data and unstructured data | |
CN106126648B (en) | It is a kind of based on the distributed merchandise news crawler method redo log | |
US10546348B1 (en) | Cleaning noise words from transaction descriptions | |
US20170220681A1 (en) | System and method for automated domain-extensible web scraping | |
KR20070058684A (en) | Method for searching data elements on the web using a conceptual metadata and contextual metadata search engine | |
US9886711B2 (en) | Product recommendations over multiple stores | |
CN110334356A (en) | Article matter method for determination of amount, article screening technique and corresponding device | |
CN108170731A (en) | Data processing method, device, computer storage media and server | |
CN104579909A (en) | Method and equipment for classifying user information and acquiring user grouping information | |
CN111382279A (en) | Order examination method and device | |
US9652740B2 (en) | Fan identity data integration and unification | |
CN113205402A (en) | Account checking method and device, electronic equipment and computer readable medium | |
CN110377579A (en) | File memory method, device and server | |
KR20160070282A (en) | Providing system and method for shopping mall web site, program and recording medium thereof | |
WO2022111148A1 (en) | Metadata indexing for information management | |
CN110009796A (en) | Invoice category recognition methods, device, electronic equipment and readable storage medium storing program for executing | |
US11500840B2 (en) | Contrasting document-embedded structured data and generating summaries thereof | |
CN107967362A (en) | The self-defined search method of structured file and system based on hadoop | |
CN112581281A (en) | Product recommendation method and device, storage medium and electronic equipment | |
US11822875B2 (en) | Automatically evaluating summarizers | |
CN106940715B (en) | A kind of method and apparatus of the inquiry based on concordance list | |
US20220342887A1 (en) | Predictive query processing | |
KR101178998B1 (en) | Method and System for Certificating Data | |
US11755633B2 (en) | Entity search system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180427 |