CN107967362A - Hadoop-based custom retrieval method and system for structured files - Google Patents

Info

Publication number
CN107967362A
Authority
CN
China
Prior art keywords
server
file
request
hadoop cluster
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711404242.7A
Other languages
Chinese (zh)
Inventor
郭会
耿鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN201711404242.7A
Publication of CN107967362A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a Hadoop-based custom retrieval method for structured files. Without changing the existing storage resources, the MapReduce function provided by a Hadoop cluster server is used to quickly query the large number of log-type files held on a file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves greatly, which solves the problem of inefficient queries over log-type data.

Description

Hadoop-based custom retrieval method and system for structured files
Technical field
This application relates to the field of retrieval, and in particular to a Hadoop-based custom retrieval method and system for structured files.
Background
In the IT systems of commercial banks, transactions generate a huge volume of historical data. Account transaction history, which is needed for customer inquiries, account reconciliation and similar purposes, is mostly stored in relatively costly distributed databases. Other historical data, such as log-type data, is used very rarely by customers and is consulted only by operations and maintenance staff during problem investigation, analysis and tracing. Such data is therefore usually kept on low-end storage, or even on inexpensive tape. Historical data kept on tape can currently be queried only by restoring the tape, restoring the database and then querying the database; because the data volume is large, such queries take a long time.
As commercial banks keep improving their services, the query-efficiency requirements for log-type data keep rising, and the number of queries keeps growing. Storing such data in a distributed database would be costly, so balancing query efficiency against storage cost for this data has become a pressing technical problem.
Summary of the invention
This application provides a Hadoop-based custom retrieval system for structured files. To achieve the above purpose, this application provides the following technical solutions:
A Hadoop-based custom retrieval method for structured files, applied to a Hadoop cluster server, the method comprising:
receiving a MapReduce request that an application server initiates in response to a user-initiated query request, where the request contains the file name, range and condition to be queried, and the application server checks the validity of the file name, range and condition;
starting multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, where the locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters;
comparing the log-type files read into the Hadoop cluster server against the user-initiated query request and, where the condition is met, outputting to Reduce, where Reduce saves the information that meets the condition to a result distributed file system file; the result distributed file system file is returned to the user, and its file name is derived from the query request.
The MapReduce request that the application server initiates in response to the user-initiated query request specifically comprises:
the application server initiating the MapReduce request to the Hadoop cluster server directly in response to the user-initiated query request; or the application server querying, in response to the user-initiated query request, whether a database server holds relevant summary information and, if it does, initiating the MapReduce request to the Hadoop cluster server.
A Hadoop cluster server, comprising:
a receiving module, configured to receive the MapReduce request that an application server initiates in response to a user-initiated query request, and to verify the validity of the request;
a query module, configured to start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, where the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters, and the locations of the log-type files are calculated by the application server from the file name, range and condition in the request;
a comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, where the condition is met, output to Reduce, where Reduce saves the information that meets the condition to a result distributed file system file; the result distributed file system file is returned to the user, and its file name is derived from the query request.
The receiving module is specifically configured to receive the MapReduce request that the application server initiates to the Hadoop cluster server directly in response to the user-initiated query request, or the MapReduce request that the application server initiates to the Hadoop cluster server after querying, in response to the user-initiated query request, whether the database server holds relevant summary information and finding that it does.
A Hadoop-based custom retrieval system for structured files, comprising a client, an application server, a file storage server and a Hadoop cluster server, where the file storage server is the original file storage server and holds a large number of log-type text files. Specifically:
the client is configured to receive a query request initiated by a user, the request containing the file name, range and condition to be queried;
the application server is configured to receive the query request initiated by the client and check the validity of the file name, range and condition; after the validity check passes, to send a MapReduce request to the Hadoop cluster server according to the query request initiated by the client; to access the Hadoop cluster server by periodic polling; and, once the Hadoop cluster server has finished its analysis, to read the result distributed file system file, return it to the client, and mark the query task as completed;
the Hadoop cluster server is configured to receive the MapReduce request initiated by the application server, start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request, outputting to Reduce where the condition is met; the Reduce stage saves the information that meets the condition to a result distributed file system file whose name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters.
The system further comprises a database server, which holds preset information about the log-type files stored on the file server, the preset information being summary information of the log-type files;
the application server is specifically configured to receive the query request initiated by the client and query the database server according to that request; when the query shows that the database server holds relevant summary information, to read that summary information and use it to form the Map logic parameters of the MapReduce request sent to the Hadoop cluster server; to access the Hadoop cluster server by periodic polling; once the Hadoop cluster server has finished its analysis, to read the result distributed file system file, return it to the client, and mark the query task as completed; and, when the result returned by the database server shows that no relevant summary information is held, to return an error code to the client;
the Hadoop cluster server is specifically configured to receive the MapReduce request initiated by the application server, start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request, outputting to Reduce where the condition is met; the Reduce stage saves the information that meets the condition to a result distributed file system file whose name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters.
Compared with the prior art, the advantage of the invention is as follows. Because the database server holds summary information for all files on the file server, the application server does not send a MapReduce request straight to the Hadoop cluster server to search the file server. Instead, it uses the summary information stored in the database server to determine quickly whether the file server holds the files the client wants to query, and proceeds to the file query only when it does.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a Hadoop-based custom retrieval method for structured files;
Fig. 2 is a structural diagram of a Hadoop cluster server;
Fig. 3 is a schematic diagram of a Hadoop-based custom retrieval system for structured files;
Fig. 4 is a schematic diagram of another Hadoop-based custom retrieval system for structured files.
Detailed description
This application provides a Hadoop-based custom retrieval system for structured files, which can be applied to systems that need to retrieve from large stores of historical data, such as banking systems and communication systems.
MapReduce is introduced first. MapReduce is a software framework for processing large data sets in parallel. Its roots are the map and reduce functions of functional programming. It consists of two operations, Map and Reduce, each of which may have many instances. The Map function takes a set of data and converts it into a list of key/value pairs, with each element of the input domain corresponding to one key/value pair. The Reduce function takes the list generated by the Map function and shrinks the list of key/value pairs according to their keys, producing one key/value pair for each key.
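The Map and Reduce roles described above can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and does not run in parallel; the function names and the ID|CHNO|HAC row layout are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def reduce_phase(pairs, reduce_fn):
    """Group the pairs by key, then reduce each group to one value per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_channel(line):
    # Assumed row layout ID|CHNO|HAC: emit one (CHNO, 1) pair per row.
    fields = line.split("|")
    return [(fields[1], 1)]


def total(key, values):
    # One output value per key: the number of rows seen for that key.
    return sum(values)


lines = ["0001|XX0001|XX0009", "0002|XX0001|XX0010", "0003|XX0002|XX0009"]
result = reduce_phase(map_phase(lines, emit_channel), total)
```

In a real cluster the framework distributes the Map calls across nodes and performs the grouping step itself; the sketch only shows the data flow between the two functions.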
The Hadoop-based custom retrieval method for structured files is described in detail below.
Fig. 1 shows a Hadoop-based custom retrieval method for structured files. The method is applied to a Hadoop cluster server and proceeds as follows:
S101: Receive the MapReduce request that the application server initiates in response to a user-initiated query request. The request contains information such as the file name, range and condition to be queried, and the application server checks the validity of the file name, range and condition.
S102: Start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server.
The locations of the log-type files are calculated by the application server from the file name, range, condition and other information in the request. The number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, the number of clusters and other factors in the request, that is:
Y = f(x, y, z, t)
where Y is the number of Map tasks, t is the number of clusters, x is the file size, y is the file range, and z is the retrieval condition.
S103: Compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, where the condition is met, output to Reduce. The Reduce stage saves the information that meets the condition to a result file in the Hadoop Distributed File System (HDFS); the result HDFS file is returned to the user, and its file name is derived from the query request.
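The text does not specify the function f used in S102 to derive the number of Map tasks. The following Python sketch shows one hypothetical heuristic, assuming one Map task per fixed-size input split, capped by an assumed per-node task capacity. The 128 MiB split size, the slots-per-node value and the function name are illustrative assumptions, and the retrieval condition z is deliberately ignored here.

```python
import math


def estimate_map_count(file_sizes_bytes, cluster_nodes,
                       split_bytes=128 * 1024 * 1024, slots_per_node=4):
    """Hypothetical f: one Map task per input split, capped by cluster capacity.

    file_sizes_bytes covers both x (file size) and y (the files in the requested
    range); cluster_nodes stands in for t. All defaults are assumptions.
    """
    splits = sum(max(1, math.ceil(size / split_bytes)) for size in file_sizes_bytes)
    capacity = cluster_nodes * slots_per_node
    return max(1, min(splits, capacity))


# Two 300 MiB files on a 10-node cluster: 3 splits each, well under capacity.
two_large_files = estimate_map_count([300 * 1024 * 1024] * 2, cluster_nodes=10)
```

A production choice of f would presumably also weigh the selectivity of the condition and current cluster load; those factors are outside this sketch.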
In this embodiment, the file storage server is the original file storage server. That is, without changing the storage resources, the MapReduce function provided by the Hadoop cluster server is used to quickly query the large number of log-type files stored on the file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves greatly, which solves the problem of inefficient queries over log-type data.
In step S101, the application server either initiates the MapReduce request to the Hadoop cluster server directly in response to the user-initiated query request, or first queries whether the database server holds relevant summary information and, if it does, initiates the MapReduce request to the Hadoop cluster server.
Because the database server holds summary information for all files on the file server, the application server does not send a MapReduce request straight to the Hadoop cluster server to search the file server. Instead, it uses the summary information stored in the database server to determine quickly whether the file server holds the files the client wants to query, and proceeds to the file query only when it does.
Fig. 2 shows a Hadoop cluster server, which comprises:
A receiving module, configured to receive the MapReduce request that the application server initiates in response to a user-initiated query request, and to verify the validity of the request.
A query module, configured to start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server. The number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters, and the locations of the log-type files are calculated by the application server from the file name, range and condition in the request.
A comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, where the condition is met, output to Reduce. The Reduce stage saves the information that meets the condition to a result file in the Hadoop Distributed File System (HDFS); the result HDFS file is returned to the user, and its file name is derived from the query request.
In this embodiment, the file storage server is the original file storage server. That is, without changing the storage resources, the MapReduce function provided by the Hadoop cluster server is used to quickly query the large number of log-type files stored on the file storage server. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves greatly, which solves the problem of inefficient queries over log-type data.
The application server either initiates the MapReduce request to the Hadoop cluster server directly in response to the user-initiated query request, or first queries whether the database server holds relevant summary information and, if it does, initiates the MapReduce request to the Hadoop cluster server.
Because the database server holds summary information for all files on the file server, the application server does not send a MapReduce request straight to the Hadoop cluster server to search the file server. Instead, it uses the summary information stored in the database server to determine quickly whether file server E holds the files the client wants to query, and proceeds to the file query only when it does.
Fig. 3 shows a schematic diagram of a Hadoop-based custom retrieval system for structured files.
Client A may use either a C/S or a B/S architecture. The user can initiate a query request through a browser (such as IE), or by logging in to server B over the SSH protocol and invoking a shell script.
File storage server E is the original file storage server and holds a large number of log-type text files, for example log-type text files exported from a database and classified by accounting date and province/city code. The file storage server also runs a data download service, such as FTP or SFTP, so that Hadoop cluster server D can read the relevant files by path.
Hadoop cluster server D is a distributed computing platform that provides the MapReduce function; MapReduce is a method of processing data in parallel.
The Hadoop-based custom retrieval system for structured files works as follows:
Client A is configured to receive a query request initiated by a user; the request contains the file name, range and condition to be queried.
Application server B is configured to receive the query request initiated by client A and check the validity of the file name, range and condition. After the validity check passes, it sends a MapReduce request to Hadoop cluster server D according to the query request initiated by client A, and accesses Hadoop cluster server D by periodic polling. Once Hadoop cluster server D has finished its analysis, application server B reads the result file in the Hadoop Distributed File System (HDFS), returns the result HDFS file to client A, and marks the task as completed.
Hadoop cluster server D is configured to receive the MapReduce request initiated by application server B, start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server, and compare the log-type files read in against the user-initiated query request, outputting to Reduce where the condition is met. The Reduce stage saves the information that meets the condition to a result HDFS file whose name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters.
In this embodiment, file storage server E is the original file storage server. That is, without changing the storage resources, the MapReduce function provided by Hadoop cluster server D is used to quickly query the large number of log-type files stored on file storage server E. Because MapReduce can process large volumes of log-type data in parallel, query efficiency improves greatly, which solves the problem of inefficient queries over log-type data.
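The periodic polling that application server B performs against Hadoop cluster server D can be sketched in Python as follows. The callback names, the interval and the polling budget are assumptions; the text does not say how job completion is detected or how long the application server waits.

```python
import time


def wait_for_result(is_finished, read_result,
                    interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reports completion, then read and return the result.

    is_finished and read_result are hypothetical callbacks standing in for
    whatever status check and HDFS read the application server actually uses.
    """
    for _ in range(max_polls):
        if is_finished():
            return read_result()
        sleep(interval_seconds)
    raise TimeoutError("job did not finish within the polling budget")


# Simulated job that reports completion on the third status check.
states = iter([False, False, True])
pauses = []
rows = wait_for_result(lambda: next(states),
                       lambda: ["0001|XX0001|XX0009"],
                       sleep=pauses.append)
```

Passing the sleep function as a parameter keeps the sketch testable without real delays; a caller would normally leave it at the default.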
Fig. 4 shows a schematic diagram of another Hadoop-based custom retrieval system for structured files.
Database server C holds preset information about the log-type files stored on file server E. The preset information is summary information of the log-type files and can be configured by the user as required; for example, it may include the database table name the file had before export, the export time, its province/city code (sub-file code), field names, the delimiter, field lengths, and/or the newline character.
The Hadoop-based custom retrieval system for structured files works as follows:
Client A is configured to receive a query request initiated by a user; the request contains the file name, range and condition to be queried.
Application server B is configured to receive the query request initiated by client A and check the validity of the file name, range and condition. After the validity check passes, it queries database server C according to the query request initiated by client A. If the query shows that database server C holds relevant summary information, application server B reads that summary information, uses it to form the Map logic parameters of the MapReduce request sent to Hadoop cluster server D, and accesses Hadoop cluster server D by periodic polling. Once Hadoop cluster server D has finished its analysis, application server B reads the result file in the Hadoop Distributed File System (HDFS), returns the result HDFS file to client A, and marks the task as completed. If the result returned by database server C shows that no relevant summary information is held, an error code is returned to client A.
Hadoop cluster server D is configured to receive the MapReduce request initiated by application server B and start multiple Map tasks that read multiple log-type files from file server E into the Hadoop cluster server. Each line is split by the delimiter recorded in database server C, and each delimited string is mapped to a field of the original table according to the information recorded in database server C. The log-type files read into the Hadoop cluster server are compared against the user-initiated query request, and output goes to Reduce where the condition is met. The Reduce stage saves the information that meets the condition to a result HDFS file whose name is derived from the query request. The locations of the log-type files are calculated by the application server from the file name, range and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition and the number of clusters.
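The summary check that application server B performs in the Fig. 4 system, before any MapReduce request is sent, can be sketched in Python as follows. The dictionary standing in for database server C, the error code string, the parameter layout and the submit callback are all assumptions made for illustration.

```python
def dispatch_query(table, sub_file_code, start, end, summaries, submit_job):
    """Look up summary information first; submit a job only when it exists.

    summaries is a plain dict standing in for database server C, and
    submit_job is a hypothetical callback that sends the MapReduce request.
    """
    summary = summaries.get((table, sub_file_code, start, end))
    if summary is None:
        # No summary held: report an error instead of scanning the file server.
        return {"ok": False, "error_code": "NO_SUMMARY"}
    map_params = {
        "table": table,
        "sub_file_code": sub_file_code,
        "start": start,
        "end": end,
        "delimiter": summary["delimiter"],
        "fields": summary["fields"],
    }
    return {"ok": True, "job_id": submit_job(map_params)}


submitted = []


def fake_submit(params):
    # Records the parameters and returns a made-up job identifier.
    submitted.append(params)
    return "job-1"


summaries = {
    ("SCV", "02", "20140101", "20141231"): {"delimiter": "|", "fields": ["ID", "CHNO", "HAC"]},
}
hit = dispatch_query("SCV", "02", "20140101", "20141231", summaries, fake_submit)
miss = dispatch_query("SCV", "03", "20140101", "20141231", summaries, fake_submit)
```

The miss case returns without calling the submit callback, which is the saving the embodiment describes: no cluster work is started for files that are known not to exist.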
For example, the user sends the query request "SCV | 02 | 20140101 | 20141231 | CHNO=XX0001 or HAC=XX0009" through client A. Its meaning is: query the export of the data table named SCV whose province/city code (sub-file code) is 02, over the full year 2014, for rows whose CHNO or HAC equals the given value. The request can be made by selecting the table name, sub-file code, start time and end time in IE on client A and filling in the corresponding condition; this information is passed to application server B as a structured JSON string.
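Parsing and validating such a structured request on the application server can be sketched in Python as follows. The JSON key names and the specific checks are assumptions; the text only states that the file name, range and condition are checked for validity.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryRequest:
    table: str
    sub_file_code: str
    start: str
    end: str
    condition: str


def parse_query_request(payload):
    """Parse a JSON request and apply basic validity checks (assumed rules)."""
    data = json.loads(payload)
    request = QueryRequest(
        table=str(data["table"]).strip(),
        sub_file_code=str(data["sub_file_code"]).strip(),
        start=str(data["start"]).strip(),
        end=str(data["end"]).strip(),
        condition=str(data["condition"]).strip(),
    )
    if not request.table or not request.sub_file_code:
        raise ValueError("table name and sub-file code are required")
    # strptime raises ValueError for anything that is not a real YYYYMMDD date.
    start = datetime.strptime(request.start, "%Y%m%d")
    end = datetime.strptime(request.end, "%Y%m%d")
    if start > end:
        raise ValueError("start date must not be after end date")
    if not request.condition:
        raise ValueError("a retrieval condition is required")
    return request


example = parse_query_request(json.dumps({
    "table": "SCV",
    "sub_file_code": "02",
    "start": "20140101",
    "end": "20141231",
    "condition": "CHNO=XX0001 or HAC=XX0009",
}))
```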
After application server B receives the query request sent by client A, it parses the structured information and then queries database server C to check whether information for SCV | 02 | 20140101 | 20141231 is held. If it is not, an error is returned. If it is, the data structure information of the SCV table for 2014 is read (the data structure may have changed over time). Application server B then assembles the Map logic parameters of the MapReduce request to the Hadoop cluster server from the data structure information it read, chiefly the paths of the files to fetch (such as fileServer/SCV/02/20140101, fileServer/SCV/02/20140102, and so on).
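The file paths named in this example suggest one file per day under a table/sub-file-code directory. Under that assumption, the path list could be assembled as in the following Python sketch; the function name and the one-file-per-day layout are inferred from the two example paths, not stated in the text.

```python
from datetime import datetime, timedelta


def daily_paths(table, sub_file_code, start, end, root="fileServer"):
    """Build one path per calendar day in [start, end], both given as YYYYMMDD."""
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    paths = []
    day = first
    while day <= last:
        paths.append(f"{root}/{table}/{sub_file_code}/{day:%Y%m%d}")
        day += timedelta(days=1)
    return paths


paths = daily_paths("SCV", "02", "20140101", "20141231")
```

For the full-year request in the example this yields 365 paths, which is the kind of input that the Map-count calculation would then divide among Map tasks.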
Hadoop cluster server D queries file storage server E according to the MapReduce request it received. After obtaining the files, it reads them line by line and parses each line using the delimiter recorded in database server C. For example, if the SCV table "SCV | 02 | 20140101" has three fields, ID, CHNO and HAC, and each row looks like "0001 | XX0001 | XX0009", then splitting the row by the delimiter yields the second and third fields ("XX0001" and "XX0009"). These are compared with the requested values, and the results are combined with the logical "or" or "and" given in the request to decide whether the row meets the condition. If it does, the row is output to Reduce.
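The per-row work of a Map task described here, splitting by delimiter, mapping values to field names and applying the "or"/"and" condition, can be sketched in Python as follows. The condition syntax, the restriction to a single operator kind and all function names are assumptions; this is not Hadoop API code.

```python
import re


def parse_condition(text):
    """Parse 'A=1 or B=2' or 'A=1 and B=2' into (operator, [(field, value), ...]).

    Only one operator kind per condition is supported in this sketch.
    """
    tokens = re.split(r"\s+(or|and)\s+", text.strip(), flags=re.IGNORECASE)
    operators = {op.lower() for op in tokens[1::2]}
    if len(operators) > 1:
        raise ValueError("mixed and/or is not supported in this sketch")
    pairs = []
    for term in tokens[0::2]:
        field, _, value = term.partition("=")
        pairs.append((field.strip(), value.strip()))
    return (operators.pop() if operators else "and"), pairs


def row_matches(line, field_names, delimiter, condition):
    """Split one row, map values to field names, and evaluate the condition."""
    operator, pairs = condition
    row = dict(zip(field_names, line.rstrip("\n").split(delimiter)))
    checks = [row.get(field) == value for field, value in pairs]
    return any(checks) if operator == "or" else all(checks)


def map_filter(lines, field_names, delimiter, condition_text):
    """Return the rows a Map task would pass on to Reduce."""
    condition = parse_condition(condition_text)
    return [line for line in lines if row_matches(line, field_names, delimiter, condition)]


fields = ["ID", "CHNO", "HAC"]
rows = ["0001|XX0001|XX0009", "0002|XX0002|XX0009", "0003|XX0003|XX0010"]
either = map_filter(rows, fields, "|", "CHNO=XX0001 or HAC=XX0009")
both = map_filter(rows, fields, "|", "CHNO=XX0001 and HAC=XX0009")
```

The field names and delimiter are passed in rather than hard-coded because, in the Fig. 4 system, they come from the summary information recorded in database server C.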
The Reduce stage saves this information under a directory named after the request serial number, such as "hdfs/20150418001". Application server B can fetch the relevant information from HDFS by request serial number and delete it from HDFS after fetching.
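The fetch-then-delete step can be sketched in Python with a local directory standing in for HDFS. The layout (one directory per request serial number containing one or more output files) and the function name are assumptions; a real deployment would go through an HDFS client rather than the local file system.

```python
import shutil
from pathlib import Path


def fetch_and_delete(result_root, request_serial):
    """Read every output file under the request's directory, then remove it.

    A local directory is used as a stand-in for HDFS in this sketch.
    """
    result_dir = Path(result_root) / request_serial
    rows = []
    for part in sorted(result_dir.iterdir()):
        if part.is_file():
            rows.extend(part.read_text(encoding="utf-8").splitlines())
    # Delete only after all files were read successfully.
    shutil.rmtree(result_dir)
    return rows
```

Deleting only after a successful read means a failed fetch leaves the result in place for a retry, which seems the safer ordering for a polling client.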
In this embodiment, database server C holds summary information for all files on file server E. Application server B does not send a MapReduce request straight to Hadoop cluster server D to search file server E; instead, it uses the summary information stored in database server C to determine quickly whether file server E holds the files that client A wants to query, and proceeds to the file query only when it does. Compared with having application server B send a MapReduce request directly to Hadoop cluster server D, this approach first establishes from the summary information in database server C whether file server E stores the files to be queried. The query proceeds only when they are stored, and no further query is made when they are not, which saves network resources and also spares the client time spent waiting for results.
If the functions described in the method of this embodiment are implemented as software functional units and sold or used as an independent product, they can be stored in a storage medium readable by a computing device. On this understanding, the part of the embodiments of the present invention that contributes over the prior art, or a part of the technical solution, can be embodied as a software product stored in a storage medium and comprising instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A Hadoop-based custom retrieval method for structured files, characterized in that the method is applied to a Hadoop cluster server and comprises:
receiving a MapReduce request initiated by an application server in response to a user-initiated query request, wherein the request contains the file name, range, and condition to be queried, and the application server verifies the validity of the file name, range, and condition;
starting multiple Map tasks that read multiple log-type files from a file server into the Hadoop cluster server, wherein the location of the log-type files is calculated by the application server from the file name, range, and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, and the number of clusters in the request;
comparing the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, outputting to Reduce, wherein Reduce saves the information that meets the condition to a result distributed file system file, the result distributed file system file is returned to the user, and its file name is derived from the query request.
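The Map-side comparison and Reduce-side save in claim 1 can be illustrated with a minimal, single-process sketch. It is not the claimed implementation: plain Python functions stand in for Hadoop Map and Reduce tasks, a local directory stands in for the distributed file system, the condition is assumed to be a simple substring match, and the names `map_filter`, `reduce_save`, `run_query`, and the result-file naming pattern are all hypothetical.

```python
import os
import tempfile
from typing import Iterable, Iterator


def map_filter(lines: Iterable[str], condition: str) -> Iterator[str]:
    """Map side: emit only the log lines that satisfy the query condition.

    Assumption: the condition is a substring that must appear in the line.
    """
    for line in lines:
        if condition in line:
            yield line.rstrip("\n")


def reduce_save(matches: Iterable[str], out_dir: str,
                file_name: str, query_range: str, condition: str) -> str:
    """Reduce side: save matching lines to a result file named from the query.

    Assumption: the naming pattern below is illustrative only.
    """
    result_name = f"{file_name}_{query_range}_{condition}.result"
    path = os.path.join(out_dir, result_name)
    with open(path, "w", encoding="utf-8") as fh:
        for match in matches:
            fh.write(match + "\n")
    return path


def run_query(log_lines: Iterable[str], out_dir: str,
              file_name: str, query_range: str, condition: str) -> str:
    """Run the filter and save steps in sequence and return the result path."""
    matches = map_filter(log_lines, condition)
    return reduce_save(matches, out_dir, file_name, query_range, condition)
```

Calling `run_query` with a few log lines, a temporary directory, and the query fields writes one result file whose name encodes the query, which mirrors the requirement that the result file be named according to the query request.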
2. The method according to claim 1, characterized in that the MapReduce request initiated by the application server in response to the user-initiated query request specifically comprises:
the application server initiating the MapReduce request directly to the Hadoop cluster server in response to the user-initiated query request; or the application server querying, in response to the user-initiated query request, whether relevant summary information is stored in a database server and, if the relevant summary information is stored, initiating the MapReduce request to the Hadoop cluster server.
3. A Hadoop cluster server, characterized in that the server comprises:
a receiving module, configured to receive a MapReduce request initiated by an application server in response to a user-initiated query request, and to verify the validity of the request;
a query module, configured to start multiple Map tasks that read multiple log-type files from a file server into the Hadoop cluster server, wherein the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, and the number of clusters in the request, and the location of the log-type files is calculated by the application server from the file name, range, and condition in the request;
a comparison module, configured to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, to output to Reduce, wherein Reduce saves the information that meets the condition to a result distributed file system file, the result distributed file system file is returned to the user, and its file name is derived from the query request.
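The claims state that the number of Map tasks results from a combined calculation over log file size, requested range, condition, and number of clusters, but they give no formula. The sketch below is therefore purely an assumed heuristic, not the patented calculation: `estimate_map_count`, the 128 MiB split size, the `slots_per_node` cap, the reading of "number of clusters" as a node count, and the reduction of range and condition to a single `selected_fraction` are all hypothetical.

```python
import math


def estimate_map_count(total_log_bytes: int,
                       selected_fraction: float,
                       num_nodes: int,
                       split_bytes: int = 128 * 1024 * 1024,
                       slots_per_node: int = 4) -> int:
    """Estimate how many Map tasks to start (illustrative heuristic only).

    total_log_bytes   -- total size of the candidate log files
    selected_fraction -- assumed share of that data covered by the requested
                         range and condition, between 0 and 1
    num_nodes         -- assumed number of worker nodes in the cluster
    """
    # Data that actually has to be scanned for this query.
    data_to_scan = total_log_bytes * selected_fraction
    # One Map task per split of scanned data, but at least one.
    by_size = max(1, math.ceil(data_to_scan / split_bytes))
    # Do not start more tasks than the cluster can run at once.
    capacity = max(1, num_nodes * slots_per_node)
    return min(by_size, capacity)
```

Under these assumptions, a large scan is capped by cluster capacity while a small scan still gets a single Map task.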
4. The server according to claim 3, characterized in that
the receiving module is specifically configured to receive the MapReduce request that the application server initiates directly to the Hadoop cluster server in response to the user-initiated query request; or the MapReduce request that the application server initiates to the Hadoop cluster server after querying, in response to the user-initiated query request, whether relevant summary information is stored in a database server and finding that the relevant summary information is stored.
5. A Hadoop-based custom retrieval system for structured files, characterized in that the system comprises a client, an application server, a file storage server, and a Hadoop cluster server, wherein the file storage server is the storage server for the original files and stores a large number of log-type text files, and wherein specifically:
the client is configured to receive a query request initiated by a user, the request containing the file name, range, and condition to be queried;
the application server is configured to receive the query request initiated by the client; to verify the validity of the file name, range, and condition; after the validity check passes, to send a MapReduce request to the Hadoop cluster server according to the query request initiated by the client; to access the Hadoop cluster server by periodic polling; and, if the Hadoop cluster server has finished its analysis, to read the result distributed file system file, return it to the client, and mark the query task as completed;
the Hadoop cluster server is configured to receive the MapReduce request initiated by the application server; to start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server; and to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, to output to Reduce, wherein the Reduce stage saves the information that meets the condition to the result distributed file system file, whose file name is derived from the query request, the location of the log-type files is calculated by the application server from the file name, range, and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, and the number of clusters in the request.
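The periodic polling performed by the application server in claim 5 can be sketched as a bounded loop. This is a minimal illustration under stated assumptions: `poll_for_result`, its parameters, and the two callbacks are hypothetical stand-ins for whatever interface the application server uses to check job status and read the result file, and the attempt limit is an addition that the claims do not mention.

```python
import time
from typing import Callable, Optional


def poll_for_result(is_finished: Callable[[], bool],
                    read_result: Callable[[], str],
                    interval_s: float = 1.0,
                    max_attempts: int = 10,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll the cluster at a fixed interval until the analysis has finished.

    Returns the result content once is_finished() reports True, or None if
    the job is still running after max_attempts checks.
    """
    for _ in range(max_attempts):
        if is_finished():
            # Analysis done: read the result file so it can go back to the client.
            return read_result()
        sleep(interval_s)
    return None
```

Passing the `sleep` function as a parameter keeps the loop easy to exercise without real delays; a caller would mark the query task as completed once a non-`None` result comes back.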
6. The system according to claim 5, characterized in that the system further comprises a database server, wherein the database server stores preset information about the log-type files stored in the file server, the preset information being summary information of the log-type files;
the application server is specifically configured to receive the query request initiated by the client and to query the database server according to that request; when the query finds that relevant summary information is stored in the database server, to read the relevant summary information, to form from it the Map logic parameters of the MapReduce request initiated to the Hadoop cluster server, and to access the Hadoop cluster server by periodic polling; if the Hadoop cluster server has finished its analysis, to read the result distributed file system file, return it to the client, and mark the query task as completed; and, when the result returned by the database server shows that no relevant summary information is stored in the database server, to return an error code to the client;
the Hadoop cluster server is specifically configured to receive the MapReduce request initiated by the application server; to start multiple Map tasks that read multiple log-type files from the file server into the Hadoop cluster server; and to compare the log-type files read into the Hadoop cluster server against the user-initiated query request and, if the condition is met, to output to Reduce, wherein the Reduce stage saves the information that meets the condition to the result distributed file system file, whose file name is derived from the query request, the location of the log-type files is calculated by the application server from the file name, range, and condition in the request, and the number of Map tasks is derived by the application server through a combined calculation over the log file size, the requested range, the condition, and the number of clusters in the request.
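The summary lookup in claim 6 can be sketched as follows. The sketch assumes that the database server can be modeled as an in-memory dictionary keyed by file name and that each summary record lists the paths of the matching log files. The names `build_map_params` and `ERR_NO_SUMMARY`, the record layout, and the shape of the returned parameters are hypothetical and do not come from the patent.

```python
from typing import Any, Dict

# Hypothetical error code returned to the client when no summary is stored.
ERR_NO_SUMMARY = "E_NO_SUMMARY"


def build_map_params(summary_db: Dict[str, Dict[str, Any]],
                     query: Dict[str, str]) -> Dict[str, Any]:
    """Look up summary information and derive Map parameters from it.

    summary_db -- stand-in for the database server (file name -> summary)
    query      -- the client's query with file_name, range, and condition
    """
    summary = summary_db.get(query["file_name"])
    if summary is None:
        # No relevant summary stored: report an error instead of starting a job.
        return {"ok": False, "error": ERR_NO_SUMMARY}
    # Summary found: combine it with the query to form the Map parameters.
    return {
        "ok": True,
        "map_params": {
            "paths": list(summary["paths"]),
            "range": query["range"],
            "condition": query["condition"],
        },
    }
```

When the lookup fails, the caller would send the error code to the client; otherwise the returned `map_params` would accompany the MapReduce request.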
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711404242.7A CN107967362A (en) 2017-12-22 2017-12-22 The self-defined search method of structured file and system based on hadoop

Publications (1)

Publication Number Publication Date
CN107967362A true CN107967362A (en) 2018-04-27

Family

ID=61994715

Country Status (1)

Country Link
CN (1) CN107967362A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090803A (en) * 2019-11-22 2020-05-01 贝壳技术有限公司 Data processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462351A (en) * 2014-12-05 2015-03-25 河海大学 Data query model and method for MapReduce pattern
CN104572727A (en) * 2013-10-22 2015-04-29 阿里巴巴集团控股有限公司 Data querying method and device
CN105389314A (en) * 2014-09-04 2016-03-09 中芯国际集成电路制造(上海)有限公司 Log file query system and query method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
何克晶等: "《大数据前沿技术与应用》", 31 March 2017, 华南理工大学出版社 *
杨锋英等: ""基于Hadoop的在线网络日志分析系统研究"", 《计算机应用与软件》 *
陈梦杰等: "基于Hadoop的大数据查询系统简述", 《计算机与数字工程》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180427