CN104142968A - Solr technology based distributed searching method and system - Google Patents

Publication number
CN104142968A
Authority
CN
China
Prior art keywords
distributed
file
classification
document
solr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310577657.XA
Other languages
Chinese (zh)
Inventor
吴含前
姚莉
王存哲
李露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-11-19
Filing date
2013-11-19
Publication date
2014-11-12
Application filed by Southeast University
Priority to CN201310577657.XA
Publication of CN104142968A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The invention discloses a distributed search method and system based on Solr technology. The method comprises the following steps: 1) when an offline client system registers and archives electronic files, the electronic files are first classified automatically using a naive Bayes algorithm; 2) after classification, the electronic files are indexed in a distributed manner using a consistent hashing algorithm, according to the category of each file; and 3) after the index files are built, a user enters a query statement to search the electronic files. The system uses the distributed mode of the open-source search tool Solr: the query request is dispatched to the distributed nodes, each node responds to the search request, and the results are merged, deduplicated, sorted, and returned to the user, thereby realizing distributed vertical search. In this way, the accuracy of automatic classification of electronic files is improved and the stability of the system is enhanced.

Description

A distributed search method and system based on Solr technology
Technical field
The present invention relates to the field of information retrieval, and in particular to a distributed search method and system based on Solr technology.
Background
Internet technology has developed rapidly and the volume of online data has grown sharply; this growth has had a major impact on the search quality of general-purpose search engines. It has become difficult to find the information one needs on the Internet quickly and accurately, for three reasons. First, online information is complex and unordered, and different websites may carry duplicate information, so the results returned by a search engine contain information noise. Second, it is very hard to determine a user's real search intent from the query terms alone. Third, a search engine's crawler cannot crawl all the information on the Internet, nor capture it in real time. A search engine dedicated to a particular field or topic is therefore urgently needed.
Summary of the invention
The main technical problem solved by the present invention is to provide a distributed search method and system based on Solr technology that improves the accuracy of automatic classification of electronic files, enhances system stability, and merges, deduplicates, and automatically groups search results, realizing vertical search that is more focused, specific, and in-depth.
To solve the above technical problem, the technical solution adopted by the present invention is to provide a distributed search method based on Solr technology, comprising the following steps:
1) when the offline client system registers and archives an electronic file, the electronic file is first classified automatically based on a naive Bayes algorithm;
2) after the electronic file is classified, it is indexed in a distributed manner based on a consistent hashing algorithm, according to the category to which it belongs; the indexed content includes the important metadata of the electronic file and the associated metadata of the electronic documents it contains;
3) after the index file is built, the user enters a query statement to search the electronic files;
wherein step 3) specifically comprises: using the distributed mode of the open-source search tool Solr, the query request is dispatched to the distributed nodes, each distributed node responds to the search request, and the results are then merged, deduplicated, sorted, and returned to the user.
In a preferred embodiment of the present invention, when the electronic file is automatically classified in step 1), a coordinating factor is used to dynamically adjust the emphasis of the automatic classification; the value of the coordinating factor lies between 0 and 1.
In a preferred embodiment of the present invention, the value of the coordinating factor is 0.5.
In a preferred embodiment of the present invention, the naive Bayes algorithm in step 1) specifically comprises the following steps:
1.1) selection and processing of the sample dictionary (corpus): the indexing tool of the search engine is used to index the sample documents of each category in the dictionary separately;
1.2) extraction of the feature words of the document to be classified: using a toolkit component of the search engine, the abstract and keyword information of the document is extracted; the extracted keywords are then deduplicated and the feature words are selected;
1.3) Bayesian calculation is performed on the extracted feature words using the Bayesian formula and the dictionary sample files to obtain the probability of the document to be classified for each category; the probabilities are then compared and the largest one is taken, which identifies the category to which the document belongs.
In a preferred embodiment of the present invention, the Bayesian formula in step 1.3) is:
Class(d)=argmax P(c|d);
where d: the document;
c: a category;
Class(d): the category to which the document belongs;
P(c|d): the probability that document d belongs to category c;
argmax P(c|d): the category for which the probability P(c|d) is largest;
The value of P(c|d) is given by the following formula:
P(c|d)=λP(c)+(1-λ)bayes(c|d);
where P(c): given the set of categories, the probability of category c in that set, taken as P(c)=1/n, where n is the number of categories;
λ: the coordinating factor;
bayes(c|d): the probability, obtained with the Bayesian formula, that document d belongs to category c.
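As an illustration of the blended formula, the following worked calculation uses assumed values that do not come from the patent: n = 4 categories, λ = 0.5, and a Bayesian estimate bayes(c|d) = 0.6 for one category c.

```latex
\begin{align*}
P(c) &= \tfrac{1}{n} = \tfrac{1}{4} = 0.25,\\
P(c \mid d) &= \lambda P(c) + (1-\lambda)\,\mathrm{bayes}(c \mid d)\\
            &= 0.5 \times 0.25 + 0.5 \times 0.6 = 0.425.
\end{align*}
```

With these assumed numbers, the blended score is pulled from the Bayesian estimate 0.6 toward the uniform value 0.25; the category with the largest blended score is then chosen by the argmax.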
The present invention also provides a distributed search system, the system comprising:
an automatic classifier, for automatically classifying electronic files;
a distributed indexer and searcher, which uses Solr's replication mode and distributed mode: the index files of the distributed nodes are backed up through the replication mode, and distributed search is performed through the distributed mode.
In a preferred embodiment of the present invention, the system further comprises an intelligent prompting device that provides intelligent prompts for query statements, a classified statistics device that automatically groups search results and computes statistics for each group, and a search result permission filtering device.
The beneficial effects of the present invention are as follows: classifying electronic files automatically based on a naive Bayes algorithm, with a coordinating factor that dynamically adjusts the emphasis of the classification, improves the accuracy of automatic classification; indexing electronic files in a distributed manner based on a consistent hashing algorithm enhances system stability; and using Solr's distributed mode, optimizing the distributed nodes, and merging, deduplicating, and automatically grouping search results realizes vertical search that is more focused, specific, and in-depth.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the distributed search method and system based on Solr technology of the present invention;
Fig. 2 is the distributed index state diagram of the distributed search method based on Solr technology of the present invention;
Fig. 3 is the distributed search flowchart of the distributed search method based on Solr technology of the present invention;
Fig. 4 is the software architecture diagram of the distributed search system of the present invention;
Fig. 5 is the class interface design diagram of the automatic classifier of the distributed search system of the present invention;
Fig. 6 is the class interface design diagram of the distributed indexer of the distributed search system of the present invention;
Fig. 7 is the intelligent search prompt interface of the distributed search system of the present invention;
Fig. 8 is the advanced search interface of the distributed search system of the present invention;
Fig. 9 is the search result interface of the distributed search system of the present invention;
The reference numerals in the drawings are as follows: 1, indexer; 2, searcher.
Detailed description of the embodiments
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is more clearly defined.
Referring to Fig. 1 to Fig. 9, an embodiment of the present invention comprises:
A distributed search system, the system comprising:
1) an automatic classifier, for automatically classifying electronic files;
When the ERMS offline client system registers and archives an electronic file, the electronic file is automatically classified to prepare for the subsequent distributed indexing. Because the documents contained in an electronic file may not match the subject described by the file's metadata, the final type of an electronic file cannot be decided solely from the file type defined in the ERMS offline client system. The automatic classifier in this embodiment therefore uses a coordinating factor whose value is set by the user; through it, the user determines the relative weight of the classification defined in the ERMS offline client system and the Bayesian classification. The default value of the coordinating factor is 0.5.
The Bayesian formula is:
Class(d)=argmax P(c|d);
where d: the document;
c: a category;
Class(d): the category to which the document belongs;
P(c|d): the probability that document d belongs to category c;
argmax P(c|d): the category for which the probability P(c|d) is largest;
The value of P(c|d) is given by the following formula:
P(c|d)=λP(c)+(1-λ)bayes(c|d);
where P(c): given the set of categories, the probability of category c in that set, taken as P(c)=1/n, where n is the number of categories;
λ: the coordinating factor, with a value between 0 and 1;
bayes(c|d): the probability, obtained with the Bayesian formula, that document d belongs to category c.
From the above formula, when λ=1 the electronic file is not classified by the Bayesian algorithm at all, and is classified entirely according to the electronic file type configured in the current ERMS offline client system; conversely, when λ=0 the electronic file is reclassified entirely by the Bayesian classification algorithm.
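The blending rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function names `blended_scores` and `classify`, the category names, and the sample probabilities are all hypothetical, and the Bayesian posteriors are assumed to be already computed and to sum to 1.

```python
from typing import Dict


def blended_scores(bayes: Dict[str, float], lam: float = 0.5) -> Dict[str, float]:
    """Return P(c|d) = lam * P(c) + (1 - lam) * bayes(c|d) for every category.

    `bayes` maps each category to its Bayesian posterior bayes(c|d).
    P(c) is taken as 1/n, as defined in the text.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    prior = 1.0 / len(bayes)  # P(c) = 1/n
    return {c: lam * prior + (1.0 - lam) * p for c, p in bayes.items()}


def classify(bayes: Dict[str, float], lam: float = 0.5) -> str:
    """Class(d) = argmax over c of the blended score P(c|d)."""
    scores = blended_scores(bayes, lam)
    return max(scores, key=scores.get)


# Hypothetical posteriors for one document over three categories.
posterior = {"contract": 0.7, "report": 0.2, "invoice": 0.1}
scores = blended_scores(posterior, lam=0.5)
```

With P(c)=1/n exactly as written, λ=1 gives every category the same score 1/n; the formula itself does not show how the ERMS-configured type enters, so this sketch does not model that case.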
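Because document d can be expressed as a set of n mutually independent feature values, d=(w1, w2, ..., wn), bayes(c|d) can be computed with the Bayesian algorithm, that is:
bayes(c|d) = P(c)·P(d|c)/P(d) = P(c)·P(w1|c)·P(w2|c)·…·P(wn|c)/P(d)
After the coordinating factor is introduced, it must still hold that P(c1|d)+P(c2|d)+…+P(cn|d)=1, where c1, c2, …, cn are the n categories. The proof is as follows:
1) P(c1|d)=λP(c1)+(1-λ)bayes(c1|d)
2) P(c2|d)=λP(c2)+(1-λ)bayes(c2|d)
3) P(c3|d)=λP(c3)+(1-λ)bayes(c3|d)
……
n) P(cn|d)=λP(cn)+(1-λ)bayes(cn|d)
Adding the n expressions above gives: P(c1|d)+…+P(cn|d) = λ[P(c1)+…+P(cn)] + (1-λ)[bayes(c1|d)+…+bayes(cn|d)];
Since P(c1)+…+P(cn)=1 and bayes(c1|d)+…+bayes(cn|d)=1, it follows that P(c1|d)+…+P(cn|d)=λ+(1-λ)=1, which completes the proof.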
2) a distributed indexer and searcher, shown as indexer 1 and searcher 2 in Fig. 1, which uses Solr's replication mode and distributed mode: the index files of the distributed nodes are backed up through the replication mode, and distributed search is performed through the distributed mode;
Because the number of electronic files managed in the ERMS offline client system grows exponentially, the index files inevitably grow exponentially as well, and once an index file exceeds a certain size threshold, search speed and efficiency drop sharply. Therefore, so that the system can handle search over massive, PE-level volumes of electronic files, this embodiment adopts a distributed strategy: index files are stored and backed up in a distributed manner based on the consistent hashing algorithm and Solr's replication mode, and distributed search is carried out based on Solr's distributed mode and its facet (faceted vertical search) feature. In addition, memcached and a heartbeat strategy are used to store and monitor the state of the distributed nodes in a distributed manner. Fig. 2 is the distributed index state diagram.
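The heartbeat-based monitoring of node state mentioned above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `NodeRegistry`, the 10-second timeout, and the use of an in-process dictionary in place of memcached, which the text names as the actual state store.

```python
import time
from typing import Callable, Dict, List


class NodeRegistry:
    """Tracks the last heartbeat of each node and reports which are alive.

    An in-memory dict stands in for the distributed cache (memcached in
    the text); the timeout value is a hypothetical choice.
    """

    def __init__(self, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record that `node` reported in just now."""
        self._last_seen[node] = self._clock()

    def alive(self) -> List[str]:
        """Return the nodes whose last heartbeat is within the timeout."""
        now = self._clock()
        return sorted(n for n, seen in self._last_seen.items()
                      if now - seen <= self.timeout)
```

A search coordinator could call `alive()` before dispatching a query, so that requests go only to nodes that have reported recently.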
Distributed search in this embodiment is implemented mainly with Solr's shard-based distributed mode. The user enters a query term; the surviving distributed master nodes or temporary master nodes are obtained from the distributed cache; the request is dispatched to the surviving distributed nodes, which respond to it; the master server collects the query results from the distributed nodes and then returns the final query result to the user. Fig. 3 is the flowchart of distributed search.
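In Solr's shard-based distributed search, the node that receives the request is told which shards to query through the `shards` request parameter. The sketch below only builds such a request URL; the host names, port, and core name `files` are hypothetical, and the helper `build_shard_query` is not part of Solr or of the patent.

```python
from typing import List
from urllib.parse import urlencode


def build_shard_query(coordinator: str, query: str,
                      shards: List[str], rows: int = 10) -> str:
    """Build a Solr select URL that fans the query out to the given shards.

    `coordinator` is the base URL of the core receiving the request;
    `shards` lists the surviving nodes as host:port/solr/core strings.
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        "shards": ",".join(shards),  # comma-separated shard list
    }
    return f"{coordinator}/select?{urlencode(params)}"


url = build_shard_query(
    "http://coordinator:8983/solr/files",
    "title:contract",
    ["node1:8983/solr/files", "node2:8983/solr/files"],
)
```

In the flow described above, the shard list would come from the surviving nodes recorded in the distributed cache rather than from a fixed list.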
3) an intelligent prompting device, for providing intelligent prompts for the user's query statement;
4) a classified statistics device, for automatically grouping search results and computing statistics for each group;
5) a search result permission filtering device.
The system provides automatic data classification, distributed indexing, distributed search, intelligent prompting, classified statistics, and permission-based filtering of search results. Distributed indexing and search mainly use Solr's replication mode and distributed mode: the index files of the distributed nodes are backed up through the replication mode, and distributed search is performed through the distributed mode.
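The classified statistics and permission filtering functions can be pictured as two small post-processing steps over a result list. The sketch below is purely illustrative: the field names `category` and `allowed_roles`, the role model, and both function names are assumptions, since the text does not describe how permissions or groups are represented.

```python
from collections import Counter
from typing import Dict, List, Set


def filter_by_permission(hits: List[Dict], user_roles: Set[str]) -> List[Dict]:
    """Keep only hits that share at least one role with the user."""
    return [h for h in hits if user_roles & set(h["allowed_roles"])]


def group_counts(hits: List[Dict], field: str = "category") -> Counter:
    """Count hits per value of `field`, similar to a facet count."""
    return Counter(h[field] for h in hits)


hits = [
    {"id": "f1", "category": "contract", "allowed_roles": ["legal"]},
    {"id": "f2", "category": "report", "allowed_roles": ["staff", "legal"]},
    {"id": "f3", "category": "contract", "allowed_roles": ["finance"]},
]
visible = filter_by_permission(hits, {"legal"})
counts = group_counts(visible)
```

Filtering before counting means the per-category totals shown to a user reflect only the files that user is allowed to see.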
A distributed search method based on Solr technology, comprising the following steps:
1) when the offline client system registers and archives an electronic file, the electronic file is first classified automatically based on a naive Bayes algorithm;
wherein the naive Bayes algorithm specifically comprises the following steps:
1.1) selection and processing of the sample dictionary: authoritative samples should be chosen as the basis for classification; the samples in this embodiment come from the Sogou dictionary (standard edition). Because this dictionary is large, comparing a document against every sample file in it would take at least 10 seconds; to improve the speed of indexing, this embodiment uses Lucene's indexing tool IndexWriter to index the sample documents of each category in the dictionary separately;
1.2) extraction of the feature words of the document to be classified: the feature words are extracted with the Tika component used with the Lucene search engine. To improve the speed of indexing, only the abstract and keyword information of the document is extracted, and the extracted keywords are then deduplicated;
1.3) Bayesian calculation is performed on the extracted feature words using the Bayesian formula and the dictionary sample files to obtain the probability of the document to be classified for each category; the probabilities are then compared and the largest one is taken, which identifies the category to which the document belongs.
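Steps 1.1) to 1.3) can be sketched as a small naive Bayes classifier over feature words. This is a simplified stand-in, not the patent's implementation: word counts held in memory replace the Lucene index of sample documents, add-one (Laplace) smoothing is an added assumption to avoid zero probabilities, and the sample categories and words are invented.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def train(samples: Dict[str, List[List[str]]]) -> Tuple[Dict[str, Counter], Set[str]]:
    """Count words per category from sample documents (lists of words)."""
    counts = {c: Counter(w for doc in docs for w in doc)
              for c, docs in samples.items()}
    vocab = {w for cnt in counts.values() for w in cnt}
    return counts, vocab


def bayes_posteriors(features: Iterable[str], counts: Dict[str, Counter],
                     vocab: Set[str]) -> Dict[str, float]:
    """Return bayes(c|d) for each category, normalized to sum to 1."""
    feats = set(features)            # step 1.2: deduplicate feature words
    n = len(counts)                  # number of categories
    log_scores = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        log_p = math.log(1.0 / n)    # uniform prior P(c) = 1/n
        for w in feats:
            # add-one smoothing (an assumption, not stated in the text)
            log_p += math.log((cnt[w] + 1) / (total + len(vocab)))
        log_scores[c] = log_p
    top = max(log_scores.values())   # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


samples = {
    "finance": [["invoice", "payment", "tax"], ["payment", "budget"]],
    "hr": [["employee", "leave", "contract"], ["salary", "employee"]],
}
counts, vocab = train(samples)
post = bayes_posteriors(["payment", "tax", "payment"], counts, vocab)
best = max(post, key=post.get)
```

The resulting posteriors are what the blended formula P(c|d)=λP(c)+(1-λ)bayes(c|d) takes as its bayes(c|d) input.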
2) after the electronic file is classified, it is indexed in a distributed manner based on a consistent hashing algorithm, according to the category to which it belongs; the indexed content includes the important metadata of the electronic file and the associated metadata of the electronic documents it contains;
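A consistent hash ring is one common way to realize the mapping from category to index node described in step 2). The sketch below is a generic textbook version under stated assumptions: the category name is used as the hash key, MD5 is the hash function, each node gets 100 virtual points, and the node names are hypothetical; the patent does not specify any of these details.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Maps keys to nodes so that adding or removing a node moves few keys."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self.replicas = replicas          # virtual points per node
        self._keys: List[int] = []        # sorted hash positions on the ring
        self._ring: Dict[int, str] = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            del self._ring[h]
            self._keys.remove(h)

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's hash position."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]


ring = ConsistentHashRing(["solr-a", "solr-b", "solr-c"])
categories = ["contract", "report", "invoice", "memo"]
before = {c: ring.node_for(c) for c in categories}
```

When a node is removed, only the categories that were assigned to it move to another node; all other assignments stay where they were, which is the property that makes this scheme attractive for a growing index cluster.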
3) after the index file is built, the user enters a query statement to search the electronic files;
wherein step 3) specifically comprises: using the distributed mode of the open-source search tool Solr, the query request is dispatched to the distributed nodes, each distributed node responds to the search request, and the results are then merged, deduplicated, sorted, and returned to the user.
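The merge, deduplicate, and sort stage of step 3) can be sketched as follows. In practice Solr performs this merging itself in distributed mode; the code only illustrates the idea. The hit fields `id` and `score`, the rule of keeping the highest-scoring copy of a duplicate, and the function name `merge_results` are assumptions.

```python
from typing import Dict, List


def merge_results(shard_results: List[List[Dict]], key: str = "id",
                  top_k: int = 10) -> List[Dict]:
    """Merge per-node hit lists, drop duplicates, and sort by score.

    When the same document is returned by several nodes, the copy with
    the highest score is kept.
    """
    best: Dict[str, Dict] = {}
    for hits in shard_results:
        for hit in hits:
            kept = best.get(hit[key])
            if kept is None or hit["score"] > kept["score"]:
                best[hit[key]] = hit
    # Highest score first; ties broken by id for a stable order.
    return sorted(best.values(), key=lambda h: (-h["score"], h[key]))[:top_k]


shard_a = [{"id": "f1", "score": 2.0}, {"id": "f2", "score": 1.0}]
shard_b = [{"id": "f2", "score": 1.5}, {"id": "f3", "score": 0.5}]
merged = merge_results([shard_a, shard_b])
```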
The system uses search engine technology to design and implement a distributed vertical search engine on top of the ERMS system. When the ERMS offline client system registers and archives an electronic file, the electronic file is first classified automatically based on a naive Bayes algorithm. After classification, the electronic file is indexed in a distributed manner based on a consistent hashing algorithm, according to the category to which it belongs; the indexed content includes the important metadata of the electronic file and the associated metadata of the electronic documents it contains. After the index file is built, the user can search the electronic files by entering a query statement. The implementation uses the distributed (shard) mode of the open-source search tool Solr: the query request is dispatched to the distributed nodes, each distributed node responds to the search request, and the results are merged, deduplicated, sorted, and returned to the user. For system stability, the system is optimized in two main respects. The first is the handling of highly concurrent requests: the distributed nodes are optimized and load balancing is introduced, so that the system can process user requests quickly under high concurrency. The second is disaster tolerance: a master-slave architecture is adopted, and the index files on the distributed nodes are backed up at regular intervals based on the observer pattern. When a distributed node fails, a backup node takes over its role and responds to indexing and search requests.
The present invention discloses a distributed search method and system based on Solr technology. Classifying electronic files automatically based on a naive Bayes algorithm, with a coordinating factor that dynamically adjusts the emphasis of the classification, improves the accuracy of automatic classification; indexing electronic files in a distributed manner based on a consistent hashing algorithm enhances system stability; and using Solr's distributed mode, optimizing the distributed nodes, and merging, deduplicating, and automatically grouping search results realizes vertical search that is more focused, specific, and in-depth.
The foregoing describes only embodiments of the present invention and does not thereby limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A distributed search method based on Solr technology, characterized by comprising the following steps:
1) when the offline client system registers and archives an electronic file, the electronic file is first classified automatically based on a naive Bayes algorithm;
2) after the electronic file is classified, it is indexed in a distributed manner based on a consistent hashing algorithm, according to the category to which it belongs; the indexed content includes the important metadata of the electronic file and the associated metadata of the electronic documents it contains;
3) after the index file is built, the user enters a query statement to search the electronic files;
wherein step 3) specifically comprises: using the distributed mode of the open-source search tool Solr, the query request is dispatched to the distributed nodes, each distributed node responds to the search request, and the results are then merged, deduplicated, sorted, and returned to the user.
2. The distributed search method based on Solr technology according to claim 1, characterized in that, when the electronic file is automatically classified in step 1), a coordinating factor is used to dynamically adjust the emphasis of the automatic classification, the value of the coordinating factor lying between 0 and 1.
3. The distributed search method based on Solr technology according to claim 2, characterized in that the value of the coordinating factor is 0.5.
4. The distributed search method based on Solr technology according to claim 1, characterized in that the naive Bayes algorithm in step 1) specifically comprises the following steps:
1.1) selection and processing of the sample dictionary (corpus): the indexing tool of the search engine is used to index the sample documents of each category in the dictionary separately;
1.2) extraction of the feature words of the document to be classified: using a toolkit component of the search engine, the abstract and keyword information of the document is extracted; the extracted keywords are then deduplicated and the feature words are selected;
1.3) Bayesian calculation is performed on the extracted feature words using the Bayesian formula and the dictionary sample files to obtain the probability of the document to be classified for each category; the probabilities are then compared and the largest one is taken, which identifies the category to which the document belongs.
5. The distributed search method based on Solr technology according to claim 4, characterized in that the Bayesian formula in step 1.3) is:
Class(d)=argmax P(c|d);
where d: the document;
c: a category;
Class(d): the category to which the document belongs;
P(c|d): the probability that document d belongs to category c;
argmax P(c|d): the category for which the probability P(c|d) is largest;
The value of P(c|d) is given by the following formula:
P(c|d)=λP(c)+(1-λ)bayes(c|d);
where P(c): given the set of categories, the probability of category c in that set, taken as P(c)=1/n, where n is the number of categories;
λ: the coordinating factor;
bayes(c|d): the probability, obtained with the Bayesian formula, that document d belongs to category c.
6. A distributed search system, characterized in that the system comprises:
an automatic classifier, for automatically classifying electronic files;
a distributed indexer and searcher, which uses Solr's replication mode and distributed mode: the index files of the distributed nodes are backed up through the replication mode, and distributed search is performed through the distributed mode.
7. The distributed search system according to claim 6, characterized in that the system further comprises an intelligent prompting device that provides intelligent prompts for query statements, a classified statistics device that automatically groups search results and computes statistics for each group, and a search result permission filtering device.
CN201310577657.XA 2013-11-19 2013-11-19 Solr technology based distributed searching method and system Pending CN104142968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310577657.XA CN104142968A (en) 2013-11-19 2013-11-19 Solr technology based distributed searching method and system


Publications (1)

Publication Number Publication Date
CN104142968A true CN104142968A (en) 2014-11-12

Family

ID=51852142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310577657.XA Pending CN104142968A (en) 2013-11-19 2013-11-19 Solr technology based distributed searching method and system

Country Status (1)

Country Link
CN (1) CN104142968A (en)



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141112
