CN105320746A - Big data based index acquisition method and system - Google Patents

Big data based index acquisition method and system

Info

Publication number
CN105320746A
Authority
CN
China
Prior art keywords
data
url
keyword
index
rowkey
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510622636.4A
Other languages
Chinese (zh)
Inventor
龚建新
王周松
郑平贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing VRV Software Corp Ltd
Original Assignee
Beijing VRV Software Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing VRV Software Corp Ltd
Priority to CN201510622636.4A
Publication of CN105320746A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566: URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a big data based index acquisition method and system. The big data based index acquisition method comprises: performing a first analysis on data, to acquire a keyword of the data; classifying the data according to the keyword, storing the classified data into a database, acquiring a rowkey corresponding to the classified data; and establishing an index according to the rowkey corresponding to the classified data and the keyword. According to the big data based index acquisition method and system provided by the present invention, the index is established by using the keyword in the data and the rowkey generated when the data is stored, and a mapping relationship between the data in the database and the rowkey in the index is established, so that in subsequent retrieval, data corresponding to the rowkey can be acquired only by acquiring the rowkey, and a retrieval speed in massive data is improved.

Description

Big data based index acquisition method and system
Technical field
The present invention relates to the field of data retrieval, and in particular to a big data based index acquisition method and system.
Background
As society becomes increasingly information-driven, it has entered the era of big data. Data volumes are large, and the storage and full-text retrieval of data have become a bottleneck hindering the development of informatization. At present, much data is kept in relational databases such as SQL Server, MySQL and Oracle, whose data retrieval depends on the database's own indexes, partitioning, table sharding and the like. Query speed is acceptable while the data volume is small, but as the data volume grows, the storage and retrieval efficiency of the database system drops sharply, until the database breaks down. The root cause is that these relational databases were not designed for big data.
For the storage and full-text retrieval of big data, the prior art proposes NoSQL databases: a combination of MongoDB and Solr is used to relieve the pressure on the relational database, and the speed of storing and retrieving data is greatly improved. However, as massive amounts of data are stored, this approach also reveals its performance bottleneck, and storage and search become slower and slower.
Summary of the invention
In view of the defects in the prior art, the present invention provides a big data based index acquisition method. The method establishes an index from the keyword in the data and the rowkey generated when the data is stored, so that data can be retrieved efficiently from massive data.
The present invention provides a big data based index acquisition method, comprising:
performing a first parse on data to obtain a keyword of the data;
classifying the data according to the keyword, storing the classified data in a database, and obtaining a rowkey corresponding to the classified data;
establishing an index according to the keyword and the rowkey corresponding to the classified data.
Optionally, before the first parse is performed on the data, the method comprises:
obtaining the URLs of multiple pieces of data to be obtained;
matching the URL of each piece of data to be obtained against the URLs in the URL history library of an HBase cluster and, if the URL of the data to be obtained is a new URL, importing the new URL into a queue to be crawled, until the URLs of all the data to be obtained have been matched;
obtaining the URLs in the queue to be crawled in turn, and obtaining the data to be obtained according to each URL in the queue to be crawled.
Optionally, before the first parse is performed on the data, the method comprises:
determining whether the data to be obtained has been obtained;
when the data to be obtained has not been obtained, continuing to obtain it according to the URL in the queue to be crawled, and adding 1 to the number of attempts to obtain the data;
if the number of attempts reaches a preset number and the data to be obtained still has not been obtained, storing the URL corresponding to the data that was not obtained in the error library of the HBase cluster.
Optionally, before the classified data is stored in the database, the method comprises:
packing and compressing the classified data according to a preset policy;
performing a second parse on the packed and compressed file, and storing the data obtained from the second parse in the database.
Optionally, the method further comprises a step of obtaining data through the index;
the step of obtaining data through the index comprises:
obtaining a keyword entered by a user and, according to that keyword, obtaining the rowkey corresponding to the keyword from the index in a search server;
obtaining, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
The present invention also provides a big data based index acquisition system, comprising:
a first parsing module, configured to perform a first parse on data to obtain a keyword of the data;
a first acquisition module, configured to classify the data according to the keyword, store the classified data in a database, and obtain a rowkey corresponding to the classified data;
an establishing module, configured to establish an index according to the keyword and the rowkey corresponding to the classified data.
Optionally, the system further comprises:
a second acquisition module, configured to obtain the URLs of multiple pieces of data to be obtained;
a matching module, configured to match the URL of each piece of data to be obtained against the URLs in the URL history library of an HBase cluster and, if the URL of the data to be obtained is a new URL, import the new URL into a queue to be crawled, until the URLs of all the data to be obtained have been matched;
a third acquisition module, configured to obtain the URLs in the queue to be crawled in turn, and obtain the data to be obtained according to each URL in the queue to be crawled.
Optionally, the system further comprises:
a judging module, configured to determine whether the data to be obtained has been obtained;
when the data to be obtained has not been obtained, to continue to obtain it according to the URL in the queue to be crawled, and to add 1 to the number of attempts to obtain the data;
and, if the number of attempts reaches a preset number and the data to be obtained still has not been obtained, to store the URL corresponding to the data that was not obtained in the error library of the HBase cluster.
Optionally, the system further comprises:
a packing module, configured to pack and compress the classified data according to a preset policy;
a second parsing module, configured to perform a second parse on the packed and compressed file, and store the data obtained from the second parse in the database.
Optionally, the system further comprises:
a fourth acquisition module, configured to obtain a keyword entered by a user and, according to that keyword, obtain the rowkey corresponding to the keyword from the index in a search server;
a fifth acquisition module, configured to obtain, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
It can be seen from the above technical solution that the big data based index acquisition method of the present invention establishes an index from the keyword in the data and the rowkey generated when the data is stored, so that a correspondence is established between the data in the database and the rowkeys in the index. In subsequent retrieval, the data corresponding to a rowkey can be obtained simply by obtaining that rowkey, which improves retrieval speed in massive data.
Brief description of the drawings
The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the present invention in any way. In the drawings:
Fig. 1 is a flowchart of the big data based index acquisition method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of the big data based index acquisition method provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the big data based index acquisition system provided by one embodiment of the present invention.
Detailed description
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the big data based index acquisition method provided by one embodiment of the present invention. With reference to Fig. 1, the big data based index acquisition method of this embodiment comprises:
Step 101: perform a first parse on data to obtain a keyword of the data;
Step 102: classify the data according to the keyword, store the classified data in a database, and obtain a rowkey corresponding to the classified data;
Step 103: establish an index according to the keyword and the rowkey corresponding to the classified data.
The index is established from the keyword in the data and the rowkey generated when the data is stored, so that a correspondence is established between the data in the database and the rowkeys in the index. In subsequent retrieval, the data corresponding to a rowkey can be obtained simply by obtaining that rowkey, which improves the speed of retrieval in massive data.
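Steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the applicant: plain dictionaries stand in for the database and the search index, `uuid4` stands in for the rowkey that the database generates, the keywords are assumed to be the lower-cased words of a hypothetical `title` field, and the classification step is omitted.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for the database and the search index.
database = {}                  # rowkey -> stored record
index = defaultdict(list)      # keyword -> list of rowkeys


def extract_keywords(record):
    """First parse (step 101). Assumption: keywords are the title's words."""
    return record["title"].lower().split()


def store(record):
    """Step 102: store the record and return the rowkey generated for it."""
    rowkey = uuid.uuid4().hex  # assumption: the store issues a unique key
    database[rowkey] = record
    return rowkey


def build_index(record):
    """Step 103: map each distinct keyword of the record to its rowkey."""
    keywords = extract_keywords(record)
    rowkey = store(record)
    for keyword in set(keywords):
        index[keyword].append(rowkey)
    return rowkey
```

Because the index holds only keywords and rowkeys, a later lookup touches the record store once per matching rowkey rather than scanning the stored data.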
In order to improve the efficiency of data acquisition, the present invention first obtains the URLs of multiple pieces of data to be obtained; matches the URL of each piece of data to be obtained against the URLs in the URL history library of the HBase cluster and, if the URL of the data to be obtained is a new URL, imports the new URL into the queue to be crawled, until the URLs of all the data to be obtained have been matched; and then obtains the URLs in the queue to be crawled in turn, and obtains the data to be obtained according to each URL in the queue to be crawled.
The URLs in the queue to be crawled may be taken in first-in, first-out order.
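The URL matching and first-in, first-out ordering described above can be sketched as follows. The sketch assumes a Python `set` as a stand-in for the URL history library and a `deque` as a stand-in for the queue to be crawled; the function name and the example URLs are hypothetical. Recording each new URL in the history follows step 202 of the second embodiment.

```python
from collections import deque


def enqueue_new_urls(candidates, history, queue):
    """Append URLs that are not yet in the history to the crawl queue."""
    for url in candidates:
        if url in history:
            continue            # already known: drop the duplicate
        history.add(url)        # record it so later batches skip it
        queue.append(url)
    return queue


history = {"http://example.com/a"}   # stand-in for the URL history library
queue = deque()                      # stand-in for the queue to be crawled
enqueue_new_urls(
    ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    history,
    queue,
)
first = queue.popleft()              # first in, first out
```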
In order to further improve the efficiency of data acquisition and storage, the present invention, before performing the first parse on the data, also monitors the crawling operation to determine whether the data to be obtained has been obtained. When the data to be obtained has not been obtained, it continues to obtain the data according to the URL in the queue to be crawled and adds 1 to the number of attempts. If the number of attempts reaches a preset number and the data to be obtained still has not been obtained, the URL corresponding to the data that was not obtained is stored in the error library of the HBase cluster.
In order to improve the efficiency of data storage, before storing the classified data in the database, the present invention also assembles the classified data into an XML file and packs and compresses it according to a preset policy; a second parse is then performed on the packed and compressed file, and the data obtained from the second parse is stored in the database.
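The assemble, compress and re-parse sequence can be sketched as follows. The source does not specify the XML layout or the preset packing policy, so the element names (`records`, `record`), the `category` attribute and the use of gzip are assumptions made only for illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def pack(records):
    """Assemble classified records into one XML document and gzip it."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record", category=rec["category"])
        for field, value in rec["fields"].items():
            ET.SubElement(node, field).text = value
    return gzip.compress(ET.tostring(root, encoding="utf-8"))


def unpack(blob):
    """Second parse: decompress and turn the XML back into records."""
    root = ET.fromstring(gzip.decompress(blob))
    return [
        {
            "category": node.get("category"),
            "fields": {child.tag: child.text for child in node},
        }
        for node in root
    ]
```

In this sketch the field names must be valid XML tag names and the values must be non-empty strings for the round trip to return the original records.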
After the above index is obtained, the method further comprises obtaining data according to the index. The specific steps are as follows:
obtain a keyword entered by a user and, according to that keyword, obtain the rowkey corresponding to the keyword from the index in the search server;
obtain, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
The keyword entered by the user is looked up in the above index to obtain the rowkey corresponding to the keyword, and the corresponding data is then found in the database by this rowkey, so that retrieval in a massive database is carried out efficiently.
Fig. 2 is a flowchart of the data acquisition method provided by another embodiment of the present invention. With reference to Fig. 2, the index acquisition method and the method of obtaining data based on the index are described in detail below:
Step 201: obtain the URLs of multiple pieces of data to be obtained from the Internet, based on a database;
Step 202: determine whether the URL of each piece of data to be obtained already exists: match the URL of each piece of data to be obtained against the URLs in the URL history library of the HBase cluster, obtain the URLs that do not exist in the URL history library, import them in turn into the queue to be crawled and into the URL history library, and discard those URLs of data to be obtained that duplicate URLs in the URL history library;
Step 203: select the URLs in the queue to be crawled in turn and obtain the data at each selected URL; when the data to be obtained is not obtained, continue to obtain it according to that URL in the queue to be crawled, and add 1 to the number of attempts to obtain the data;
If the number of attempts is less than three, the selected URL is imported into the queue to be crawled again; if the number of failed crawls reaches three, the selected URL and the exception information are imported into the error library of the HBase cluster.
The preset number of three in step 203 is used only to make this technical solution easier to understand; it may be set according to the actual situation.
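The retry rule of step 203 can be sketched as follows. The sketch assumes a hypothetical `fetch` callable that returns the page content or raises an exception on failure, and uses dictionaries as stand-ins for the crawled data and the error library; none of these names come from the source.

```python
from collections import deque

MAX_ATTEMPTS = 3   # the preset number of attempts; three as in step 203


def crawl_all(urls, fetch):
    """Crawl every URL, re-queueing failures until MAX_ATTEMPTS is reached.

    `fetch` is a hypothetical callable that returns the page content or
    raises an exception on failure.
    """
    queue = deque((url, 0) for url in urls)   # (url, failed attempts so far)
    pages, error_library = {}, {}
    while queue:
        url, attempts = queue.popleft()
        try:
            pages[url] = fetch(url)
        except Exception as exc:
            attempts += 1
            if attempts < MAX_ATTEMPTS:
                queue.append((url, attempts))    # back into the queue
            else:
                error_library[url] = repr(exc)   # give up, keep the reason
    return pages, error_library
```

Re-queueing a failed URL at the tail, rather than retrying it immediately, lets the other URLs proceed while a temporarily unreachable site recovers.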
Step 204: obtain the parsing rules from the rule base, and perform a first parse on the obtained data according to the parsing rules to obtain the keywords in the data;
The keywords are words from a common dictionary, for example: time, title, content, author, etc.;
Step 205: classify the data according to the parsed keywords, assemble the data into an XML file, and pack and compress it according to a preset policy;
Step 206: the data loading middleware in the server performs a second parse on the packed and compressed file and stores the data obtained from the second parse in the database, and the database automatically generates a rowkey that uniquely corresponds to the data; meanwhile, an index is established according to the keyword in the data and the rowkey that uniquely corresponds to the data, and the index is imported into the ElasticSearch cluster that serves as the search server;
As data is continuously stored, the database builds up a sequence of rowkeys, and the same rowkey sequence also exists in the search server;
Step 207: the service interface middleware of the server receives the query keyword that the user enters through a client and, according to that keyword, queries the index in the search server for the rowkey corresponding to the entered keyword;
It will be understood that if multiple pieces of data correspond to the entered keyword, a list of rowkeys is obtained;
Step 208: based on the obtained rowkey or rowkey list, the service interface middleware obtains from the database the data corresponding to that rowkey or rowkey list.
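The two-stage lookup of steps 207 and 208 can be sketched as follows. Plain dictionaries stand in for the search server index and the database, and the rowkeys and records are made-up examples; a real deployment would query the search cluster and the database through their own client libraries.

```python
def search(keyword, index, database):
    """Return the records whose rowkeys the index lists for `keyword`."""
    rowkeys = index.get(keyword, [])   # step 207: keyword -> rowkey list
    # Step 208: fetch each record directly by its rowkey.
    return [database[rk] for rk in rowkeys if rk in database]


# Hypothetical data: dicts stand in for the search server and the database.
index = {"hbase": ["row-001", "row-003"], "solr": ["row-002"]}
database = {
    "row-001": {"title": "HBase basics"},
    "row-002": {"title": "Solr tuning"},
    "row-003": {"title": "HBase rowkey design"},
}
results = search("hbase", index, database)
```

A keyword that matches several records yields a rowkey list, and each rowkey is resolved by a direct key lookup rather than a scan of the stored data.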
Fig. 3 is a schematic structural diagram of the big data based index acquisition system provided by one embodiment of the present invention. With reference to Fig. 3, the present invention also provides a big data based index acquisition system, which comprises:
a first parsing module 31, configured to perform a first parse on data to obtain a keyword of the data;
a first acquisition module 32, configured to classify the data according to the keyword, store the classified data in a database, and obtain a rowkey corresponding to the classified data;
an establishing module 33, configured to establish an index according to the keyword and the rowkey corresponding to the classified data.
In order to improve the efficiency of data acquisition, the present invention performs preprocessing before the data is passed to the first parsing module. The system further comprises:
a second acquisition module 34, configured to obtain the URLs of multiple pieces of data to be obtained;
a matching module 35, configured to match the URL of each piece of data to be obtained against the URLs in the URL history library of the HBase cluster and, if the URL of the data to be obtained is a new URL, import the new URL into the queue to be crawled, until the URLs of all the data to be obtained have been matched;
a crawling module 36, configured to obtain the URLs in the queue to be crawled in turn, and obtain the data to be obtained according to each URL in the queue to be crawled.
In order to further improve the efficiency of data acquisition and storage, the system further comprises:
a judging module 37, configured to determine whether the crawl has succeeded;
when the data to be obtained has not been obtained, to continue to obtain it according to the URL in the queue to be crawled, and to add 1 to the number of attempts to obtain the data;
and, if the number of attempts reaches a preset number and the data to be obtained still has not been obtained, to store the URL corresponding to the data that was not obtained in the error library of the HBase cluster.
In order to improve the efficiency of data storage, the system further comprises:
a packing module 38, configured to pack and compress the classified data according to a preset policy;
a second parsing module 39, configured to perform a second parse on the packed and compressed file, and store the data obtained from the second parse in the database.
The system further comprises:
a fourth acquisition module 40, configured to obtain a keyword entered by a user and, according to that keyword, obtain the rowkey corresponding to the keyword from the index in the search server;
a fifth acquisition module 41, configured to obtain, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
The method establishes an index from the keyword in the data and the rowkey generated when the data is stored, so that a correspondence is established between the data in the database and the rowkeys in the index. In subsequent retrieval, the data corresponding to a rowkey can be obtained simply by obtaining that rowkey, which improves the speed of retrieval in massive data.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, and all such modifications and variations fall within the scope defined by the claims.

Claims (10)

1. A big data based index acquisition method, characterized by comprising:
performing a first parse on data to obtain a keyword of the data;
classifying the data according to the keyword, storing the classified data in a database, and obtaining a rowkey corresponding to the classified data;
establishing an index according to the keyword and the rowkey corresponding to the classified data.
2. The method according to claim 1, characterized in that, before the first parse is performed on the data, the method comprises:
obtaining the URLs of multiple pieces of data to be obtained;
matching the URL of each piece of data to be obtained against the URLs in the URL history library of an HBase cluster and, if the URL of the data to be obtained is a new URL, importing the new URL into a queue to be crawled, until the URLs of all the data to be obtained have been matched;
obtaining the URLs in the queue to be crawled in turn, and obtaining the data to be obtained according to each URL in the queue to be crawled.
3. The method according to claim 2, characterized in that, before the first parse is performed on the data, the method comprises:
determining whether the data to be obtained has been obtained;
when the data to be obtained has not been obtained, continuing to obtain it according to the URL in the queue to be crawled, and adding 1 to the number of attempts to obtain the data;
if the number of attempts reaches a preset number and the data to be obtained still has not been obtained, storing the URL corresponding to the data that was not obtained in the error library of the HBase cluster.
4. The method according to claim 1, characterized in that, before the classified data is stored in the database, the method comprises:
packing and compressing the classified data according to a preset policy;
performing a second parse on the packed and compressed file, and storing the data obtained from the second parse in the database.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises a step of obtaining data through the index;
the step of obtaining data through the index comprises:
obtaining a keyword entered by a user and, according to that keyword, obtaining the rowkey corresponding to the keyword from the index in a search server;
obtaining, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
6. A big data based index acquisition system, characterized by comprising:
a first parsing module, configured to perform a first parse on data to obtain a keyword of the data;
a first acquisition module, configured to classify the data according to the keyword, store the classified data in a database, and obtain a rowkey corresponding to the classified data;
an establishing module, configured to establish an index according to the keyword and the rowkey corresponding to the classified data.
7. The system according to claim 6, characterized by comprising:
a second acquisition module, configured to obtain the URLs of multiple pieces of data to be obtained;
a matching module, configured to match the URL of each piece of data to be obtained against the URLs in the URL history library of an HBase cluster and, if the URL of the data to be obtained is a new URL, import the new URL into a queue to be crawled, until the URLs of all the data to be obtained have been matched;
a third acquisition module, configured to obtain the URLs in the queue to be crawled in turn, and obtain the data to be obtained according to each URL in the queue to be crawled.
8. The system according to claim 7, characterized by comprising:
a judging module, configured to determine whether the data to be obtained has been obtained;
when the data to be obtained has not been obtained, to continue to obtain it according to the URL in the queue to be crawled, and to add 1 to the number of attempts to obtain the data;
and, if the number of attempts reaches a preset number and the data to be obtained still has not been obtained, to store the URL corresponding to the data that was not obtained in the error library of the HBase cluster.
9. The system according to claim 6, characterized by comprising:
a packing module, configured to pack and compress the classified data according to a preset policy;
a second parsing module, configured to perform a second parse on the packed and compressed file, and store the data obtained from the second parse in the database.
10. The system according to any one of claims 6 to 9, characterized by comprising:
a fourth acquisition module, configured to obtain a keyword entered by a user and, according to that keyword, obtain the rowkey corresponding to the keyword from the index in a search server;
a fifth acquisition module, configured to obtain, according to the rowkey corresponding to the keyword, the data in the database that corresponds to the rowkey.
CN201510622636.4A 2015-09-25 2015-09-25 Big data based index acquisition method and system Pending CN105320746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510622636.4A CN105320746A (en) 2015-09-25 2015-09-25 Big data based index acquisition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510622636.4A CN105320746A (en) 2015-09-25 2015-09-25 Big data based index acquisition method and system

Publications (1)

Publication Number Publication Date
CN105320746A true CN105320746A (en) 2016-02-10

Family

ID=55248132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510622636.4A Pending CN105320746A (en) 2015-09-25 2015-09-25 Big data based index acquisition method and system

Country Status (1)

Country Link
CN (1) CN105320746A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033870A (en) * 2009-09-24 2011-04-27 中国移动通信集团公司 Data searching method and device
CN102193917A (en) * 2010-03-01 2011-09-21 中国移动通信集团公司 Method and device for processing and querying data
CN103729429A (en) * 2013-12-26 2014-04-16 浪潮电子信息产业股份有限公司 Hbase based compression method
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104573022A (en) * 2015-01-12 2015-04-29 浪潮软件股份有限公司 Data query method and device for HBase
CN104820670A (en) * 2015-03-13 2015-08-05 国家电网公司 Method for acquiring and storing big data of power information
CN104850640A (en) * 2015-05-26 2015-08-19 华北电力大学(保定) HBase based storage and query method and system for power equipment status monitoring data
CN104915450A (en) * 2015-07-01 2015-09-16 武汉大学 HBase-based big data storage and retrieval method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汤羽 等: "《基于HDFS开源架构与多级索引表的海量数据检索mDHT算法》", 《计算机科学》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644049A (en) * 2016-07-21 2018-01-30 虹光精密工业股份有限公司 Retrieval index generation method and server applying same
US11093713B2 (en) 2016-07-21 2021-08-17 Avision Inc. Method for generating search index and server utilizing the same
WO2018027463A1 (en) * 2016-08-08 2018-02-15 深圳市博信诺达经贸咨询有限公司 Application method and system for keyword analysis in big data
WO2018095037A1 (en) * 2016-11-24 2018-05-31 杭州海康威视数字技术股份有限公司 Method and device for obtaining data in cloud storage system
CN108111557A (en) * 2016-11-24 2018-06-01 杭州海康威视数字技术股份有限公司 The method and device of data in a kind of acquisition cloud storage system
CN106909671A (en) * 2017-02-28 2017-06-30 湖南蚁坊软件股份有限公司 A kind of method and system of NoSQL databases condition query
CN108897804A (en) * 2018-06-15 2018-11-27 东北大学秦皇岛分校 A kind of search system and method for the Internet space data
CN111176650A (en) * 2018-11-09 2020-05-19 阿里巴巴集团控股有限公司 Parser generation method, search method, server, and storage medium
CN111176650B (en) * 2018-11-09 2023-04-18 阿里巴巴集团控股有限公司 Parser generation method, search method, server, and storage medium
CN110413771A (en) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Classified index method, apparatus, equipment and storage medium based on solr
CN110347722A (en) * 2019-07-11 2019-10-18 软通智慧科技有限公司 Data capture method, device, equipment and storage medium based on HBase

Similar Documents

Publication Publication Date Title
CN105320746A (en) Big data based index acquisition method and system
US11068439B2 (en) Unsupervised method for enriching RDF data sources from denormalized data
US8214361B1 (en) Organizing search results in a topic hierarchy
US11580168B2 (en) Method and system for providing context based query suggestions
US9870382B2 (en) Data encoding and corresponding data structure
EP2674875B1 (en) Method, controller, program and data storage system for performing reconciliation processing
US20180004751A1 (en) Methods and apparatus for subgraph matching in big data analysis
US9916368B2 (en) Non-exclusionary search within in-memory databases
US20140229473A1 (en) Determining documents that match a query
EP3964976A1 (en) Cloud inference system
US10353966B2 (en) Dynamic attributes for searching
US20150019680A1 (en) Systems and Methods for Consistent Hashing Using Multiple Hash Rlngs
US11249993B2 (en) Answer facts from structured content
CN104298785A (en) Searching method for public searching resources
US20190347360A1 (en) System and method for updating a search index
CN110889023A (en) Distributed multifunctional search engine of elastic search
Adamu et al. A survey on big data indexing strategies
EP3480706A1 (en) Automatic search dictionary and user interfaces
CN110580255A (en) method and system for storing and retrieving data
US10565188B2 (en) System and method for performing a pattern matching search
US9984108B2 (en) Database joins using uncertain criteria
KR101955376B1 (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
Doulkeridis et al. On saying" enough already!" in mapreduce
CN112650739A (en) Data storage processing method and device for coal mine data middling station
WO2013097065A1 (en) Index data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160210
