CN112783977A - Mass data search implementation method based on big data - Google Patents
- Publication number
- CN112783977A (application number CN202110101717.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- commodity
- standardized
- database
- big
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for realizing mass data search based on big data and belongs to the technical field of big data. The method comprises: collecting commodity data from each online platform; processing and standardizing the commodity data; storing the standardized commodity data in a database and, at the same time, in a built distributed search engine; having a user run retrieval queries against the data in the distributed search engine; and requesting detailed historical data from the database according to the queried commodity data. The method has good popularization and application value.
Description
Technical Field
The invention relates to the technical field of big data, and particularly provides a method for realizing mass data search based on big data.
Background
The development of internet technology has brought great convenience to daily life. In the era of big data, industries of every kind are flourishing and competing, industry portals are multiplying, and the volume of generated data keeps growing. How to quickly find the information users need in massive data has become a pressing problem for internet technology companies. At the same time, with the rapid growth of company business and the explosive growth of data, every platform of a company now needs search, yet traditional search service systems cannot meet the expectations of the business lines because of their architectural and business design. As data volume increases, data retrieval has become a serious problem for every platform, and quickly querying the required data among extensive historical data and massive data is an urgent need. Traditional systems cannot support sentence-level search, cannot handle the many business-related attributes, provide no search-related metric evaluation system, and have poor scalability and maintainability. They cannot guarantee low data redundancy, retrieval efficiency, or user experience.
Disclosure of Invention
The technical task of the invention is to provide a method for realizing mass data search based on big data, which can ensure low redundancy of data, keep data consistency and improve retrieval efficiency and user experience.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for realizing mass data search based on big data comprises: collecting commodity data from each online platform; processing and standardizing the commodity data; storing the standardized commodity data in a database and, at the same time, in a built distributed search engine; having a user run retrieval queries against the data in the distributed search engine; and requesting detailed historical data from the database according to the queried commodity data.
Preferably, the implementation method of mass data search based on big data specifically includes the following steps:
S1, collecting commodity data and processing the collected commodity data;
S2, standardizing and warehousing the commodity data;
S3, storing the standardized commodity data into a database;
S4, storing the standardized commodity data into the constructed distributed search engine;
S5, carrying out a retrieval query against the distributed search engine;
S6, querying detailed data from the database according to the queried commodity data.
Preferably, in step S1, the commodity data is cleaned and processed.
The commodity data is cleaned and processed using basic big data processing methods together with big data quality and data governance methods, and is then stored in standardized form. A field mapping is created for the acquired data, and the data is structured and standardized according to the mapped fields before being stored as system data, so that programs can parse and process it easily.
Preferably, in step S2, a unique commodity ID is custom-generated, and the commodity data is then standardized and warehoused.
A unique commodity RowKey (ID) is custom-generated from the standardized data, and the data is stored in HBase. Through an Observer, the data synchronously enters the ES index database and is made available to users through the system. By entering retrieval conditions, a user queries the ES database for the records that satisfy them, and the system framework processes the results. The Observer synchronizes only part of the data, not all of it, into the ES index database, in preparation for users' later queries of the mass data in HBase.
Preferably, in step S3, the standardized commodity data is stored in an HBase database, which holds the detailed commodity data.
Preferably, in step S4, the standardized commodity data is stored in the built distributed search engine through the Logstash data dump tool, and the distributed search engine stores only the fields that correspond to data retrieval conditions.
Preferably, in step S5, the user enters query conditions, which are passed to the distributed search engine to look up the IDs of the required data list.
Preferably, in step S6, the IDs of the data list are used to query the detailed data from the HBase database.
ES stores only the fields that correspond to data retrieval conditions, so that it occupies as little storage space as possible even when the number of records is large, while HBase stores the detailed data. This yields one database for mass data retrieval and one for mass data storage. The user-facing display pages are integrated into the system's commodity retrieval system. When a user enters query conditions, the system takes the conditions to the ES database to quickly find the RowKeys of the required data list, then takes those RowKeys to the HBase database described above to query the detailed data. The data retrieval system visualizes the data and displays it to the user, which meets the user's need for fast retrieval.
In this method for realizing mass data search based on big data, commodity data is collected from each online platform and cleaned and processed using basic big data processing methods. A unique commodity RowKey (ID) is custom-generated, and the data is standardized and warehoused. The standardized commodity data is stored in the built distributed search engine Elasticsearch through the Logstash data dump tool and, at the same time, in the HBase database. Through the system, a user can retrieve and query commodity data in Elasticsearch, and the system then requests the detailed historical data from HBase according to the commodity data returned. This addresses the slow querying of hundred-million-scale commodity records in a traditional database, increases query speed, and improves the user's retrieval experience.
Compared with the prior art, the method for realizing mass data search based on big data has the following outstanding advantages: it keeps data redundancy low and, while maintaining data consistency, integrates a platform system architecture suited to mass data retrieval. It thereby realizes mass data retrieval, improves the platform's retrieval efficiency and user experience, and has good popularization and application value.
Drawings
Fig. 1 is a flowchart of a method for implementing mass data search based on big data according to the present invention.
Detailed Description
The following describes in detail a method for implementing mass data search based on big data according to the present invention with reference to the accompanying drawings and embodiments.
Examples
The method for realizing mass data search based on big data collects commodity data from each online platform, processes and standardizes the commodity data, and stores the standardized commodity data both in a database and in a built distributed search engine. A user runs retrieval queries against the distributed search engine, and detailed historical data is then requested from the database according to the commodity data returned by the query. The method specifically comprises the following steps:
S1, collecting commodity data and processing the collected commodity data.
Processing the commodity data includes cleaning and processing it. The commodity data is cleaned and processed using basic big data processing methods together with big data quality and data governance methods, and is then stored in standardized form. A field mapping is created for the acquired data, and the data is structured and standardized according to the mapped fields before being stored as system data, so that programs can parse and process it easily.
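The field-mapping part of step S1 can be illustrated with a minimal Python sketch. The patent does not name the platforms, the source fields, or the target schema, so `platform_a`, `platform_b`, and the fields `title`, `price`, and `brand` below are hypothetical, and the cleaning rules (trimming whitespace, treating empty strings as missing, coercing the price to a number) are assumptions chosen only for illustration.

```python
from typing import Any, Dict

# Hypothetical per-platform mappings: source field name -> standardized field name.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "platform_a": {"goods_name": "title", "sale_price": "price", "brand_name": "brand"},
    "platform_b": {"name": "title", "price_yuan": "price", "maker": "brand"},
}


def clean_value(value: Any) -> Any:
    """Trim strings and treat empty strings as missing values."""
    if isinstance(value, str):
        value = value.strip()
        return value or None
    return value


def standardize(platform: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped fields, clean their values, and coerce the price to float."""
    mapping = FIELD_MAPPINGS[platform]
    record: Dict[str, Any] = {"platform": platform}
    for source_field, target_field in mapping.items():
        record[target_field] = clean_value(raw.get(source_field))
    if record.get("price") is not None:
        record["price"] = float(record["price"])
    return record


raw_a = {"goods_name": "  Steel Kettle ", "sale_price": "129.00", "brand_name": ""}
standardized = standardize("platform_a", raw_a)
print(standardized)
```

Keeping the mapping in a table rather than in code means a new platform can be added by supplying one more dictionary entry, which is one plausible reading of "creating a field mapping" in the text.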
S2, standardizing and warehousing the commodity data.
After a unique commodity ID is custom-generated, the commodity data is standardized and warehoused. A unique commodity RowKey (ID) is custom-generated from the standardized data, and the data is stored in HBase. Through an Observer, the data synchronously enters the ES index database and is made available to users through the system. By entering retrieval conditions, a user queries the ES database for the records that satisfy them, and the system framework processes the results. The Observer synchronizes only part of the data, not all of it, into the ES index database, in preparation for users' later queries of the mass data in HBase.
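As a rough illustration of step S2, the sketch below generates a deterministic RowKey and notifies a hook on every write. The patent only says the RowKey is custom-generated, so the key layout used here (a short hash prefix followed by platform and item ID) is an assumption, as are the names `make_rowkey` and `DetailStore`. The in-memory class merely stands in for an HBase table, and the callback stands in for the Observer; a real HBase Observer is a server-side coprocessor, which this sketch does not attempt to reproduce. Filtering the record down to the retrieval fields is left out here.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def make_rowkey(platform: str, item_id: str) -> str:
    """Build a deterministic key: 4-char hash prefix + platform + item id.

    The hash prefix is an assumed design choice intended to spread writes
    across key ranges; the source text does not specify the key format.
    """
    natural_key = f"{platform}:{item_id}"
    prefix = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}_{platform}_{item_id}"


class DetailStore:
    """In-memory stand-in for the detail table, with post-write hooks."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.observers: List[Callable[[str, dict], None]] = []

    def put(self, rowkey: str, record: dict) -> None:
        """Store the row, then notify every registered hook."""
        self.rows[rowkey] = record
        for notify in self.observers:
            notify(rowkey, record)


# Hypothetical queue of pending index updates filled by the hook.
index_updates: List[Tuple[str, dict]] = []

store = DetailStore()
store.observers.append(lambda rowkey, record: index_updates.append((rowkey, record)))

key = make_rowkey("platform_a", "10086")
store.put(key, {"title": "Steel Kettle", "price": 129.0})
print(key)
print(index_updates)
```

Because the key is derived only from the platform and item ID, re-ingesting the same commodity overwrites the same row instead of creating a duplicate, which fits the stated goal of low redundancy.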
S3, storing the standardized commodity data into a database. The standardized commodity data is stored in an HBase database, which holds the detailed commodity data.
S4, storing the standardized commodity data into the constructed distributed search engine.
The standardized commodity data is stored in the constructed distributed search engine through the Logstash data dump tool, and the distributed search engine stores only the fields that correspond to data retrieval conditions.
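The idea that the search engine holds only the retrieval fields can be sketched as a simple projection. The field names in `RETRIEVAL_FIELDS`, the sample record, and the example RowKey are hypothetical, and the size comparison uses the length of the JSON text as a crude proxy for storage footprint, not a measurement of any real index.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical set of fields that users may search or filter on.
RETRIEVAL_FIELDS = ("title", "brand", "category")


def to_index_document(rowkey: str, record: Dict[str, Any],
                      fields: Iterable[str] = RETRIEVAL_FIELDS) -> Dict[str, Any]:
    """Keep the RowKey plus only the retrieval fields present in the record."""
    doc: Dict[str, Any] = {"rowkey": rowkey}
    for name in fields:
        if name in record:
            doc[name] = record[name]
    return doc


full_record = {
    "title": "Steel Kettle",
    "brand": "Acme",
    "category": "kitchen",
    "description": "1.7 L electric kettle with auto shut-off. " * 20,
    "price_history": [{"date": f"2021-01-{d:02d}", "price": 129.0} for d in range(1, 27)],
}
index_doc = to_index_document("a1b2_platform_a_10086", full_record)

# Character counts of the JSON text, used only as a rough size comparison.
full_size = len(json.dumps(full_record))
index_size = len(json.dumps(index_doc))
print(index_doc)
print(f"full record: {full_size} chars, index document: {index_size} chars")
```

The bulky fields (description, price history) stay in the detail store, and the index document carries the RowKey so that a search hit can be resolved to the full row later.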
S5, carrying out a retrieval query against the distributed search engine. The user enters query conditions, which are passed to the distributed search engine to look up the IDs of the required data list.
S6, querying detailed data from the database according to the queried commodity data. The IDs of the data list are used to query the detailed data from the HBase database.
ES stores only the fields that correspond to data retrieval conditions, so that it occupies as little storage space as possible even when the number of records is large, while HBase stores the detailed data. This yields one database for mass data retrieval and one for mass data storage. The user-facing display pages are integrated into the system's commodity retrieval system. When a user enters query conditions, the system takes the conditions to the ES database to quickly find the RowKeys of the required data list, then takes those RowKeys to the HBase database described above to query the detailed data. The data retrieval system visualizes the data and displays it to the user, which meets the user's need for fast retrieval.
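The two-phase lookup in steps S5 and S6 can be sketched with in-memory stand-ins. Here `INDEX` plays the role of the ES index and `DETAILS` the role of the HBase table; both dictionaries, the sample rows, and the substring-matching rule are assumptions made for illustration. In a deployment, phase 1 would presumably be a search-engine query and phase 2 a batch of lookups by RowKey.

```python
from typing import Dict, List

# Stand-in for the search index: RowKey -> retrieval fields only.
INDEX: Dict[str, Dict[str, str]] = {
    "rk1": {"title": "steel kettle", "brand": "acme"},
    "rk2": {"title": "glass kettle", "brand": "zenith"},
    "rk3": {"title": "steel pan", "brand": "acme"},
}

# Stand-in for the detail store: RowKey -> full row.
DETAILS: Dict[str, dict] = {
    "rk1": {"title": "Steel Kettle", "price": 129.0, "history": [135.0, 129.0]},
    "rk2": {"title": "Glass Kettle", "price": 99.0, "history": [99.0]},
    "rk3": {"title": "Steel Pan", "price": 45.0, "history": [49.0, 45.0]},
}


def search_rowkeys(conditions: Dict[str, str]) -> List[str]:
    """Phase 1: return RowKeys whose indexed fields contain every condition value."""
    return [
        rowkey
        for rowkey, fields in INDEX.items()
        if all(value.lower() in fields.get(name, "") for name, value in conditions.items())
    ]


def fetch_details(rowkeys: List[str]) -> List[dict]:
    """Phase 2: exact lookup by RowKey, skipping keys missing from the detail store."""
    return [DETAILS[rowkey] for rowkey in rowkeys if rowkey in DETAILS]


def search(conditions: Dict[str, str]) -> List[dict]:
    """Run phase 1, then resolve the hits to full rows in phase 2."""
    return fetch_details(search_rowkeys(conditions))


results = search({"title": "kettle", "brand": "acme"})
print(results)
```

Only the small index is scanned for matches, and the detail store is touched solely through exact-key reads, which is the division of labor the text attributes to ES and HBase.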
In this method for realizing mass data search based on big data, commodity data is collected from each online platform and cleaned and processed using basic big data processing methods. A unique commodity RowKey (ID) is custom-generated, and the data is standardized and warehoused. The standardized commodity data is stored in the built distributed search engine Elasticsearch through the Logstash data dump tool and, at the same time, in the HBase database. Through the system, a user can retrieve and query commodity data in Elasticsearch, and the system then requests the detailed historical data from HBase according to the commodity data returned. This addresses the slow querying of hundred-million-scale commodity records in a traditional database, increases query speed, and improves the user's retrieval experience.
As shown in Fig. 1, the method is implemented as follows: the collected raw commodity data is cleaned and processed into standardized data; a unique commodity RowKey (ID) is custom-generated from the standardized data; the data is stored in the HBase database and, at the same time, synchronously enters the ES database through an Observer. A user enters search conditions on the data search platform. The data search platform takes the search conditions to the ES database for querying, and the ES database returns List (RowKey) to the platform. The platform then takes List (RowKey) to the HBase database for an exact query, and the HBase database returns the detailed list data. Finally, the data search platform returns the results to the user through the client.
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (8)
1. A method for realizing mass data search based on big data, characterized in that: the method comprises collecting commodity data from each online platform, processing and standardizing the commodity data, storing the standardized commodity data in a database and, at the same time, in a built distributed search engine, having a user run retrieval queries against the data in the distributed search engine, and requesting detailed historical data from the database according to the queried commodity data.
2. The method for realizing mass data search based on big data according to claim 1, characterized in that: the method specifically comprises the following steps:
S1, collecting commodity data and processing the collected commodity data;
S2, standardizing and warehousing the commodity data;
S3, storing the standardized commodity data into a database;
S4, storing the standardized commodity data into the constructed distributed search engine;
S5, carrying out a retrieval query against the distributed search engine;
S6, querying detailed data from the database according to the queried commodity data.
3. The method for realizing mass data search based on big data according to claim 2, characterized in that: in step S1, the commodity data is cleaned and processed.
4. The method for realizing mass data search based on big data according to claim 3, characterized in that: in step S2, after a unique commodity ID is custom-generated, the commodity data is standardized and warehoused.
5. The method for realizing mass data search based on big data according to claim 4, characterized in that: in step S3, the standardized commodity data is stored in an HBase database, which holds the detailed commodity data.
6. The method for realizing mass data search based on big data according to claim 5, characterized in that: in step S4, the standardized commodity data is stored in the built distributed search engine through the Logstash data dump tool, and the distributed search engine stores only the fields that correspond to data retrieval conditions.
7. The method for realizing mass data search based on big data according to claim 6, characterized in that: in step S5, the user enters query conditions, which are passed to the distributed search engine to look up the IDs of the required data list.
8. The method for realizing mass data search based on big data according to claim 7, characterized in that: in step S6, the IDs of the data list are used to query the detailed data from the HBase database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110101717.5A CN112783977A (en) | 2021-01-26 | 2021-01-26 | Mass data search implementation method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110101717.5A CN112783977A (en) | 2021-01-26 | 2021-01-26 | Mass data search implementation method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112783977A (en) | 2021-05-11 |
Family
ID=75757661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110101717.5A Withdrawn CN112783977A (en) | 2021-01-26 | 2021-01-26 | Mass data search implementation method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112783977A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113806618A (en) * | 2021-09-22 | 2021-12-17 | 汉唐信通(北京)咨询股份有限公司 | Trademark big data management method and system and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11663254B2 (en) | System and engine for seeded clustering of news events | |
US20170140038A1 (en) | Method and system for hybrid information query | |
CN102236663B (en) | Query method, query system and query device based on vertical search | |
KR102249466B1 (en) | Data catalog providing method and system for providing recommendation information using artificial intelligence recommendation model | |
US20180225305A1 (en) | Method for displaying landmark data | |
Athanasiou et al. | Big POI data integration with Linked Data technologies. | |
CN112825182A (en) | Method and device for determining recommended commodities | |
CN112269816A (en) | Government affair appointment event correlation retrieval method | |
CN110110234B (en) | Big data real-time searching system and method | |
KR100242606B1 (en) | Apparatus for supporting development of information processing system | |
CN111159559A (en) | Method for constructing recommendation engine according to user requirements and user behaviors | |
CN114090877A (en) | Position information recommendation method and device, electronic equipment and storage medium | |
CN112783977A (en) | Mass data search implementation method based on big data | |
CN113626571A (en) | Answer sentence generating method and device, computer equipment and storage medium | |
CN116150436B (en) | Data display method and system based on node tree | |
CN111078988B (en) | Electric power service information hotspot retrieval method and device and electronic equipment | |
CN115934923A (en) | E-commerce reply method and system based on big data | |
US10083241B2 (en) | Sorting method of data documents and display method for sorting landmark data | |
CN113342844A (en) | Industrial intelligent search system | |
CN113918728A (en) | Industrial Internet post-service knowledge map analysis platform | |
KR101592670B1 (en) | Apparatus for searching data using index and method for using the apparatus | |
CN112948660A (en) | Cluster electric bus monitoring website battery data continuous crawling and analyzing method | |
TWI605351B (en) | Query method, system and device based on vertical search | |
Peng et al. | Design and implementation of an intelligent recommendation system for product information on an e-commerce platform based on machine learning | |
CN113868322B (en) | Semantic structure analysis method, device and equipment, virtualization system and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210511 |