CN112783977A - Mass data search implementation method based on big data

Info

Publication number: CN112783977A
Application number: CN202110101717.5A
Authority: CN
Inventors: 李赛赛; 康子光; 张洪超
Original assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2021-05-11

Abstract

The invention discloses a method for implementing mass data search based on big data, and belongs to the technical field of big data. The method collects commodity data from each online platform, processes and standardizes the data, and stores the standardized commodity data in a database while simultaneously storing it in a purpose-built distributed search engine. A user runs retrieval queries against the distributed search engine, and detailed historical data is then requested from the database according to the commodity data returned by the query. The method has good value for popularization and application.

Description

Mass data search implementation method based on big data

Technical Field

The invention relates to the technical field of big data, and particularly provides a method for realizing mass data search based on big data.

Background

The development of internet technology has brought great convenience to people's lives. In the era of big data, every industry is flourishing and competing, more and more industry portals are appearing, and the volume of generated data keeps growing. How to quickly capture the information users need from massive data has become a pressing problem for internet technology companies. At the same time, with the rapid growth of company business and the explosive growth of data, every platform in a company now has search requirements, but traditional search service systems, because of their architecture and business design, cannot meet the expectations of the various business lines. As data volume increases, data retrieval becomes a serious problem for every platform: queries over large amounts of historical data are slow, sentence-level search is not supported, attributes tied to many services cannot be searched at all, there is no search-related metric evaluation system, extensibility and maintainability are poor, low data redundancy cannot be guaranteed, and retrieval efficiency and user experience cannot be guaranteed.

Disclosure of Invention

The technical task of the invention is to provide a method for realizing mass data search based on big data, which can ensure low redundancy of data, keep data consistency and improve retrieval efficiency and user experience.

In order to achieve the purpose, the invention provides the following technical scheme:

A method for implementing mass data search based on big data comprises: collecting commodity data from each online platform; processing and standardizing the commodity data; storing the standardized commodity data in a database and, at the same time, in a purpose-built distributed search engine; having the user run retrieval queries against the distributed search engine; and requesting detailed historical data from the database according to the commodity data returned by the query.

Preferably, the implementation method of mass data search based on big data specifically includes the following steps:

S1, collecting commodity data and processing the collected commodity data;

S2, standardizing and storing the commodity data;

S3, storing the standardized commodity data in a database;

S4, storing the standardized commodity data in the purpose-built distributed search engine;

S5, running a retrieval query against the distributed search engine;

S6, querying detailed data from the database according to the commodity data returned by the query.

Preferably, in step S1, the commodity data is cleaned and processed.

The commodity data is cleaned and processed using basic big data processing methods together with big data quality and data governance methods, and is then stored in standardized form. Field mappings are created for the collected data, and the data is structured and standardized according to the mapped fields before being stored as system data, which facilitates analysis and processing by programs.
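The field-mapping step can be illustrated with a minimal Python sketch. The patent does not specify platform names, field names, or cleaning rules, so `FIELD_MAPPINGS`, `standardize`, and every field shown here are hypothetical and serve only to show how records from different platforms could be brought into one schema.

```python
from typing import Any, Dict

# Hypothetical per-platform mappings: source field name -> standard field name.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "platform_a": {"title": "name", "cost": "price", "sku": "source_id"},
    "platform_b": {"product_name": "name", "price_yuan": "price", "item_id": "source_id"},
}


def standardize(platform: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Clean one raw record and rename its fields to the standard schema."""
    record: Dict[str, Any] = {"platform": platform}
    for source_field, standard_field in FIELD_MAPPINGS[platform].items():
        value = raw.get(source_field)
        if isinstance(value, str):
            value = value.strip()  # basic cleaning: trim stray whitespace
        record[standard_field] = value
    # Normalize the price to a float; keep None when it is missing or malformed.
    try:
        record["price"] = float(record["price"])
    except (TypeError, ValueError):
        record["price"] = None
    return record
```

Records from either platform come out with the same keys (`platform`, `name`, `price`, `source_id`), so later steps do not need to know which platform a record came from.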

Preferably, in step S2, a unique commodity ID is generated by a custom rule, and the commodity data is then standardized and stored.

A unique commodity RowKey (ID) is generated from the standardized data by a custom rule and stored in HBase, and the data is synchronized into the ES (Elasticsearch) index database through an Observer and presented to users through the system. By entering retrieval conditions, a user queries the ES database for the records that satisfy them, and the system framework processes the results. The Observer synchronizes only a subset of the data, not all of it, into the ES index database, which prepares for later user queries against the mass data in HBase.
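The patent does not disclose the custom rule used to generate the RowKey. As one possible illustration only, the Python sketch below builds a deterministic key from a platform name and a source ID, prefixed with a short hash so that keys do not cluster by platform. The function name `make_rowkey` and the key layout are assumptions, not the patent's scheme.

```python
import hashlib


def make_rowkey(platform: str, source_id: str) -> str:
    """Build a deterministic RowKey of the form <hash prefix>_<platform>_<source_id>.

    The 8-character hash prefix is an illustrative choice for spreading keys;
    the same (platform, source_id) pair always yields the same key.
    """
    digest = hashlib.md5(f"{platform}:{source_id}".encode("utf-8")).hexdigest()
    return f"{digest[:8]}_{platform}_{source_id}"
```

Because the key is derived only from the commodity's identity, re-collecting the same commodity overwrites the same row rather than creating a duplicate, which is in line with the stated goal of low data redundancy.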

Preferably, in step S3, the standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.

Preferably, in step S4, the standardized commodity data is stored in the purpose-built distributed search engine by means of the Logstash data dump tool, where the distributed search engine stores only the fields that correspond to data retrieval conditions.
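To make "stores only the fields that correspond to data retrieval conditions" concrete, the following Python sketch defines a hypothetical index body and a helper that strips a full record down to the mapped fields. The field names are illustrative and not taken from the patent. The `"dynamic": "strict"` setting is a standard Elasticsearch mapping option that rejects documents containing unmapped fields.

```python
from typing import Any, Dict

# Hypothetical index definition: only searchable fields plus the RowKey are mapped.
INDEX_BODY: Dict[str, Any] = {
    "mappings": {
        "dynamic": "strict",  # refuse documents that carry unmapped fields
        "properties": {
            "rowkey": {"type": "keyword"},
            "name": {"type": "text"},
            "category": {"type": "keyword"},
            "price": {"type": "float"},
        },
    }
}


def to_index_doc(rowkey: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the mapped (searchable) fields of a record and attach its RowKey."""
    searchable = INDEX_BODY["mappings"]["properties"]
    doc = {field: value for field, value in record.items() if field in searchable}
    doc["rowkey"] = rowkey
    return doc
```

Large fields such as descriptions or price histories never reach the index in this sketch. They remain available in the detail store under the same RowKey.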

Preferably, in step S5, the user enters query conditions, which are submitted to the distributed search engine to look up the IDs of the required data list.

Preferably, in step S6, the IDs of the data list are used to query the detailed data from the HBase database.

ES stores only the fields that correspond to data retrieval conditions, so that it occupies as little storage space as possible even when the data volume is large, while HBase stores the detailed data. This yields one database for mass data retrieval and one for mass data storage. The user-facing display pages are integrated into the system's commodity retrieval system. When a user enters query conditions, the system takes the conditions to the ES database to quickly find the RowKeys of the required data list, then takes those RowKeys to the HBase database described above to query the detailed data, and finally displays the data to the user after visualization processing by the data retrieval system, thereby meeting the user's need for fast retrieval.
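The two-phase lookup described here can be sketched in Python with in-memory stand-ins. `SEARCH_INDEX` plays the role of the ES database and `DETAIL_STORE` the role of HBase. The sample rows, field names, and function names are all hypothetical, and a real deployment would call the respective client libraries instead.

```python
from typing import Dict, List, Optional

# Stand-in for the search engine: slim documents with searchable fields only.
SEARCH_INDEX: List[Dict] = [
    {"rowkey": "rk-001", "name": "Green Tea 250g", "price": 12.5},
    {"rowkey": "rk-002", "name": "Black Tea 100g", "price": 9.0},
    {"rowkey": "rk-003", "name": "Coffee Beans 1kg", "price": 58.0},
]

# Stand-in for the detail database: full rows addressed by RowKey.
DETAIL_STORE: Dict[str, Dict] = {
    "rk-001": {"name": "Green Tea 250g", "platform": "platform_a", "price_history": [13.0, 12.5]},
    "rk-002": {"name": "Black Tea 100g", "platform": "platform_a", "price_history": [9.5, 9.0]},
    "rk-003": {"name": "Coffee Beans 1kg", "platform": "platform_b", "price_history": [60.0, 58.0]},
}


def search_rowkeys(keyword: str, max_price: Optional[float] = None) -> List[str]:
    """Phase 1: filter the slim index and return only the matching RowKeys."""
    keyword = keyword.lower()
    hits = []
    for doc in SEARCH_INDEX:
        if keyword not in doc["name"].lower():
            continue
        if max_price is not None and doc["price"] > max_price:
            continue
        hits.append(doc["rowkey"])
    return hits


def fetch_details(rowkeys: List[str]) -> List[Dict]:
    """Phase 2: exact lookups by RowKey in the detail store."""
    return [DETAIL_STORE[key] for key in rowkeys if key in DETAIL_STORE]


def search(keyword: str, max_price: Optional[float] = None) -> List[Dict]:
    """Run both phases and return the detailed rows for display."""
    return fetch_details(search_rowkeys(keyword, max_price))
```

Only RowKeys cross from the first phase to the second, so the detail store is never scanned. It only answers exact-key lookups, which is the access pattern the text relies on for speed.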

The method collects commodity data from each online platform, cleans and processes it using basic big data processing methods, generates a unique commodity RowKey (ID) by a custom rule, and stores the data in standardized form. The standardized commodity data is stored in the purpose-built distributed search engine Elasticsearch by means of the Logstash data dump tool and, at the same time, in the HBase database. Through the system, a user can retrieve and query commodity data in Elasticsearch, and the system then requests the detailed historical data from HBase according to the commodity data returned by the query. This addresses the slow query speed of traditional databases over hundreds of millions of commodities, increases query speed, and improves the user's retrieval experience.

Compared with the prior art, the method has the following outstanding advantages: it ensures low data redundancy and, while keeping the data consistent, integrates a platform architecture suited to mass data retrieval, thereby achieving mass data retrieval, improving the platform's retrieval efficiency and user experience, and offering good value for popularization and application.

Drawings

Fig. 1 is a flowchart of a method for implementing mass data search based on big data according to the present invention.

Detailed Description

The following describes in detail a method for implementing mass data search based on big data according to the present invention with reference to the accompanying drawings and embodiments.

Examples

The method for implementing mass data search based on big data collects commodity data from each online platform, processes and standardizes the data, and stores the standardized commodity data in a database while simultaneously storing it in a purpose-built distributed search engine. A user runs retrieval queries against the distributed search engine, and detailed historical data is requested from the database according to the commodity data returned by the query. The method specifically comprises the following steps:

S1, collecting commodity data and processing the collected commodity data.

Processing the commodity data includes cleaning and processing it. The commodity data is cleaned and processed using basic big data processing methods together with big data quality and data governance methods, and is then stored in standardized form. Field mappings are created for the collected data, and the data is structured and standardized according to the mapped fields before being stored as system data, which facilitates analysis and processing by programs.

S2, standardizing and storing the commodity data.

After a unique commodity ID is generated by a custom rule, the commodity data is standardized and stored. A unique commodity RowKey (ID) is generated from the standardized data by a custom rule and stored in HBase, and the data is synchronized into the ES (Elasticsearch) index database through an Observer and presented to users through the system. By entering retrieval conditions, a user queries the ES database for the records that satisfy them, and the system framework processes the results. The Observer synchronizes only a subset of the data, not all of it, into the ES index database, which prepares for later user queries against the mass data in HBase.
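The Observer-based synchronization can be pictured as a hook that runs after every write to the detail table. The patent does not say how its Observer is implemented. HBase itself offers RegionObserver coprocessors with hooks such as `postPut`, and the Python sketch below only imitates that pattern in memory. `DetailTable`, `register_observer`, `sync_to_index`, and `INDEXED_FIELDS` are hypothetical names.

```python
from typing import Callable, Dict, List

PostPutHook = Callable[[str, Dict], None]


class DetailTable:
    """In-memory stand-in for the detail table, with post-write hooks."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict] = {}
        self._observers: List[PostPutHook] = []

    def register_observer(self, hook: PostPutHook) -> None:
        self._observers.append(hook)

    def put(self, rowkey: str, row: Dict) -> None:
        self.rows[rowkey] = row
        # Notify observers only after the write has been applied.
        for hook in self._observers:
            hook(rowkey, row)


# Stand-in for the search index, keyed by RowKey.
index: Dict[str, Dict] = {}
INDEXED_FIELDS = ("name", "price")  # assumed subset of searchable fields


def sync_to_index(rowkey: str, row: Dict) -> None:
    """Observer hook: copy only the indexed subset of a row into the index."""
    index[rowkey] = {field: row[field] for field in INDEXED_FIELDS if field in row}


table = DetailTable()
table.register_observer(sync_to_index)
table.put("rk-001", {"name": "Green Tea", "price": 12.5, "price_history": [13.0, 12.5]})
```

After the `put`, the full row (including `price_history`) lives in `table.rows`, while `index` holds only `name` and `price` for the same RowKey. The writer never addresses the index directly, so the two stores stay in step as long as every write goes through the table.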

S3, storing the standardized commodity data in a database. The standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.

S4, storing the standardized commodity data in the purpose-built distributed search engine.

The standardized commodity data is stored in the purpose-built distributed search engine by means of the Logstash data dump tool, where the distributed search engine stores only the fields that correspond to data retrieval conditions.

S5, running a retrieval query against the distributed search engine. The user enters query conditions, which are submitted to the distributed search engine to look up the IDs of the required data list.

S6, querying detailed data from the database according to the commodity data returned by the query. The IDs of the data list are used to query the detailed data from the HBase database.

As shown in Fig. 1, the method is implemented as follows. The collected raw commodity data is cleaned and processed into standardized data, a unique commodity RowKey (ID) is generated from the standardized data by a custom rule, and the data is stored in the HBase database and, at the same time, synchronized into the ES database through the Observer. A user enters search conditions into the data search platform. The data search platform takes the search conditions to the ES database, and the ES database returns a List (RowKey) to the platform. The platform then takes the List (RowKey) to the HBase database for an exact query, and the HBase database returns the detailed list data to the platform, which returns it to the user through the client.

The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims

1. A method for implementing mass data search based on big data, characterized in that: the method comprises collecting commodity data from each online platform, processing and standardizing the commodity data, storing the standardized commodity data in a database and simultaneously in a purpose-built distributed search engine, having a user run retrieval queries against the data in the distributed search engine, and requesting detailed historical data from the database according to the commodity data returned by the query.

2. The method for realizing mass data search based on big data according to claim 1, characterized in that: the method specifically comprises the following steps:

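S1, collecting commodity data and processing the collected commodity data;

S2, standardizing and storing the commodity data;

S3, storing the standardized commodity data in a database;

S4, storing the standardized commodity data in the purpose-built distributed search engine;

S5, running a retrieval query against the distributed search engine;

S6, querying detailed data from the database according to the commodity data returned by the query.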

3. The method for realizing mass data search based on big data according to claim 2, characterized in that: in step S1, the commodity data is cleaned and processed.

4. The method for realizing mass data search based on big data according to claim 3, characterized in that: in step S2, after a unique commodity ID is generated by a custom rule, the commodity data is standardized and stored.

5. The method for realizing mass data search based on big data according to claim 4, wherein: in step S3, the standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.

6. The method for realizing mass data search based on big data according to claim 5, wherein: in step S4, the standardized commodity data is stored in the purpose-built distributed search engine by means of the Logstash data dump tool, where the distributed search engine stores only the fields that correspond to data retrieval conditions.

7. The method for realizing mass data search based on big data according to claim 6, wherein: in step S5, the user inputs a query condition, and the query condition is carried to the distributed search engine to search for the ID of the required data list.

8. The method for implementing mass data search based on big data according to claim 7, wherein: in step S6, the IDs of the data list are used to query the detailed data from the HBase database.