CN112783977A - Mass data search implementation method based on big data - Google Patents

Publication number
CN112783977A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110101717.5A
Other languages
Chinese (zh)
Inventor
李赛赛
康子光
张洪超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-26
Filing date
2021-01-26
Publication date
2021-05-11
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202110101717.5A
Publication of CN112783977A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/248 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for implementing mass data search based on big data, and belongs to the technical field of big data. The method collects commodity data from each online platform, processes and standardizes it, and stores the standardized commodity data both in a database and in a purpose-built distributed search engine. A user runs retrieval queries against the distributed search engine, and detailed historical data is then requested from the database according to the commodity data returned by the query. The method has good value for popularization and application.

Description

Mass data search implementation method based on big data
Technical Field
The invention relates to the technical field of big data, and particularly provides a method for realizing mass data search based on big data.
Background
The development of internet technology has brought great convenience to people's lives. In the era of big data, industries of every kind are flourishing and competing, industry portals are multiplying, and the volume of generated data keeps growing. How to quickly find the information users need within massive data has become a pressing question for internet technology companies. At the same time, with the rapid growth of company business and the explosive growth of data, every platform in a company now needs search, yet traditional search service systems, constrained by their architecture and business design, cannot meet the expectations of the business lines. As data volume grows, data retrieval becomes a serious problem for every platform, and quickly querying the required data among large amounts of historical data is an urgent need. Traditional systems cannot support sentence-level search, cannot handle the large number of business-related attributes, provide no evaluation metrics for search, and are hard to extend and maintain; they guarantee neither low data redundancy nor retrieval efficiency and user experience.
Disclosure of Invention
The technical task of the invention is to provide a method for realizing mass data search based on big data, which can ensure low redundancy of data, keep data consistency and improve retrieval efficiency and user experience.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for implementing mass data search based on big data: commodity data is collected from each online platform, processed and standardized, and the standardized commodity data is stored in a database and, at the same time, in a purpose-built distributed search engine; a user runs retrieval queries against the data in the distributed search engine, and detailed historical data is requested from the database according to the commodity data returned by the query.
Preferably, the method specifically includes the following steps:
S1, collecting commodity data and processing the collected commodity data;
S2, standardizing the commodity data and putting it into storage;
S3, storing the standardized commodity data in a database;
S4, storing the standardized commodity data in the built distributed search engine;
S5, running a retrieval query against the distributed search engine;
S6, querying detailed data from the database according to the commodity data returned by the query.
Preferably, in step S1, the commodity data is cleaned and processed.
The commodity data is cleaned and processed using basic big data processing methods together with data quality and data management methods, and is then stored in standardized form: field mappings are created for the collected data, and the data is structured and standardized according to the mapped fields before being stored in the system, so that programs can parse and process it easily.
Preferably, in step S2, a unique ID is custom-generated for each commodity, and the commodity data is then standardized and put into storage.
A unique RowKey (ID) is custom-generated for each commodity from the standardized data, and the record is stored in HBase and synchronized into the ES index database through an Observer, from which the system presents it to users. A user enters retrieval conditions to query the ES database for the records that match, and the system framework processes the results. The Observer synchronizes only part of the data, not all of it, into the ES index database, in preparation for users later querying the mass data in HBase.
Preferably, in step S3, the standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.
Preferably, in step S4, the standardized commodity data is stored in the built distributed search engine by the Logstash data dump tool, where the distributed search engine stores only the fields corresponding to the data retrieval conditions.
Preferably, in step S5, the user enters a query condition, and the query condition is submitted to the distributed search engine to find the IDs of the required data list.
Preferably, in step S6, the IDs of the data list are used to query the detailed data from the HBase database.
ES stores only the fields corresponding to the data retrieval conditions, so that it holds a large number of records while occupying as little storage space as possible, and HBase stores the detailed data; together they form a mass data retrieval database and a mass data storage database. The user-facing display pages are integrated into the system's commodity retrieval system. When a user enters query conditions, the system takes the conditions to the ES database to quickly find the RowKeys of the required data list, then takes those RowKeys to the HBase database described above to query the detailed data, and the data retrieval system displays the data to the user after visualization, meeting the user's need for fast retrieval.
The method collects commodity data from each online platform, cleans and processes it using basic big data processing methods, custom-generates a unique RowKey (ID) for each commodity, and puts the data into storage in standardized form. The standardized commodity data is stored in the distributed search engine Elasticsearch through the Logstash data dump tool and, at the same time, in an HBase database. Through the system, a user can retrieve and query commodity data in Elasticsearch, and the system then requests detailed historical data from HBase according to the commodity data returned by the query. This addresses the slow query speed of a traditional database holding hundreds of millions of commodities, improves query speed, and optimizes the user's retrieval experience.
Compared with the prior art, the method has the following outstanding advantages: it ensures low data redundancy and, while keeping data consistent, integrates a platform architecture suited to mass data retrieval, thereby implementing mass data retrieval and improving platform retrieval efficiency and user experience; it has good value for popularization and application.
Drawings
Fig. 1 is a flowchart of a method for implementing mass data search based on big data according to the present invention.
Detailed Description
The following describes in detail a method for implementing mass data search based on big data according to the present invention with reference to the accompanying drawings and embodiments.
Examples
The method for implementing mass data search based on big data collects commodity data from each online platform, processes and standardizes it, and stores the standardized commodity data in a purpose-built distributed search engine and, at the same time, in a database. A user runs retrieval queries against the data in the distributed search engine, and detailed historical data is requested from the database according to the commodity data returned by the query. The method specifically includes the following steps:
S1, collecting commodity data and processing the collected commodity data.
Processing the commodity data includes cleaning and processing it. The commodity data is cleaned and processed using basic big data processing methods together with data quality and data management methods, and is then stored in standardized form: field mappings are created for the collected data, and the data is structured and standardized according to the mapped fields before being stored in the system, so that programs can parse and process it easily.
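The field-mapping step in S1 can be illustrated with a minimal Python sketch. The platform names, raw field names, and standardized schema below are hypothetical; the patent does not specify them, and the cleaning rules shown (trimming whitespace, parsing prices) are only assumed examples of "cleaning and processing".

```python
# Hypothetical per-platform mappings: raw field name -> standardized field name.
FIELD_MAPPINGS = {
    "platform_a": {"title": "name", "price_yuan": "price",
                   "shop": "seller", "item_id": "source_id"},
    "platform_b": {"goods_name": "name", "cost": "price",
                   "store_name": "seller", "sku": "source_id"},
}


def clean_value(value):
    """Trim whitespace from strings; treat empty strings as missing."""
    if isinstance(value, str):
        value = value.strip()
        return value or None
    return value


def standardize(platform, raw):
    """Map a raw record onto the standardized schema and normalize its values."""
    record = {"platform": platform}
    for raw_field, std_field in FIELD_MAPPINGS[platform].items():
        record[std_field] = clean_value(raw.get(raw_field))
    # Parse the price into a float; unparseable prices become None.
    price = record.get("price")
    if price is not None:
        try:
            record["price"] = float(str(price).replace(",", ""))
        except ValueError:
            record["price"] = None
    return record


def is_valid(record):
    """Keep only records that carry the fields needed downstream."""
    return record.get("name") is not None and record.get("source_id") is not None
```

Records from different platforms end up with the same field names, so later steps (ID generation, storage, indexing) can treat them uniformly; records failing `is_valid` would be dropped or routed for review.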
S2, standardizing the commodity data and putting it into storage.
A unique ID is custom-generated for each commodity, and the commodity data is then standardized and put into storage. A unique RowKey (ID) is custom-generated for each commodity from the standardized data, and the record is stored in HBase and synchronized into the ES index database through an Observer, from which the system presents it to users. A user enters retrieval conditions to query the ES database for the records that match, and the system framework processes the results. The Observer synchronizes only part of the data, not all of it, into the ES index database, in preparation for users later querying the mass data in HBase.
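The patent does not say how the unique RowKey is built. One possible scheme, shown here purely as an assumption, is a short hash prefix followed by the platform and the platform's own item ID; the function name `make_rowkey` and the key layout are hypothetical.

```python
import hashlib


def make_rowkey(platform: str, source_id: str) -> str:
    """Build a deterministic RowKey: hash prefix + platform + source ID.

    The hash prefix spreads keys across the key space and the readable
    suffix keeps the key traceable. This layout is an assumed example,
    not the scheme used in the patent.
    """
    natural_key = f"{platform}:{source_id}"
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}_{platform}_{source_id}"
```

Because the key is derived only from the platform and source ID, re-collecting the same commodity yields the same RowKey, so a later write overwrites the earlier row instead of creating a duplicate.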
S3, storing the standardized commodity data in a database. The standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.
S4, storing the standardized commodity data in the built distributed search engine.
The standardized commodity data is stored in the built distributed search engine through the Logstash data dump tool, where the distributed search engine stores only the fields corresponding to the data retrieval conditions.
S5, running a retrieval query against the distributed search engine. The user enters a query condition, and the query condition is submitted to the distributed search engine to find the IDs of the required data list.
S6, querying detailed data from the database according to the commodity data returned by the query. The IDs of the data list are used to query the detailed data from the HBase database.
ES stores only the fields corresponding to the data retrieval conditions, so that it holds a large number of records while occupying as little storage space as possible, and HBase stores the detailed data; together they form a mass data retrieval database and a mass data storage database. The user-facing display pages are integrated into the system's commodity retrieval system. When a user enters query conditions, the system takes the conditions to the ES database to quickly find the RowKeys of the required data list, then takes those RowKeys to the HBase database described above to query the detailed data, and the data retrieval system displays the data to the user after visualization, meeting the user's need for fast retrieval.
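The two-phase lookup (search engine for RowKeys, then detail store by key) can be sketched with in-memory stand-ins. `SearchIndex` and `DetailStore` are hypothetical classes that only imitate the roles of ES and HBase; they are not client code for either system, and the keyword and price filters are assumed examples of retrieval conditions.

```python
class SearchIndex:
    """In-memory stand-in for the search engine: holds search fields only."""

    def __init__(self):
        self._docs = {}  # rowkey -> searchable fields

    def index(self, rowkey, fields):
        self._docs[rowkey] = fields

    def search(self, keyword, max_price=None):
        """Return RowKeys whose name contains the keyword, optionally capped by price."""
        hits = []
        for rowkey, doc in self._docs.items():
            if keyword.lower() not in doc["name"].lower():
                continue
            if max_price is not None and doc["price"] > max_price:
                continue
            hits.append(rowkey)
        return sorted(hits)


class DetailStore:
    """In-memory stand-in for the detail database, keyed by RowKey."""

    def __init__(self):
        self._rows = {}

    def put(self, rowkey, row):
        self._rows[rowkey] = row

    def multi_get(self, rowkeys):
        """Fetch full rows by exact key, skipping keys that are absent."""
        return [self._rows[k] for k in rowkeys if k in self._rows]


def search_products(index, store, keyword, max_price=None):
    """Phase 1: find RowKeys in the index. Phase 2: fetch details by key."""
    rowkeys = index.search(keyword, max_price)
    return store.multi_get(rowkeys)
```

The index never returns detail fields such as price history; it only narrows the query to a list of RowKeys, and the detail store is then read by exact key, which is the access pattern the description relies on for speed.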
The method collects commodity data from each online platform, cleans and processes it using basic big data processing methods, custom-generates a unique RowKey (ID) for each commodity, and puts the data into storage in standardized form. The standardized commodity data is stored in the distributed search engine Elasticsearch through the Logstash data dump tool and, at the same time, in an HBase database. Through the system, a user can retrieve and query commodity data in Elasticsearch, and the system then requests detailed historical data from HBase according to the commodity data returned by the query. This addresses the slow query speed of a traditional database holding hundreds of millions of commodities, improves query speed, and optimizes the user's retrieval experience.
As shown in Fig. 1, the method is implemented as follows: the collected raw commodity data is cleaned and processed into standardized data, a unique RowKey (ID) is custom-generated for each commodity from the standardized data, and the record is stored in the HBase database and, at the same time, synchronized into the ES database through the Observer. A user enters a search condition on the data search platform; the platform takes the search condition to the ES database, which returns a List (RowKey); the platform then takes the List (RowKey) to the HBase database for an exact query, and HBase returns the detailed list data. The data search platform returns the results to the user through the client.
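The write path in Fig. 1, where a write to the detail store also pushes a reduced copy into the search index, resembles an observer hook. The sketch below imitates that idea in plain Python; `ObservedStore` and `make_index_sync` are hypothetical names, the index is a plain dict, and the sketch does not reproduce how a real HBase Observer runs on the server side.

```python
class ObservedStore:
    """Detail store that notifies registered observers after each write."""

    def __init__(self):
        self.rows = {}
        self._observers = []

    def register(self, callback):
        self._observers.append(callback)

    def put(self, rowkey, row):
        self.rows[rowkey] = row
        # Post-write hook: every observer sees the row that was just stored.
        for callback in self._observers:
            callback(rowkey, row)


def make_index_sync(index, search_fields):
    """Return an observer that copies only the search fields into `index`."""

    def sync(rowkey, row):
        index[rowkey] = {f: row[f] for f in search_fields if f in row}

    return sync
```

For example, after `store.register(make_index_sync(index, ("name", "price")))`, each `store.put(...)` keeps the full row in the store and places only `name` and `price` in the index under the same RowKey, which matches the description's point that only part of the data is synchronized.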
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A method for implementing mass data search based on big data, characterized in that: commodity data is collected from each online platform, processed and standardized, and the standardized commodity data is stored in a database and, at the same time, in a built distributed search engine; a user runs retrieval queries against the data in the distributed search engine, and detailed historical data is requested from the database according to the commodity data returned by the query.
2. The method for implementing mass data search based on big data according to claim 1, characterized in that the method specifically comprises the following steps:
S1, collecting commodity data and processing the collected commodity data;
S2, standardizing the commodity data and putting it into storage;
S3, storing the standardized commodity data in a database;
S4, storing the standardized commodity data in the built distributed search engine;
S5, running a retrieval query against the distributed search engine;
S6, querying detailed data from the database according to the commodity data returned by the query.
3. The method for implementing mass data search based on big data according to claim 2, characterized in that: in step S1, the commodity data is cleaned and processed.
4. The method for implementing mass data search based on big data according to claim 3, characterized in that: in step S2, a unique ID is custom-generated for each commodity, and the commodity data is then standardized and put into storage.
5. The method for implementing mass data search based on big data according to claim 4, characterized in that: in step S3, the standardized commodity data is stored in an HBase database, where HBase holds the detailed commodity data.
6. The method for implementing mass data search based on big data according to claim 5, characterized in that: in step S4, the standardized commodity data is stored in the built distributed search engine by the Logstash data dump tool, where the distributed search engine stores only the fields corresponding to the data retrieval conditions.
7. The method for implementing mass data search based on big data according to claim 6, characterized in that: in step S5, the user enters a query condition, and the query condition is submitted to the distributed search engine to find the IDs of the required data list.
8. The method for implementing mass data search based on big data according to claim 7, characterized in that: in step S6, the IDs of the data list are used to query the detailed data from the HBase database.
CN202110101717.5A 2021-01-26 2021-01-26 Mass data search implementation method based on big data Withdrawn CN112783977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110101717.5A CN112783977A (en) 2021-01-26 2021-01-26 Mass data search implementation method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110101717.5A CN112783977A (en) 2021-01-26 2021-01-26 Mass data search implementation method based on big data

Publications (1)

Publication Number Publication Date
CN112783977A true CN112783977A (en) 2021-05-11

Family

ID=75757661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110101717.5A Withdrawn CN112783977A (en) 2021-01-26 2021-01-26 Mass data search implementation method based on big data

Country Status (1)

Country Link
CN (1) CN112783977A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806618A (en) * 2021-09-22 2021-12-17 汉唐信通(北京)咨询股份有限公司 Trademark big data management method and system and computer equipment

Similar Documents

Publication Publication Date Title
US11663254B2 (en) System and engine for seeded clustering of news events
US20170140038A1 (en) Method and system for hybrid information query
CN102236663B (en) Query method, query system and query device based on vertical search
KR102249466B1 (en) Data catalog providing method and system for providing recommendation information using artificial intelligence recommendation model
US20180225305A1 (en) Method for displaying landmark data
Athanasiou et al. Big POI data integration with Linked Data technologies.
CN112825182A (en) Method and device for determining recommended commodities
CN112269816A (en) Government affair appointment event correlation retrieval method
CN110110234B (en) Big data real-time searching system and method
KR100242606B1 (en) Apparatus for supporting development of information processing system
CN111159559A (en) Method for constructing recommendation engine according to user requirements and user behaviors
CN114090877A (en) Position information recommendation method and device, electronic equipment and storage medium
CN112783977A (en) Mass data search implementation method based on big data
CN113626571A (en) Answer sentence generating method and device, computer equipment and storage medium
CN116150436B (en) Data display method and system based on node tree
CN111078988B (en) Electric power service information hotspot retrieval method and device and electronic equipment
CN115934923A (en) E-commerce reply method and system based on big data
US10083241B2 (en) Sorting method of data documents and display method for sorting landmark data
CN113342844A (en) Industrial intelligent search system
CN113918728A (en) Industrial Internet post-service knowledge map analysis platform
KR101592670B1 (en) Apparatus for searching data using index and method for using the apparatus
CN112948660A (en) Cluster electric bus monitoring website battery data continuous crawling and analyzing method
TWI605351B (en) Query method, system and device based on vertical search
Peng et al. Design and implementation of an intelligent recommendation system for product information on an e-commerce platform based on machine learning
CN113868322B (en) Semantic structure analysis method, device and equipment, virtualization system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210511