CN104778273B - Big data analysis system for a shopping website


Info

Publication number: CN104778273B
Authority: CN (China)
Legal status: Expired - Fee Related
Application number: CN201510203342.8A
Other languages: Chinese (zh)
Other versions: CN104778273A (en)
Inventors: 邵明前, 徐胜飞
Current assignee: Taojin Hechuang E Commerce Jiangsu Co ltd
Original assignee: Panning Information Technology Jiangsu Co
History: application filed by Panning Information Technology Jiangsu Co with priority to CN201510203342.8A; published as CN104778273A; application granted and published as CN104778273B.

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q30/00 Commerce
            • G06Q30/02 Marketing; Price estimation or determination; Fundraising
              • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The present invention proposes a big data analysis system for a shopping website, comprising three modules. A data acquisition module collects the raw data of the shopping website and stores it in a raw database. A data analysis module receives the data analysis input stream formed from the raw data in the raw database. It analyzes the input stream to build indexes and statistics, and synchronizes the input stream with the corresponding data in the data acquisition module to form a data analysis output stream. It then outputs that stream to a post-analysis database. A data presentation module receives the presentation data of the data analysis output stream from the post-analysis database and presents it to users in interface form, organized by data type. The invention integrates acquisition, analysis and presentation in one system: data are collected in real time, analyzed end to end, and the analysis results are presented to the client.

Description

Big data analysis system for a shopping website
Technical field
The present invention relates to the field of big data processing technology, and in particular to a big data analysis system for a shopping website.
Background art
At present, systems in this field mostly store data in MySQL databases. This storage structure is well suited to data volumes below the millions of records. Beyond that order of magnitude, however, it suffers severe data blocking (congestion), which degrades system performance.
Summary of the invention
The purpose of the present invention is to solve at least one of the technical deficiencies described above.
To this end, an object of the invention is to propose a big data analysis system for a shopping website that integrates acquisition, analysis and presentation. Data are collected in real time, analyzed end to end, and the analysis results are presented to the client.
To achieve the above objects, an embodiment of the present invention provides a big data analysis system for a shopping website, used to collect and analyze data from the shopping website. The system comprises a data acquisition module, a data analysis module and a data presentation module, which work as follows.
The data acquisition module collects the raw data of the shopping website and stores it in a raw database. The raw data comprise full-category data, front-end/back-end category association data, and the product information and shop information under the leaf categories.
The data analysis module receives the data analysis input stream formed from the raw data in the raw database. It analyzes the input stream to build indexes and statistics, and synchronizes it with the corresponding data in the data acquisition module to form a data analysis output stream. It then outputs that stream to a post-analysis database.
The data presentation module receives the presentation data of the data analysis output stream in the post-analysis database and presents it to the user in interface form according to data type. It also receives follow instructions entered by the user, a follow instruction comprising the name of a followed product and the name of a followed shop. From the follow instructions, the data presentation module generates presentation-system interaction data and sends them to the post-analysis database.
The data analysis module also receives the presentation-feedback input stream formed from the presentation-system interaction data in the post-analysis database. It analyzes this stream to obtain the names of the followed products and followed shops, and builds indexes for the followed products. It then synchronizes the followed-product and followed-shop information with the corresponding data in the data presentation module to form a presentation-feedback output stream, and outputs that stream to the raw database.
The data acquisition module also receives the presentation-feedback output stream from the raw database and, according to that stream, collects the information of the followed products and followed shops with priority. This information is then presented to the user through the data analysis module and the data presentation module.
In one embodiment of the invention, the data acquisition module uses a MySQL database server. The data analysis module and the data presentation module each use a MySQL database server and a Solr database server.
In another embodiment of the invention, the data acquisition module obtains the full-category data of the shopping website as follows. On first run, it performs a check according to the configuration information. If a category table exists, it queries the first-level categories in the category table; otherwise it queries the full category list. The data acquisition module then calls the shopping website's back-end category API and updates the category table according to the return values.
In one embodiment of the invention, the data acquisition module obtains the front-end/back-end category association data of the shopping website as follows. It adds one or more front-end first-level categories according to the updated category table, and builds a search page URL from the front-end category ID of each one. It then determines from the page code whether the front-end category has sub-directories. If it does, the category is treated as a parent directory and the sub-directory information is obtained from the page. If not, it is treated as a leaf directory: the corresponding back-end category ID is obtained from a product under that directory, and the other attributes of that back-end category are obtained through the back-end category ID.
In another embodiment of the invention, the data acquisition module obtains the product information under the leaf categories of the shopping website as follows. For each category to be collected, it downloads the page by URL and checks whether attribute tags exist. If none exist, it collects the URL and attribute information. If they exist, it checks whether sub-attributes exist and, if so, collects the URLs and sub-attribute information. It then requests the pages at the collected URLs and matches the product information from the page source.
In another embodiment of the invention, the data acquisition module obtains the shop information under the leaf categories and the followed-shop information as follows. It first obtains the followed-shop data with priority, then the shop data in the category table. For each shop, it checks whether the page at the URL built from the shop ID can be downloaded. If so, it matches the page to obtain the shop information; otherwise it deletes the corresponding data from the category table. The data acquisition module then checks whether the list of shops still to be updated is empty. If it is, the module adds new shop information: it obtains the store names of the shops to be added, builds the URLs, checks whether the pages can be downloaded and, if so, matches the pages to obtain the shop information.
In one embodiment of the invention, the data acquisition module collects the followed-product information as follows. It obtains the followed product IDs and followed shop IDs, and builds the product detail page URLs and the shop search page URLs. It downloads the shop search pages and extracts the URLs of all products in each shop. Then, together with the product detail page URLs, it checks whether each product detail page source can be downloaded and, if so, parses the product information.
In one embodiment of the invention, the data analysis module analyzes the data analysis input stream and the presentation-feedback input stream to build indexes as follows. It first initializes the index service, then initializes all product categories in the two input streams. It then adds the items that need to be indexed, adding indexes for the raw product data and for the followed products respectively.
In another embodiment of the invention, the data analysis module synchronizes data as follows. It loads the product categories and obtains the raw product data, the followed-product data and the followed-shop data. It updates the followed-product data and followed-shop data in the raw database. Finally, it sets up threads for the product data to perform data synchronization and attribute statistics.
In one embodiment of the invention, the data presentation module presents the following: a login interface, a home page interface, an industry analysis interface, a shop analysis interface, a product analysis interface, an account interface and a system management interface.
The big data analysis system of the embodiments is an integrated data system covering acquisition, analysis and presentation. Data are collected in real time, analyzed end to end, and the analysis results are presented to the client. Because collection covers the whole network, the data volume is large and the collected objects are comprehensive. This gives users comprehensive guidance when analyzing market conditions and carrying out product research and development. The invention combines MySQL and Solr databases: the Solr non-relational database stores the large data volume, and MySQL performs the analysis and statistics. Combining the two avoids data blocking and improves system performance.
Additional aspects and advantages of the invention are given in part in the following description. They will in part become apparent from that description, or will be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a structural diagram of the big data analysis system for a shopping website according to an embodiment of the invention;
Fig. 2 is a data interaction diagram of the big data analysis system for a shopping website according to an embodiment of the invention;
Fig. 3 is a workflow diagram of the data acquisition module according to an embodiment of the invention;
Fig. 4 is a flow chart of the data acquisition module obtaining the full category list according to an embodiment of the invention;
Fig. 5 is a flow chart of the data acquisition module obtaining the front-end/back-end category association according to an embodiment of the invention;
Fig. 6 is a flow chart of the data acquisition module obtaining the product information under the leaf categories according to an embodiment of the invention;
Fig. 7 is a flow chart of the data acquisition module obtaining the shop information under the leaf categories according to an embodiment of the invention;
Fig. 8 is a flow chart of the data acquisition module adding shop information according to an embodiment of the invention;
Fig. 9 is a flow chart of the data acquisition module obtaining the followed products and the products of followed shops according to an embodiment of the invention;
Figure 10 is a workflow diagram of the data analysis module according to an embodiment of the invention;
Figure 11 is a flow chart of the data analysis module performing data indexing and analysis according to an embodiment of the invention;
Figure 12 is a flow chart of the data analysis module performing data synchronization according to an embodiment of the invention;
Figure 13 is a flow chart of the data analysis module performing data statistics according to an embodiment of the invention;
Figure 14 is a topology diagram of the content presented by the data presentation module according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings. Throughout, the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions. The embodiments described with reference to the drawings are exemplary. They are intended to explain the invention and are not to be construed as limiting it.
As shown in Fig. 1, the big data analysis system of the embodiment collects and analyzes data from a shopping website, for example Taobao. The system comprises a data acquisition module 1, a data analysis module 2 and a data presentation module 3.
Fig. 2 is a data interaction diagram of the big data analysis system for a shopping website according to an embodiment of the invention.
As shown in Fig. 2, the data acquisition module 1 collects the raw data of the shopping website and stores it in the raw database. The raw data comprise full-category data, front-end/back-end category association data, and the product information and shop information under the leaf categories.
The data analysis module 2 receives the data analysis input stream formed from the raw data in the raw database and analyzes it to build indexes and statistics. It synchronizes the input stream with the corresponding data in the data acquisition module 1 to form a data analysis output stream, and outputs that stream to the post-analysis database.
The data presentation module 3 receives the presentation data of the data analysis output stream in the post-analysis database and presents it to the user in interface form according to data type. It also receives follow instructions entered by the user. A follow instruction contains the name of a followed product and the name of a followed shop. For example, the user can enter follow instructions through a keyboard or touch screen connected to the data presentation module.
The data presentation module 3 also generates presentation-system interaction data from the follow instructions and sends them to the post-analysis database.
The data analysis module 2 also receives the presentation-feedback input stream formed from the presentation-system interaction data in the post-analysis database. It analyzes this stream to obtain the names of the followed products and followed shops, and builds indexes for the followed products. It then synchronizes the followed-product and followed-shop information with the corresponding data in the data presentation module 3 to form a presentation-feedback output stream, and outputs that stream to the raw database.
The data acquisition module 1 also receives the presentation-feedback output stream from the raw database and, according to that stream, collects the information of the followed products and followed shops with priority. This information is then presented to the user through the data analysis module 2 and the data presentation module 3.
In other words, the system first presents the raw data of the shopping website to the user. After viewing it, the user can enter the names of the products and shops they are interested in, and these are fed back to the system. The system then re-runs collection, analysis and the rest of the pipeline according to the user's follows, and presents the results to the user.
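The feedback loop described above essentially reorders collection so that followed products and shops are fetched before everything else. The following minimal Python sketch illustrates that idea. It assumes a single in-process priority queue. The class and field names (`CrawlQueue`, `CrawlTask`) are hypothetical and not taken from the patent, which does not describe its queueing mechanism.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

# Hypothetical priority levels: lower values are dequeued first.
FOLLOWED, ORDINARY = 0, 1


@dataclass(order=True)
class CrawlTask:
    priority: int
    seq: int                             # insertion counter, keeps FIFO order within a priority
    kind: str = field(compare=False)     # "product" or "shop"
    key: str = field(compare=False)      # product or shop identifier


class CrawlQueue:
    """Priority queue in which followed products/shops are collected first."""

    def __init__(self) -> None:
        self._heap: List[CrawlTask] = []
        self._seq = 0

    def push(self, kind: str, key: str, followed: bool = False) -> None:
        prio = FOLLOWED if followed else ORDINARY
        heapq.heappush(self._heap, CrawlTask(prio, self._seq, kind, key))
        self._seq += 1

    def pop(self) -> CrawlTask:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

Suppose you push an ordinary product, then an ordinary shop, then a followed product. The followed product is popped first, and the remaining tasks come out in insertion order. The `seq` counter keeps the ordering stable within one priority level.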
In one embodiment of the invention, the data acquisition module 1 uses a MySQL database server. Specifically, its hardware includes a server running software such as MySQL and the JDK.
The data analysis module 2 uses a MySQL database server and a Solr database server. Specifically, its hardware includes a server running software such as MySQL, Solr and the JDK.
The data presentation module 3 uses a MySQL database server and a Solr database server. Specifically, its hardware includes a server running software such as MySQL, Solr, the JDK and Tomcat.
The system combines MySQL and Solr databases: the Solr non-relational database stores the large data volume, and MySQL performs the analysis and statistics. Combining the two avoids data blocking and improves system performance.
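The split between the two stores can be illustrated by how a presentation-layer query would reach Solr. This sketch assumes a Solr core named `auction` with `title` and `category_id` fields; both are hypothetical, because the patent gives no schema. `q`, `fq`, `rows` and `wt` are standard Solr request parameters. Aggregate statistics would instead come from MySQL tables such as `statistics_shop`.

```python
from typing import List, Optional
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str,
                    filters: Optional[List[str]] = None, rows: int = 10) -> str:
    """Build a query URL for Solr's /select request handler."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    for fq in filters or []:
        params.append(("fq", fq))   # each filter query is its own fq parameter
    return f"{base.rstrip('/')}/solr/{core}/select?{urlencode(params)}"
```

Consider `solr_select_url("http://localhost:8983", "auction", "title:dress", ["category_id:50010850"], rows=5)`. It yields a `/solr/auction/select` URL whose `fq` parameter restricts the results to one category. Keeping filters in `fq`, rather than folding them into `q`, lets Solr cache them independently of the main query.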
Fig. 3 is a workflow diagram of the data acquisition module according to an embodiment of the invention, taking Taobao as the example shopping website.
Step S301: obtain the full category list through the API.
Step S302: Taobao's categories are split into front-end and back-end categories that are associated with each other; obtain the products under the leaf categories.
Step S303: obtain the product collection search pages.
Step S304: obtain the product detail pages of the followed products and of the products under the followed shops.
Step S305: obtain the followed-shop information and collect the shop information pages.
Specifically, the invention collects data by progressively refining categories and runs multiple threads during the refinement. This preserves whole-network, full-category coverage while also improving collection speed and quality.
Fig. 4 is a flow chart of the data acquisition module obtaining the full category list according to an embodiment of the invention.
Step S401: obtain the configuration information.
Step S402: first-run check: determine whether a category table exists; if so, go to step S403, otherwise go to step S404.
Step S403: query the first-level categories in the category table.
Step S404: query the full category list.
Step S405: call the Taobao back-end category API for the first-level categories from step S403, or for the full category list from step S404.
Step S406: update the category table (category) according to the return values.
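Steps S402 to S406 amount to a seed-then-refresh loop over the category table. The minimal sketch below assumes the `category` table is held in memory as a dict and that the Taobao back-end category API call is passed in as a function. `fetch_backend_category`, `query_full_category_list` and the `level` field are hypothetical names, since the patent does not name the API or its fields.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the `category` table: category ID -> attributes.
CategoryTable = Dict[int, dict]


def refresh_categories(
    table: CategoryTable,
    query_full_category_list: Callable[[], List[int]],
    fetch_backend_category: Callable[[int], dict],
) -> CategoryTable:
    """Sketch of steps S402-S406 in Fig. 4."""
    if table:
        # S402/S403: the category table exists, so start from its first-level categories.
        seeds = [cid for cid, attrs in table.items() if attrs.get("level") == 1]
    else:
        # S402/S404: first run without a category table, so query the full category list.
        seeds = query_full_category_list()
    for cid in seeds:
        result = fetch_backend_category(cid)           # S405: back-end category API (stubbed)
        table[cid] = {**table.get(cid, {}), **result}  # S406: merge the returned values
    return table
```

Injecting the API call keeps the branching logic testable without network access. A real implementation would also need rate limiting and error handling around the API, which the patent does not describe.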
Fig. 5 is a flow chart of the data acquisition module obtaining the front-end/back-end category association according to an embodiment of the invention.
Step S501: add one or more front-end first-level categories.
Step S502: build the search page URL from the front-end category ID.
Step S503: determine from the page code whether the category has sub-directories; if so, go to step S504, otherwise go to step S506.
Step S504: treat the front-end category as a parent directory and obtain the sub-directory information from the page.
Step S505: write the sub-directory information to the database (category_front).
Step S506: treat the front-end category as a leaf directory and obtain the corresponding back-end category ID from a product under that directory.
Step S507: through the back-end category ID, obtain the other attributes of the corresponding back-end category, such as industry and parent category, then go to step S505.
Step S508: check whether the front-end categories still to be processed are exhausted (null); if so, end, otherwise return to step S502.
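The traversal in Fig. 5 can be read as a breadth-first walk over the front-end category tree. The sketch below assumes the page fetching and parsing of steps S502 to S507 are supplied as callbacks; `get_subcategories`, `get_backend_id` and `get_backend_attrs` are hypothetical names. It also assumes that sub-directories found in S504 are queued for processing, which is one reading of the loop back from S508 to S502.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def associate_categories(
    first_level_ids: List[int],
    get_subcategories: Callable[[int], List[int]],
    get_backend_id: Callable[[int], Optional[int]],
    get_backend_attrs: Callable[[int], dict],
) -> Dict[int, dict]:
    """Sketch of Fig. 5: walk front-end categories and map leaves to back-end categories."""
    category_front: Dict[int, dict] = {}    # in-memory stand-in for the category_front table
    pending = deque(first_level_ids)        # S501: seed with the first-level categories
    while pending:                          # S508: stop when nothing is left to process
        fid = pending.popleft()
        children = get_subcategories(fid)   # S502-S503: fetch search page, inspect page code
        if children:
            # S504-S505: parent directory, so record its children and queue them.
            category_front[fid] = {"is_leaf": False, "children": children}
            pending.extend(children)
        else:
            # S506-S507: leaf directory, so resolve its back-end category via a product.
            bid = get_backend_id(fid)
            attrs = get_backend_attrs(bid) if bid is not None else {}
            category_front[fid] = {"is_leaf": True, "backend_id": bid, **attrs}
    return category_front
```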
Fig. 6 is a flow chart of the data acquisition module obtaining the product information under the leaf categories according to an embodiment of the invention.
Step S601: obtain the categories to be collected from the database (category_front).
Step S602: download the page by URL and check whether attribute tags exist; if so, go to step S603, otherwise go to step S605.
Step S603: check whether the attributes have sub-attributes; if so, go to step S604, otherwise go to step S605.
Step S604: obtain the sub-attributes, then go to step S605.
Step S605: collect the URLs and attribute information.
Step S606: write them to the database (auction_list_url).
Step S607: request the search pages at the collected URLs.
Step S608: match the product information from the page source.
Step S609: write it to the database (auction).
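Steps S602 to S608 can be sketched as two small functions. The first expands a category page into list-page URLs per attribute and sub-attribute. The second matches product fields out of the page source. The URL parameters (`attr`, `sub`) and the HTML attributes that the regular expression looks for are placeholders. The patent does not give the actual Taobao markup, which would need its own pattern.

```python
import re
from typing import Dict, List
from urllib.parse import urlencode


def list_page_urls(base_url: str, attributes: Dict[str, List[str]]) -> List[str]:
    """S602-S605: one list-page URL per attribute, or per sub-attribute value if any."""
    if not attributes:                      # S602: the page has no attribute tags
        return [base_url]
    urls = []
    for attr, subs in attributes.items():
        for sub in subs or [None]:          # S603-S604: expand sub-attributes when present
            params = {"attr": attr} if sub is None else {"attr": attr, "sub": sub}
            urls.append(base_url + "?" + urlencode(params))
    return urls


# Hypothetical markup pattern; real page structure is not disclosed in the patent.
ITEM_RE = re.compile(r'data-item-id="(\d+)"[^>]*data-title="([^"]*)"[^>]*data-price="([\d.]+)"')


def match_products(page_source: str) -> List[dict]:
    """S608: extract product records from page source with a regular expression."""
    return [{"item_id": i, "title": t, "price": float(p)}
            for i, t, p in ITEM_RE.findall(page_source)]
```

The records returned by `match_products` correspond to rows destined for the `auction` table in S609. Regular expressions are brittle against markup changes, so an HTML parser may be the safer choice in practice.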
Fig. 7 is a flow chart of the data acquisition module obtaining the shop information under the leaf categories according to an embodiment of the invention.
Step S701: maintain the existing shop information.
Step S702: first obtain the followed-shop data (cust_shop).
Step S703: obtain the existing shop data.
Step S704: check whether the page at the URL built from the shop ID can be downloaded; if so, go to step S705, otherwise go to step S706.
Step S705: match the page to obtain the shop information.
Step S706: delete the corresponding data from the table.
Step S707: write the shop information to the database (shop).
Step S708: check whether the shops to be updated are exhausted (the list is empty); if so, go to step S709, otherwise return to step S704.
Step S709: add new shop information.
Fig. 8 is a flow chart of the data acquisition module adding shop information according to an embodiment of the invention.
Step S801: start adding new shop information.
Step S802: obtain the store names (nick_new) of the shops to be added.
Step S803: check whether the page at the URL built from the store name can be downloaded; if so, go to step S804, otherwise go to step S806.
Step S804: match the page to obtain the shop information.
Step S805: write the shop information to the database (shop).
Step S806: check whether the shops to be added are exhausted (the list is empty); if so, end, otherwise return to step S803.
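Figs. 7 and 8 together describe a refresh pass over known shops, followed by an insert pass for new store names. The sketch below works over an in-memory stand-in for the `shop` table. It assumes that download and parsing are passed in as functions. Following the summary of the invention, it also assumes that a shop whose page can no longer be downloaded is deleted. The `shop.example.com` URL templates are placeholders, not real endpoints.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import quote

SHOP_URL = "https://shop.example.com/?shop_id={}"        # placeholder template
SEARCH_URL = "https://shop.example.com/search?nick={}"   # placeholder template


def maintain_shops(
    shop_table: Dict[str, dict],
    followed_ids: Iterable[str],
    new_nicks: Iterable[str],
    download: Callable[[str], Optional[str]],
    parse_shop: Callable[[str], dict],
) -> Dict[str, dict]:
    """Sketch of Figs. 7 and 8 over an in-memory stand-in for the `shop` table."""
    # S702-S703: followed shops first, then the remaining existing shops.
    followed = list(followed_ids)
    ordered = followed + [sid for sid in shop_table if sid not in followed]
    for sid in ordered:
        page = download(SHOP_URL.format(sid))          # S704
        if page is None:
            shop_table.pop(sid, None)                  # page unavailable: drop the record
        else:
            shop_table[sid] = parse_shop(page)         # S705, S707: parse and write to `shop`
    # S801-S806: add shops known only by their store name (nick_new).
    for nick in new_nicks:
        page = download(SEARCH_URL.format(quote(nick)))   # S803
        if page is not None:
            info = parse_shop(page)                       # S804
            shop_table[info["shop_id"]] = info            # S805
    return shop_table
```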
Fig. 9 is a flow chart of the data acquisition module obtaining the followed products and the products under the followed shops according to an embodiment of the invention.
Step S901: obtain the followed product IDs (cust_auction).
Step S902: build the product detail page URLs.
Step S903: obtain the followed shop IDs (cust_shop).
Step S904: build the shop detail page URLs.
Step S905: download the pages and extract the URLs of all products in each shop.
Step S906: check whether the detail page source can be downloaded from the product URL; if so, go to step S907, otherwise go to step S908.
Step S907: parse the product information and write it to the database (auciton_concern).
Step S908: check whether the product URLs are exhausted (the list is empty); if so, end, otherwise return to step S906.
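Fig. 9 merges two URL sources before downloading detail pages: the followed product IDs, and the product links found on the followed shops' pages. The sketch below uses placeholder URL templates and a placeholder link pattern, since the real ones are not disclosed. De-duplication is an added assumption, so that a product that is both followed directly and listed in a followed shop is parsed once.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical URL templates and link pattern; the patent does not give the real ones.
DETAIL_URL = "https://item.example.com/item.htm?id={}"
SHOP_SEARCH_URL = "https://shop.example.com/search.htm?shop_id={}"
ITEM_LINK_RE = re.compile(r'href="(https://item\.example\.com/item\.htm\?id=\d+)"')


def collect_followed_products(
    product_ids: Iterable[str],
    shop_ids: Iterable[str],
    download: Callable[[str], Optional[str]],
    parse_product: Callable[[str], dict],
) -> List[dict]:
    """Sketch of Fig. 9: followed products plus every product of the followed shops."""
    urls = [DETAIL_URL.format(pid) for pid in product_ids]     # S901-S902
    for sid in shop_ids:                                        # S903-S904
        page = download(SHOP_SEARCH_URL.format(sid))            # S905
        if page:
            urls.extend(ITEM_LINK_RE.findall(page))
    records = []
    for url in dict.fromkeys(urls):         # de-duplicate while keeping order
        source = download(url)              # S906
        if source is not None:
            records.append(parse_product(source))   # S907: row destined for auciton_concern
    return records
```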
Figure 10 is a workflow diagram of the data analysis module according to an embodiment of the invention.
Step S1001: data indexing and analysis: build the index database.
Step S1002: the data analysis module synchronizes the corresponding data with the data acquisition module and the data presentation module.
Step S1003: data statistics, which serve the data presentation.
Figure 11 is a flow chart of the data analysis module performing data indexing and analysis according to an embodiment of the invention.
Step S1101: initialize the index service.
Step S1102: initialize all product categories, loading the front-end and back-end categories.
Step S1103: add the items that need to be indexed.
Step S1104: add indexes for the raw product data.
Step S1105: add indexes for the followed products.
Step S1106: check whether the items to be indexed are exhausted (the list is empty); if so, end, otherwise return to steps S1104 and S1105.
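The patent does not show how documents are added to Solr in steps S1103 to S1105. The sketch below shows one plausible approach, assuming a Solr core `auction` whose unique key is `id`. It builds one JSON document per product, marks followed products with a boolean field, and prepares the array for Solr's `/update` handler. The field names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def build_index_docs(raw_products: Iterable[dict], followed_ids: Iterable[str]) -> List[dict]:
    """Sketch of S1103-S1105: one Solr document per product, flagging followed ones."""
    followed = {str(i) for i in followed_ids}
    docs = []
    for p in raw_products:
        item_id = str(p["item_id"])
        doc = {
            "id": item_id,                        # assumed Solr unique key
            "title": p.get("title", ""),
            "category_id": p.get("category_id"),
            "followed": item_id in followed,      # S1105: followed-product marker
        }
        docs.append({k: v for k, v in doc.items() if v is not None})  # omit empty fields
    return docs


def update_request(base: str, core: str, docs: List[dict]) -> Tuple[str, bytes]:
    """URL and JSON body for Solr's /update handler; commit=true makes docs searchable."""
    url = f"{base.rstrip('/')}/solr/{core}/update?commit=true"
    return url, json.dumps(docs).encode("utf-8")
```

The body would be sent as an HTTP POST with `Content-Type: application/json`, for example through `urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")`. That call is left out so the sketch runs without a Solr server. Committing on every batch is simple but slow for large loads; the usual trade-off is to batch documents and commit less often.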
Figure 12 is a flow chart of the data analysis module performing data synchronization according to an embodiment of the invention.
Step S1201: load the product categories.
Step S1202: process and synchronize the raw product data.
Step S1203: synchronize the users' followed products.
Step S1204: synchronize the users' followed shops.
Step S1205: obtain the product data (auction).
Step S1206: obtain the followed-product data (cust_auction).
Step S1207: obtain the followed-shop data (cust_shop).
Step S1208: set up threads; the number of threads is chosen according to the data volume.
Step S1209: update the followed-product data in the raw database (cust_auction).
Step S1210: update the followed-shop data in the raw database (cust_shop).
Step S1211: store the synchronized data in the database (statistics_cache_auction).
Step S1212: store the attribute statistics in the database (statistics_property).
Step S1213: check whether all products to be counted have been processed; if so, end, otherwise return to step S1208.
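Step S1208 says only that the number of threads depends on the data volume. The minimal sketch below applies that idea to the attribute statistics of S1212, using Python's `concurrent.futures.ThreadPoolExecutor`. The per-thread batch size of 10,000 and the cap of 16 threads are assumptions. The `properties` field is a hypothetical product attribute map.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def choose_thread_count(n_items: int, per_thread: int = 10_000, max_threads: int = 16) -> int:
    """S1208: scale the thread count with data volume (thresholds are assumptions)."""
    return max(1, min(max_threads, -(-n_items // per_thread)))   # ceiling division


def _attribute_counts(batch: List[dict]) -> Counter:
    """Count (attribute, value) pairs in one batch of products."""
    counts: Counter = Counter()
    for product in batch:
        for attr, value in product.get("properties", {}).items():
            counts[(attr, value)] += 1
    return counts


def attribute_statistics(products: List[dict]) -> Dict[tuple, int]:
    """S1212 sketch: split products into batches, count in parallel, merge the results."""
    n_threads = choose_thread_count(len(products))
    size = max(1, -(-len(products) // n_threads))
    batches = [products[i:i + size] for i in range(0, len(products), size)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for partial in pool.map(_attribute_counts, batches):
            total.update(partial)        # merged in the calling thread, no shared state
    return dict(total)
```

Each batch is counted independently, and the partial `Counter`s are merged in the calling thread, so no locking is needed. In CPython, threads mainly pay off when each batch also waits on I/O, such as database writes. The patent's own stack is Java-based, and its threading details are not disclosed.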
Figure 13 is a flow chart of the data analysis module performing data statistics according to an embodiment of the invention.
Step S1301: load the front-end and back-end categories.
Step S1302: product statistics.
Step S1303: shop statistics.
Step S1304: industry statistics.
Step S1305: obtain the followed products and join them with the product statistics data (statistics_cache_auction with cust_auction).
Step S1306: obtain the followed shops and join all product data under the followed shops (statistics_cache_auction with cust_shop).
Step S1307: reset the database table (statistics_auction).
Step S1308: write the product statistics to the database (statistics_auction).
Step S1309: obtain the followed shops and join them with the shop statistics data (statistics_cache_auction with cust_shop).
Step S1310: compute statistics for the existing shop data.
Step S1311: compute monthly sales by product.
Step S1312: compute monthly sales by category.
Step S1313: reset the database table (statistics_shop).
Step S1314: write the shop statistics to the database (statistics_shop).
Step S1315: compute statistics for the last 30 days.
Step S1316: compute monthly statistics.
Step S1317: reset the database table (statistics_industry).
Step S1318: write the industry statistics to the database (statistics_industry).
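The reset-then-write pattern of steps S1313 and S1314, and the 30-day window of S1315, map naturally onto SQL aggregation. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL. The column names (`sale_date`, `quantity` and so on) are hypothetical. In MySQL, the month key would more typically be computed with `DATE_FORMAT(sale_date, '%Y-%m')`.

```python
import sqlite3

# Hypothetical minimal schema; the patent names the tables but not their columns.
SCHEMA = """
CREATE TABLE auction (item_id TEXT, shop_id TEXT, category_id TEXT,
                      sale_date TEXT, quantity INTEGER);
CREATE TABLE statistics_shop (shop_id TEXT, month TEXT, sales INTEGER);
"""


def rebuild_shop_statistics(conn: sqlite3.Connection) -> None:
    """S1311-S1314 sketch: reset statistics_shop, then write monthly sales per shop."""
    with conn:   # one transaction: the reset and the rewrite commit or roll back together
        conn.execute("DELETE FROM statistics_shop")                     # S1313
        conn.execute("""
            INSERT INTO statistics_shop (shop_id, month, sales)
            SELECT shop_id, substr(sale_date, 1, 7), SUM(quantity)
            FROM auction
            GROUP BY shop_id, substr(sale_date, 1, 7)
        """)                                                           # S1314


def last_30_days_by_category(conn: sqlite3.Connection, today: str) -> list:
    """S1315 sketch: sales per category over the 30 days ending on `today`."""
    return conn.execute("""
        SELECT category_id, SUM(quantity) FROM auction
        WHERE sale_date > date(?, '-30 days') AND sale_date <= ?
        GROUP BY category_id ORDER BY category_id
    """, (today, today)).fetchall()
```

Running the delete and the insert inside one transaction (`with conn:`) matters if the rewrite fails: the previous statistics stay in place instead of being replaced by an empty table.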
Figure 14 is a topology diagram of the content presented by the data presentation module according to an embodiment of the invention, taking Taobao as the example shopping website.
As shown in Figure 14, the content presented by the data presentation module 3 includes: a login interface, a home page interface, an industry analysis interface, a shop analysis interface, a product analysis interface, an account interface, a Juhuasuan (group-buying) interface and a system management interface.
Home page interface: monthly product sales ranking and monthly shop sales ranking.
Industry analysis interface: industry analysis and industry anecdotes.
Shop analysis interface: shop search, single-shop analysis and shop comparison; single-shop analysis includes shop statistics.
Product analysis interface: hot-selling product queries and followed products.
Account interface: security settings changes.
Juhuasuan interface: Juhuasuan queries and Juhuasuan follows.
Double 11 interface: shop ranking and product ranking.
System management interface: user management.
The big data analysis system for a shopping website according to the embodiment of the present invention integrates acquisition, analysis and display in one system. Data is acquired in real time and analyzed in full, and the analysis results are presented to the client. Because acquisition covers the whole website, the data volume is large and the collected objects are comprehensive. This gives users broad guidance for analyzing market conditions and for product research and development. The present invention combines a MySQL database with a Solr database: the non-relational Solr database stores the large data volume, while MySQL performs the analysis and statistics. Combining the two avoids data blocking and improves system performance.
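The description assigns bulk storage and lookup to Solr and statistics to MySQL. Solr's own API is not reproduced here. Instead, the following hypothetical sketch uses a small in-memory inverted index to show the kind of structure a search index maintains. Titles are tokenized on whitespace, and separate sets track categories and followed commodities, since claim 8 indexes original and followed commodities separately. The class and method names are illustrative assumptions.

```python
from collections import defaultdict


class CommodityIndex:
    """Hypothetical in-memory inverted index standing in for a Solr core."""

    def __init__(self):
        self.postings = defaultdict(set)     # token -> commodity IDs
        self.by_category = defaultdict(set)  # category ID -> commodity IDs
        self.followed = set()                # IDs of followed commodities

    def add(self, item_id, title, category_id, followed=False):
        # Whitespace tokenization only; Solr would apply its own analyzers.
        for token in title.lower().split():
            self.postings[token].add(item_id)
        self.by_category[category_id].add(item_id)
        if followed:
            self.followed.add(item_id)

    def search(self, query, category_id=None, followed_only=False):
        tokens = query.lower().split()
        if not tokens:
            return set()
        # AND semantics: a commodity must contain every query token.
        # set.intersection returns a new set, so the stored postings are untouched.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        if category_id is not None:
            hits &= self.by_category.get(category_id, set())
        if followed_only:
            hits &= self.followed
        return hits
```

Keeping followed commodities as a separate set lets a query be narrowed to them cheaply, which mirrors the separate indexing of original and followed commodities.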
In this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such schematic expressions in this specification do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and shall not be construed as limiting the invention. Without departing from the principle and spirit of the present invention, those skilled in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the invention. The scope of the present invention is defined by the appended claims and their equivalents.

Claims (10)

1. A big data analysis system for a shopping website, characterized in that it is used to acquire and analyze data of the shopping website and comprises a data acquisition module, a data analysis module and a data display module, wherein:
the data acquisition module is configured to acquire raw data of the shopping website and store it in a raw database, the raw data of the shopping website including universal category data, front-end/back-end category association data, and the commodity information and shop information of the lowest-level categories;
the data analysis module is configured to receive a data analysis input stream formed from the raw data in the raw database, analyze the data analysis input stream to build indexes and statistics, synchronize the data analysis input stream with the corresponding data in the data acquisition module to form a data analysis output stream, and output the data analysis output stream to a post-analysis database;
the data display module is configured to receive the data analysis output stream in the post-analysis database as display data, present the display data to the user in interface form organized by type, and receive follow instructions entered by the user, the follow instructions including the names of followed commodities and the names of followed shops; the data display module is further configured to generate display-system interaction data according to the follow instructions and send the display-system interaction data to the post-analysis database;
the data analysis module is further configured to receive a display data feedback input stream formed from the display-system interaction data in the post-analysis database, analyze it to obtain the names of the followed commodities and followed shops, build an index for the followed commodities, synchronize the followed-commodity information and followed-shop information with the corresponding data in the data display module to form a display data feedback output stream, and output the display data feedback output stream to the raw database;
the data acquisition module is further configured to receive the display data feedback output stream in the raw database, preferentially acquire the information of the followed commodities and followed shops according to it, and present that information to the user for viewing through the data analysis module and the data display module.
2. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module uses a MySQL database server, and the data analysis module and the data display module use a MySQL database server and a Solr database server.
3. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the universal category data of the shopping website as follows:
the data acquisition module performs a first-run check according to configuration information; if a category tree exists, it queries the level-1 categories under the category tree, otherwise it performs a universal category query;
the data acquisition module calls the back-end category API of the shopping website and updates the category tree according to the returned values.
4. The big data analysis system for a shopping website according to claim 3, characterized in that the data acquisition module obtains the front-end/back-end category association data of the shopping website as follows:
the data acquisition module adds one or more front-end level-1 categories according to the updated category tree and builds a search page URL from the front-end category ID of each front-end level-1 category;
the data acquisition module determines from the page source whether the front-end level-1 category has sub-directories; if so, the category is judged to be a parent directory and the sub-directory information is obtained from the page;
if there are no sub-directories, the category is judged to be a leaf directory; the corresponding back-end category ID is obtained from the commodities under that directory, and the other corresponding back-end category attributes are obtained through the back-end category ID.
5. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the commodity information of the lowest-level categories of the shopping website as follows:
for each category to be acquired, the data acquisition module downloads the page through its URL and determines whether attribute tags exist; if not, it collects the URL address and the attribute information; if so, it determines whether sub-attributes exist and, if they do, collects their URL addresses and sub-attribute information; it then retrieves the pages from the collected URL addresses and matches the commodity information from the page source.
6. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the shop information of the lowest-level categories of the shopping website and the information of the followed shops as follows:
the shop data of the followed shops is obtained first, and then the shop data in the category tree is obtained; for each shop it is determined whether the URL page built from the shop ID can be downloaded; if so, the page is matched to obtain the shop information, otherwise the corresponding data in the category tree is deleted;
the data acquisition module checks the new shop information and, where there are new shops to add, adds them: it obtains the store name of each newly added shop, builds a URL address from it and determines whether the page can be downloaded; if so, the page is matched to obtain the shop information.
7. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module acquires the followed-commodity information as follows:
the data acquisition module obtains the followed commodity IDs and followed shop IDs and builds the URL addresses of the commodity detail pages and the shop search pages; it downloads each shop search page and matches it to obtain the URL addresses of all commodities under the shop, combines these with the commodity detail page URL addresses, determines whether the detail page source of each commodity can be downloaded, and if so, parses the commodity information.
8. The big data analysis system for a shopping website according to claim 1, characterized in that the data analysis module analyzes the data analysis input stream and the display data feedback input stream to build indexes as follows:
the data analysis module first initializes the index service, then initializes all categories of the commodities in the data analysis input stream and the display data feedback input stream, adds the items to be indexed, and builds indexes for the original commodity data and for the followed commodities separately.
9. The big data analysis system for a shopping website according to claim 1, characterized in that the data analysis module synchronizes data as follows:
the data analysis module loads the commodity categories and obtains the original commodity data, the commodity follow data and the shop follow data;
the data analysis module updates the commodity follow data and the shop follow data to the raw database;
the data analysis module assigns threads to the commodity data and performs data synchronization and attribute statistics.
10. The big data analysis system for a shopping website according to claim 1, characterized in that the data display module displays the following content: a login interface, a home interface, an industry analysis interface, a shop analysis interface, a commodity analysis interface, an account interface and a system management interface.
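Claim 1 closes a feedback loop. Follow instructions entered on the display side flow back into the raw database, and the acquisition module then collects the followed commodities and shops first. One way to express that priority is a two-level queue. The following minimal Python sketch assumes a heap-based queue that keeps FIFO order within each level. Its names (AcquisitionQueue, apply_feedback, FOLLOWED, REGULAR) are hypothetical and do not describe the patented implementation.

```python
import heapq
from itertools import count

FOLLOWED, REGULAR = 0, 1  # lower number is popped first


class AcquisitionQueue:
    """Hypothetical crawl queue: followed targets are acquired before others."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: FIFO order within one priority level

    def push(self, target, followed=False):
        level = FOLLOWED if followed else REGULAR
        heapq.heappush(self._heap, (level, next(self._seq), target))

    def pop(self):
        # Raises IndexError when the queue is empty.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def apply_feedback(queue, followed_targets):
    """Feed the follow list (the display data feedback stream) into the queue."""
    for target in followed_targets:
        queue.push(target, followed=True)
```

The sequence counter is unique, so comparisons never reach the targets themselves, and targets therefore need not be orderable. A followed target pushed after regular ones still moves to the front of the queue.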
CN201510203342.8A 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website Expired - Fee Related CN104778273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510203342.8A CN104778273B (en) 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website

Publications (2)

Publication Number Publication Date
CN104778273A CN104778273A (en) 2015-07-15
CN104778273B true CN104778273B (en) 2018-10-09

Family

ID=53619737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510203342.8A Expired - Fee Related CN104778273B (en) 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website

Country Status (1)

Country Link
CN (1) CN104778273B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760446A (en) * 2016-02-03 2016-07-13 杭州驭猫科技有限公司 Big data analysis method for shopping website
CN105825399A (en) * 2016-03-15 2016-08-03 焦点科技股份有限公司 Internet based B2B e-commerce information collecting method
CN106097009A (en) * 2016-06-13 2016-11-09 上海钢联电子商务股份有限公司 A kind of staple commodities industry data Analysis Service platform
CN106127394A (en) * 2016-06-24 2016-11-16 广州若羽臣科技股份有限公司 A kind of data control method, Data Control terminal, Data Control platform and system
CN106447441A (en) * 2016-09-21 2017-02-22 上海鲶鱼网络科技有限公司 Data processing system and method
CN106570721A (en) * 2016-10-19 2017-04-19 帘盟科技(上海)股份有限公司 Curtain big data acquisition and analysis system
CN108665335B (en) * 2017-04-01 2021-09-14 北京京东尚科信息技术有限公司 Method for processing shopping cart data of shopping website
CN108171076B (en) * 2017-12-22 2021-04-02 湖北工业大学 Big data correlation analysis method and system for protecting privacy of consumers in electronic transaction
CN112434204A (en) * 2020-11-23 2021-03-02 洛阳建企大数据服务有限公司 Automatic data acquisition system and method for multi-source website

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206744A (en) * 2006-12-21 2008-06-25 叶克 Method for interfusing commercial articles self-defining information in shopping search engine
CN102368831A (en) * 2011-12-01 2012-03-07 内蒙古中大传媒发展有限公司 Survey method for audience rating of digital television users
CN102594870A (en) * 2011-05-31 2012-07-18 北京亿赞普网络技术有限公司 Cloud computing platform, cloud computing system and service information publishing method for cloud computing system
CN102750654A (en) * 2011-04-20 2012-10-24 中国南方电网有限责任公司 Power dispatching information disclosure platform for large power grid
EP2712451A1 (en) * 2011-05-10 2014-04-02 Thales Canada Inc. Data analysis system
CN103761296A (en) * 2014-01-20 2014-04-30 北京集奥聚合科技有限公司 Method and system for analyzing network behaviors of mobile terminal users

Also Published As

Publication number Publication date
CN104778273A (en) 2015-07-15

Similar Documents

Publication Publication Date Title
CN104778273B (en) A kind of big data analysis system for shopping website
CN106446228B (en) Method and device for collecting and analyzing WEB page data
JP6286104B2 (en) Display method, apparatus, server, program and recording medium for social network information stream
CN104573054B (en) A kind of information-pushing method and equipment
CN110245069B (en) Page version testing method and device and page display method and device
CN107729336A (en) Data processing method, equipment and system
JP2010067175A (en) Hybrid content recommendation server, recommendation system, and recommendation method
CN104504159B (en) Application of the positive and negative sequence pattern of multiple supports in customers buying behavior analysis
CN105095231A (en) Method and device for presenting search result
JP2007080210A (en) Information management device, information management method, information management program and recording medium
US9558185B2 (en) Method and system to discover and recommend interesting documents
US20160162583A1 (en) Apparatus and method for searching information using graphical user interface
CN104077415A (en) Searching method and device
CN108170731A (en) Data processing method, device, computer storage media and server
CN111582951A (en) Advertisement putting system and method for cloud electronic commerce
CN103077217A (en) Method, device and equipment for providing result additional information matched with query sequence
CN112052397B (en) User characteristic generation method and device, electronic equipment and storage medium
CN112825089A (en) Article recommendation method, article recommendation device, article recommendation equipment and storage medium
US20230409551A1 (en) Scalable Fine Grained Access Control Within A Search Engine
WO2014034383A1 (en) Information processing device, record location information specification method, and information processing program
CN105808605B (en) A kind of search log merging method and system
JP2011515754A5 (en)
CN108280102A (en) Internet behavior recording method, device and user terminal
US11170039B2 (en) Search system, search criteria setting device, control method for search criteria setting device, program, and information storage medium
CN105989171A (en) Media file processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210115

Address after: 226000 room 304, building 3, yingnuoyuan Science Park, Chongchuan District, Nantong City, Jiangsu Province

Patentee after: Taojin hechuang e-commerce Jiangsu Co.,Ltd.

Address before: 226000 7th floor, 375 Century Avenue, Chongchuan District, Nantong City, Jiangsu Province

Patentee before: TAOJIN INFORMATION TECHNOLOGY JIANGSU Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181009

CF01 Termination of patent right due to non-payment of annual fee