CN104778273A - Big data analysis system for shopping website - Google Patents

Publication number
CN104778273A
Authority
CN
China
Prior art keywords
data
data analysis
shop
module
commodity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510203342.8A
Other languages
Chinese (zh)
Other versions
CN104778273B (en)
Inventor
邵明前
徐胜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taojin Hechuang E Commerce Jiangsu Co ltd
Original Assignee
Panning Information Technology Jiangsu Co
Application filed by Panning Information Technology Jiangsu Co filed Critical Panning Information Technology Jiangsu Co
Priority to CN201510203342.8A priority Critical patent/CN104778273B/en
Publication of CN104778273A publication Critical patent/CN104778273A/en
Application granted granted Critical
Publication of CN104778273B publication Critical patent/CN104778273B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a big data analysis system for a shopping website. The system comprises a data acquisition module, a data analysis module, and a data display module. The data acquisition module acquires raw data from the shopping website and stores it in a raw database. The data analysis module receives a data analysis input stream formed from the raw data in the raw database, analyzes the input stream to build indexes and statistics, synchronizes the input stream with the corresponding data in the data acquisition module to form a data analysis output stream, and outputs that stream to a post-analysis database. The data display module receives the display data of the data analysis output stream from the post-analysis database and presents it to the user through interfaces that vary by data type. The system thus integrates acquisition, analysis, and display: data is acquired in real time, analyzed in full, and the analysis results are shown to customers.

Description

Big data analysis system for a shopping website
Technical field
The present invention relates to the technical field of big data processing, and in particular to a big data analysis system for a shopping website.
Background
For data storage, systems in this field currently rely mostly on MySQL databases. This storage structure is well suited to data volumes below the order of millions of records. Beyond that order of magnitude, however, serious data congestion can occur and degrade system performance.
Summary of the invention
The object of the present invention is to solve at least one of the technical defects described above.
To this end, the invention proposes a big data analysis system for a shopping website that integrates acquisition, analysis, and display: data is acquired in real time, analyzed in full, and the analysis results are shown to customers.
To achieve this object, embodiments of the invention provide a big data analysis system for a shopping website, used to acquire and analyze the data of the shopping website. The system comprises a data acquisition module, a data analysis module, and a data display module.
The data acquisition module acquires the raw data of the shopping website and stores it in a raw database. The raw data comprises full-category data, front-end/back-end category association data, and the commodity information and shop information under the lowest-level categories.
The data analysis module receives the data analysis input stream formed from the raw data in the raw database, analyzes the input stream to build indexes and statistics, synchronizes the input stream with the corresponding data in the data acquisition module to form a data analysis output stream, and outputs that stream to a post-analysis database.
The data display module receives the display data of the data analysis output stream from the post-analysis database, presents the display data to the user through interfaces that vary by data type, and receives a follow instruction entered by the user. The follow instruction comprises the names of followed commodities and the names of followed shops. The data display module also generates display-system interaction data from the follow instruction and sends it to the post-analysis database.
The data analysis module further receives the display data feedback input stream formed from the display-system interaction data in the post-analysis database, analyzes it to obtain the names of the followed commodities and followed shops, builds an index for the followed commodities, synchronizes the followed commodity information and followed shop information with the corresponding data in the data display module to form a display data feedback output stream, and outputs that stream to the raw database.
The data acquisition module further receives the display data feedback output stream from the raw database, preferentially collects the information of the followed commodities and followed shops according to that stream, and presents it to the user through the data analysis module and the data display module.
In one embodiment of the invention, the data acquisition module uses a MySQL database server, and the data analysis module and the data display module use a MySQL database server and a Solr database server.
In another embodiment of the invention, the data acquisition module obtains the full-category data of the shopping website as follows: the data acquisition module performs a first-run check according to the configuration information; if a category table exists, it queries the first-level categories in the category table, and otherwise it queries all categories. The data acquisition module then calls the back-end category API of the shopping website and updates the category table according to the return value.
In one embodiment of the invention, the data acquisition module obtains the front-end/back-end category association data of the shopping website as follows: the data acquisition module adds one or more front-end first-level categories according to the updated category table and assembles the search page from the front-end category ID of each first-level category. From the page code, the data acquisition module judges whether the front-end first-level category has lower-level directories. If it has, the category is judged to be a parent directory and the lower-level directory information is obtained from the page. If it has none, the category is judged to be a sub-directory; the corresponding back-end category ID is obtained from the commodities under this directory, and the other corresponding back-end category attributes are obtained from that back-end category ID.
In another embodiment of the invention, the data acquisition module obtains the commodity information under the lowest-level categories of the shopping website as follows: for each category to be collected, the data acquisition module downloads the page by URL and judges whether attribute tags exist. If none exist, it collects the URL address and attribute information. If attribute tags exist, it judges whether sub-attributes exist and, if so, collects the URL addresses and sub-attribute information. It then searches pages by the collected URL addresses and matches commodity information against the page source code.
In another embodiment of the invention, the data acquisition module obtains the shop information under the lowest-level categories and the information of followed shops as follows: the shop data of followed shops is obtained first, then the shop data in the category table. The module judges whether the URL page assembled from the shop ID can be downloaded; if it can, the page is matched to obtain the shop information, and otherwise the corresponding data in the category table is deleted. The data acquisition module then judges whether the new shop information is empty and, if so, adds new shop information: it obtains the names of the shops to be added, assembles the URL address, and judges whether the page can be downloaded; if it can, the page is matched to obtain the shop information.
In one embodiment of the invention, the data acquisition module collects followed commodity information as follows: it obtains the followed commodity IDs and followed shop IDs, assembles the commodity detail page URL addresses and the shop search page URL addresses, downloads the shop search pages, and matches them to obtain the URL addresses of all commodities under each shop. Combining these with the commodity detail page URL addresses, it judges whether the commodity detail page source can be downloaded and, if so, parses the commodity information.
In one embodiment of the invention, the data analysis module analyzes the data analysis input stream and the display data feedback input stream to build indexes as follows: it first initializes the index service, then initializes all commodity categories in the two streams, adds the items that need indexing, and adds indexes for the original commodity data and for the followed commodities respectively.
In another embodiment of the invention, the data analysis module synchronizes data as follows: it loads the commodity categories and obtains the original commodity data, the followed commodity data, and the followed shop data; it updates the followed commodity data and followed shop data to the raw database; and it sets up threads for the commodity data and carries out data synchronization and attribute statistics.
In one embodiment of the invention, the data display module presents the following: login interface, homepage interface, industry analysis interface, shop analysis interface, commodity analysis interface, account interface, and system management interface.
The big data analysis system for a shopping website according to the embodiments of the invention is an aggregated data system that integrates acquisition, analysis, and display: data is acquired in real time, analyzed in full, and the analysis results are shown to customers. Because collection is network-wide, the data volume is large and the collected objects are comprehensive, which gives users comprehensive guidance for analyzing market conditions and for product research and development. The invention combines MySQL and Solr: the non-relational Solr database stores the large data volume and MySQL performs the analysis and statistics. The combination avoids data congestion and improves system performance.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the drawings, in which:
Fig. 1 is a structural diagram of the big data analysis system for a shopping website according to an embodiment of the invention;
Fig. 2 is a data interaction diagram of the big data analysis system for a shopping website according to an embodiment of the invention;
Fig. 3 is a workflow diagram of the data acquisition module according to an embodiment of the invention;
Fig. 4 is a flowchart of the data acquisition module obtaining all categories according to an embodiment of the invention;
Fig. 5 is a flowchart of the data acquisition module obtaining the front-end/back-end category association according to an embodiment of the invention;
Fig. 6 is a flowchart of the data acquisition module obtaining the commodity information under the lowest-level categories according to an embodiment of the invention;
Fig. 7 is a flowchart of the data acquisition module obtaining the shop information under the lowest-level categories according to an embodiment of the invention;
Fig. 8 is a flowchart of the data acquisition module adding shop information according to an embodiment of the invention;
Fig. 9 is a flowchart of the data acquisition module obtaining the commodities under followed shops and the followed commodities according to an embodiment of the invention;
Fig. 10 is a workflow diagram of the data analysis module according to an embodiment of the invention;
Fig. 11 is a flowchart of the data analysis module performing data indexing and analysis according to an embodiment of the invention;
Fig. 12 is a flowchart of the data analysis module performing data synchronization according to an embodiment of the invention;
Fig. 13 is a flowchart of the data analysis module performing data statistics according to an embodiment of the invention;
Fig. 14 is a topology diagram of the content presented by the data display module according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.
As shown in Fig. 1, the big data analysis system for a shopping website according to an embodiment of the invention is used to acquire and analyze the data of a shopping website such as Taobao. The system comprises a data acquisition module 1, a data analysis module 2, and a data display module 3.
Fig. 2 is a data interaction diagram of the big data analysis system for a shopping website according to an embodiment of the invention.
As shown in Fig. 2, the data acquisition module 1 acquires the raw data of the shopping website and stores it in the raw database. The raw data comprises full-category data, front-end/back-end category association data, and the commodity information and shop information under the lowest-level categories.
The data analysis module 2 receives the data analysis input stream formed from the raw data in the raw database, analyzes the input stream to build indexes and statistics, synchronizes the input stream with the corresponding data in the data acquisition module 1 to form a data analysis output stream, and outputs that stream to the post-analysis database.
The data display module 3 receives the display data of the data analysis output stream from the post-analysis database, presents the display data to the user through interfaces that vary by data type, and receives a follow instruction entered by the user. The follow instruction comprises the names of followed commodities and the names of followed shops. For example, the user can enter the follow instruction through a keyboard or touch screen connected to the data display module.
The data display module 3 also generates display-system interaction data from the follow instruction and sends it to the post-analysis database.
The data analysis module 2 further receives the display data feedback input stream formed from the display-system interaction data in the post-analysis database, analyzes it to obtain the names of the followed commodities and followed shops, builds an index for the followed commodities, synchronizes the followed commodity information and followed shop information with the corresponding data in the data display module 3 to form a display data feedback output stream, and outputs that stream to the raw database.
The data acquisition module 1 further receives the display data feedback output stream from the raw database, preferentially collects the information of the followed commodities and followed shops according to that stream, and presents it to the user through the data analysis module 2 and the data display module 3.
That is, the system first presents the raw data of the shopping website to the user. After viewing it, the user can enter the commodity names and shop names of interest, which are fed back to the system; the system then repeats the collection and analysis according to what the user follows and presents the results to the user.
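As a minimal sketch of the "preferential collection" idea in this feedback loop, the ordering of collection targets could look like the following. The function name `collection_order` and the sample shop names are hypothetical and do not come from the patent; the only behavior taken from the text is that followed shops are collected before the rest.

```python
def collection_order(known_shops, followed_shops):
    """Return shops in collection order: followed shops first, then the rest.

    Duplicates are dropped so that a shop that is both known and followed
    is collected once, in its prioritized position.
    """
    seen = set()
    ordered = []
    for shop in list(followed_shops) + list(known_shops):
        if shop not in seen:
            seen.add(shop)
            ordered.append(shop)
    return ordered


# Hypothetical data: shop_c is already known and is also followed by a user.
order = collection_order(["shop_a", "shop_b", "shop_c"], ["shop_c", "shop_x"])
print(order)
```

A plain ordered list is enough here because there are only two priority levels; a heap-based priority queue would only be needed if follows carried different weights, which the text does not describe.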
In one embodiment of the invention, the data acquisition module 1 uses a MySQL database server. Specifically, the hardware of the data acquisition module 1 comprises a server, and the software running on the server includes MySQL and the JDK.
The data analysis module 2 uses a MySQL database server and a Solr database server. Specifically, the hardware of the data analysis module 2 comprises a server, and the software running on the server includes MySQL, Solr, and the JDK.
The data display module 3 uses a MySQL database server and a Solr database server. Specifically, the hardware of the data display module 3 comprises a server, and the software running on the server includes MySQL, Solr, the JDK, and Tomcat.
The big data analysis system for a shopping website of the invention combines MySQL and Solr: the non-relational Solr database stores the large data volume and MySQL performs the analysis and statistics. The combination avoids data congestion and improves system performance.
Fig. 3 is a workflow diagram of the data acquisition module according to an embodiment of the invention. The description below takes Taobao as the example shopping website.
Step S301, obtain the full-category API.
Step S302, associate Taobao's front-end and back-end categories and obtain the commodities under the lowest-level categories.
Step S303, obtain the commodity collection search pages.
Step S304, obtain the commodity detail pages of the followed commodities and of the commodities under the followed shops.
Step S305, obtain the shop information collection pages for the followed shops.
Specifically, the invention collects by progressively refining categories and intersperses multithreading in the refinement process. On the one hand this preserves the network-wide, full-category character of the collection; on the other hand it optimizes collection speed and quality.
Fig. 4 is a flowchart of the data acquisition module obtaining all categories according to an embodiment of the invention.
Step S401, obtain the configuration information.
Step S402, perform the first-run check: judge whether a category table exists; if it exists, perform step S403, otherwise perform step S404.
Step S403, query the first-level categories in the category table.
Step S404, query all categories.
Step S405, call the Taobao back-end category API with the first-level categories from step S403 or with all categories from step S404, as the case may be.
Step S406, update the category table (category) according to the return value.
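The flow of steps S401 to S406 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the category table is modeled as an in-memory dictionary, `fetch_children` is a hypothetical stand-in for the back-end category API, and treating "query all categories" as a query from a root ID of 0 is an assumption.

```python
def refresh_category_table(category_table, fetch_children, root_id=0):
    """Update the category table from a category API (steps S402 to S406).

    category_table: dict mapping category id -> {"name", "parent_id"}.
    fetch_children: callable(parent_id) -> list of {"id", "name"} dicts;
                    a hypothetical stand-in for the back-end category API.
    """
    if category_table:
        # S402/S403: not the first run, so start from stored first-level categories.
        parents = [cid for cid, row in category_table.items()
                   if row["parent_id"] == root_id]
    else:
        # S402/S404: first run, so query from the root (assumed id 0).
        parents = [root_id]
    for parent_id in parents:
        for child in fetch_children(parent_id):           # S405: call the API
            category_table[child["id"]] = {                # S406: update the table
                "name": child["name"],
                "parent_id": parent_id,
            }
    return category_table


# Hypothetical API responses keyed by parent category id.
FAKE_API = {
    0: [{"id": 1, "name": "Clothing"}, {"id": 2, "name": "Electronics"}],
    1: [{"id": 11, "name": "Shirts"}],
    2: [],
}


def fake_fetch(parent_id):
    return FAKE_API.get(parent_id, [])


table = refresh_category_table({}, fake_fetch)      # first run: loads ids 1 and 2
table = refresh_category_table(table, fake_fetch)   # later run: adds id 11 under 1
print(sorted(table))
```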
Fig. 5 is a flowchart of the data acquisition module obtaining the front-end/back-end category association according to an embodiment of the invention.
Step S501, add one or more front-end first-level categories.
Step S502, assemble the search page from the front-end category ID.
Step S503, judge from the page code whether the category has lower-level directories; if it has, perform step S504, otherwise perform step S506.
Step S504, judge that this front-end category is a parent directory and obtain the lower-level directory information from the page.
Step S505, write the lower-level directory information to the database (category_front).
Step S506, judge that this front-end category is a sub-directory and obtain the corresponding back-end category ID from the commodities under this directory.
Step S507, obtain the other corresponding back-end category attributes, such as the industry and the parent category, from the back-end category ID, then perform step S505.
Step S508, judge whether the set of front-end categories still to be processed is empty (null); if so, end, otherwise return to step S502.
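One way to read steps S501 to S508 is as a worklist traversal of the front-end category tree. The sketch below makes that reading concrete under several assumptions: page download and parsing are replaced by the hypothetical callables `get_children` and `get_backend_id`, the category_front table is modeled as a dictionary, and newly found lower-level directories are assumed to be queued for processing in turn.

```python
from collections import deque


def associate_categories(first_level_ids, get_children, get_backend_id):
    """Walk front-end categories and map leaf categories to back-end ids.

    get_children(front_id)   -> list of lower-level front-end ids (may be empty);
                                stands in for downloading and parsing the page.
    get_backend_id(front_id) -> back-end category id found via the commodities
                                listed under a leaf category.
    """
    pending = deque(first_level_ids)           # S501
    category_front = {}                        # stands in for the database table
    while pending:                             # S508: stop when nothing is left
        front_id = pending.popleft()
        children = get_children(front_id)      # S502/S503
        if children:                           # S504: parent directory
            category_front[front_id] = {"type": "parent", "children": children}
            pending.extend(children)           # assumption: children are processed too
        else:                                  # S506/S507: sub-directory (leaf)
            category_front[front_id] = {
                "type": "leaf",
                "backend_id": get_backend_id(front_id),
            }
    return category_front


# Hypothetical category tree and leaf-to-back-end mapping.
TREE = {"f1": ["f1a", "f1b"], "f1a": [], "f1b": []}
BACKEND = {"f1a": "b100", "f1b": "b200"}

result = associate_categories(
    ["f1"],
    lambda front_id: TREE.get(front_id, []),
    lambda front_id: BACKEND[front_id],
)
print(result)
```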
Fig. 6 is a flowchart of the data acquisition module obtaining the commodity information under the lowest-level categories according to an embodiment of the invention.
Step S601, obtain the categories to be collected from the database (category_front).
Step S602, download the page by URL and judge whether attribute tags exist; if they exist, perform step S603, otherwise perform step S605.
Step S603, judge whether the attribute has sub-attributes; if it has, perform step S604, otherwise perform step S605.
Step S604, obtain the sub-attributes, then perform step S605.
Step S605, collect the URL address and attribute information.
Step S606, write to the database (auction_list_url).
Step S607, search pages by the URL information obtained.
Step S608, match commodity information against the page source code.
Step S609, write to the database (auction).
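Steps S602 to S606 amount to expanding one category into a set of list-page URLs, one per attribute or sub-attribute. The sketch below illustrates that expansion only. The base URL, the query parameter names `cat` and `prop`, and the `attr:sub` encoding are all hypothetical; the patent does not specify the URL format.

```python
from urllib.parse import urlencode

BASE_URL = "https://shop.example.com/search"  # hypothetical search endpoint


def build_list_urls(category_id, attributes):
    """Return (url, attribute_info) rows for one category.

    attributes: dict mapping attribute name -> list of sub-attribute values
                (an empty list means the attribute has no sub-attributes).
    """
    if not attributes:
        # S602: no attribute tags, so collect the plain category URL (S605).
        return [(BASE_URL + "?" + urlencode({"cat": category_id}), None)]
    rows = []
    for attr, subs in attributes.items():
        if subs:
            # S603/S604: one URL per sub-attribute.
            for sub in subs:
                info = f"{attr}:{sub}"
                query = urlencode({"cat": category_id, "prop": info})
                rows.append((BASE_URL + "?" + query, info))
        else:
            # S603 -> S605: attribute without sub-attributes.
            query = urlencode({"cat": category_id, "prop": attr})
            rows.append((BASE_URL + "?" + query, attr))
    return rows  # S606 would write these rows to auction_list_url


for url, info in build_list_urls(50, {"color": ["red", "blue"], "brand": []}):
    print(info, url)
```

Splitting a large category by attribute in this way keeps each result list small, which fits the category-refinement approach the description attributes to the acquisition module.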
Fig. 7 is a flowchart of the data acquisition module obtaining the shop information under the lowest-level categories according to an embodiment of the invention.
Step S701, maintain the existing shop information.
Step S702, preferentially obtain the shop data of the followed shops (cust_shop).
Step S703, obtain the original shop data.
Step S704, judge whether the URL page assembled from the shop ID can be downloaded; if it can, perform step S705, otherwise perform step S706.
Step S707, write the shop information to the database (shop).
Step S708, judge whether the set of shops to be updated is empty; if so, perform step S709, otherwise return to step S704.
Step S709, add new shop information.
Fig. 8 is a flowchart of the data acquisition module adding shop information according to an embodiment of the invention.
Step S801, start adding new shop information.
Step S802, obtain the names of the shops to be added (nick_new).
Step S803, judge whether the page at the assembled URL can be downloaded; if it can, perform step S804, otherwise perform step S806.
Step S804, match the page to obtain the shop information.
Step S805, write the shop information to the database (shop).
Step S806, judge whether the set of shops to be added is empty; if so, end, otherwise return to step S803.
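The loop of steps S802 to S806 can be sketched with a URL template and a regular expression standing in for "matching the page". Everything concrete here is an assumption made for illustration: the URL template, the HTML markup, the extracted fields, and the `download` callable are hypothetical, and the shop table is modeled as a dictionary.

```python
import re
from urllib.parse import quote

SHOP_URL = "https://shop.example.com/shop/{nick}"  # hypothetical URL template

# Hypothetical page markup; a real page would need its own pattern.
SHOP_PATTERN = re.compile(
    r'<h1 class="shop-name">(?P<name>[^<]+)</h1>.*?'
    r'<span class="rating">(?P<rating>\d+(?:\.\d+)?)</span>',
    re.S,
)


def add_new_shops(new_nicks, download):
    """Collect shop info for each new shop name (steps S802 to S806).

    download(url) returns the page HTML, or None when no page exists.
    """
    shop_table = {}                                          # stands in for table shop
    pending = list(new_nicks)                                # S802
    while pending:                                           # S806: loop until empty
        nick = pending.pop(0)
        page = download(SHOP_URL.format(nick=quote(nick)))   # S803: assemble URL
        if page is None:
            continue                                         # no page, next shop
        match = SHOP_PATTERN.search(page)                    # S804: match the page
        if match:
            shop_table[nick] = {                             # S805: store the result
                "name": match.group("name").strip(),
                "rating": float(match.group("rating")),
            }
    return shop_table


# Hypothetical downloaded pages keyed by URL.
PAGES = {
    "https://shop.example.com/shop/tea_house":
        '<h1 class="shop-name"> Tea House </h1><span class="rating">4.8</span>',
}

shops = add_new_shops(["tea_house", "missing_shop"], PAGES.get)
print(shops)
```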
Fig. 9 is a flowchart of the data acquisition module obtaining the commodities under followed shops and the followed commodities according to an embodiment of the invention.
Step S901, obtain the followed commodity IDs (cust_auction).
Step S902, assemble the commodity detail page URL addresses.
Step S903, obtain the followed shop IDs (cust_shop).
Step S904, assemble the shop detail page URL addresses.
Step S905, download the page and match it to obtain the URL addresses of all commodities under the shop.
Step S906, judge whether the detail page source can be downloaded from the commodity URL; if it can, perform step S907, otherwise perform step S908.
Step S907, parse the commodity information and write it to the database (auciton_concern).
Step S908, judge whether the set of obtained commodity URLs is empty; if so, end, otherwise return to step S906.
Fig. 10 is a workflow diagram of the data analysis module according to an embodiment of the invention.
Step S1001, data indexing and analysis: build the index library.
Step S1002, data synchronization: the data analysis module synchronizes the data accordingly between the data analysis module and the data display module.
Step S1003, data statistics, in service of data display.
Fig. 11 is a flowchart of the data analysis module performing data indexing and analysis according to an embodiment of the invention.
Step S1101, initialize the index service.
Step S1102, initialize all commodity categories, loading the front-end and back-end categories.
Step S1103, add the items that need indexing.
Step S1104, add the index for the original commodity data.
Step S1105, add the index for the followed commodities.
Step S1106, judge whether the set of items to be indexed is empty; if so, end, otherwise return to steps S1104 and S1105.
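To make the indexing steps concrete, the sketch below builds a tiny in-memory inverted index. This is a deliberately simplified stand-in for the Solr index service named in the description, not the Solr API. The whitespace tokenization, the `title` field, and the sample commodities are assumptions; only the idea of keeping separate indexes for original commodity data and followed commodities (S1104, S1105) comes from the text.

```python
from collections import defaultdict


def build_index(items):
    """Map each lowercase title token to the set of item ids containing it."""
    index = defaultdict(set)
    for item in items:
        for token in item["title"].lower().split():
            index[token].add(item["id"])
    return index


def search(index, query):
    """Return ids of items whose title contains every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical commodity rows.
original = [
    {"id": 1, "title": "Green tea gift box"},
    {"id": 2, "title": "Black tea tin"},
]
followed = [{"id": 2, "title": "Black tea tin"}]

indexes = {
    "auction": build_index(original),       # S1104: original commodity index
    "cust_auction": build_index(followed),  # S1105: followed commodity index
}
print(search(indexes["auction"], "black tea"))
```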
Fig. 12 is a flowchart of the data analysis module performing data synchronization according to an embodiment of the invention.
Step S1201, load the commodity categories.
Step S1202, process and synchronize the original commodity data.
Step S1203, synchronize the commodities followed by users.
Step S1204, synchronize the shops followed by users.
Step S1205, obtain the commodity data (auction).
Step S1206, obtain the followed commodity data (cust_auction).
Step S1207, obtain the followed shop data (cust_shop).
Step S1208, set up threads; multiple threads are configured here according to the data volume.
Step S1209, update the followed commodity data to the raw database (cust_auction).
Step S1210, update the followed shop data to the raw database (cust_shop).
Step S1211, synchronize the data and store it in the database (statistics_cache_auction).
Step S1212, compute attribute statistics and store them in the database (statistics_property).
Step S1213, judge whether all commodities to be counted have been processed; if so, end, otherwise return to step S1208.
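Step S1208 says only that the number of threads is chosen according to the data volume. A minimal sketch of that idea, using Python's standard thread pool, might look like this. The sizing rule (one thread per 1000 items, capped at 8), the row fields, and the function names are assumptions; the cache rows and the attribute counter stand in for the statistics_cache_auction and statistics_property tables.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def choose_thread_count(n_items, per_thread=1000, max_threads=8):
    """Pick a thread count from the data volume (assumed sizing rule)."""
    needed = -(-n_items // per_thread)  # ceiling division
    return max(1, min(max_threads, needed))


def sync_batch(batch):
    """Process one batch: build cache rows (S1211) and count attributes (S1212)."""
    cache_rows = [{"id": item["id"], "price": item["price"]} for item in batch]
    property_counts = Counter(p for item in batch for p in item["properties"])
    return cache_rows, property_counts


def synchronize(items, per_thread=1000):
    """Split items across threads and merge the per-batch results."""
    n_threads = choose_thread_count(len(items), per_thread)   # S1208
    batches = [items[i::n_threads] for i in range(n_threads)]
    cache, stats = [], Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for rows, counts in pool.map(sync_batch, batches):
            cache.extend(rows)
            stats.update(counts)
    return cache, stats


# Hypothetical commodity rows with one attribute each.
items = [
    {"id": i, "price": 10.0 + i,
     "properties": ["color:red"] if i % 2 == 0 else ["color:blue"]}
    for i in range(5)
]
cache, stats = synchronize(items, per_thread=2)
print(len(cache), dict(stats))
```

Results are merged in the calling thread rather than inside the workers, so no lock is needed around the shared cache and counter.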
Fig. 13 is a flowchart of the data analysis module performing data statistics according to an embodiment of the invention.
Step S1301, load the front-end and back-end categories.
Step S1302, commodity statistics.
Step S1303, shop statistics.
Step S1304, industry statistics.
Step S1305, obtain the followed commodities and compute commodity statistics by associating statistics_cache_auction with cust_auction.
Step S1306, obtain the followed shops and compute statistics for all commodities under each followed shop by associating statistics_cache_auction with cust_shop.
Step S1307, reset the database (statistics_auction).
Step S1308, write the commodity statistics to the database (statistics_auction).
Step S1309, obtain the followed shops and compute shop statistics by associating statistics_cache_auction with cust_shop.
Step S1310, compute statistics on the original shop data.
Step S1311, compute monthly sales by commodity.
Step S1312, compute monthly sales by category.
Step S1313, reset the database (statistics_shop).
Step S1314, write the shop statistics to the database (statistics_shop).
Step S1315, compute statistics for the last 30 days.
Step S1316, compute monthly statistics.
Step S1317, reset the database (statistics_industry).
Step S1318, write the industry statistics to the database (statistics_industry).
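The "reset, then write" pattern of steps S1313 and S1314 can be illustrated with an aggregate query. The sketch uses Python's built-in sqlite3 module as a stand-in for the MySQL server named in the description. The table names come from the text, but the column names, the sample rows, and the choice of a per-shop monthly sales sum are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
conn.executescript("""
CREATE TABLE statistics_cache_auction (
    auction_id INTEGER, shop_id INTEGER, category TEXT, month TEXT, sales INTEGER
);
CREATE TABLE statistics_shop (shop_id INTEGER, month TEXT, sales INTEGER);
""")
# Hypothetical synchronized commodity rows.
conn.executemany(
    "INSERT INTO statistics_cache_auction VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "shirts", "2015-03", 120),
        (2, 10, "shoes", "2015-03", 80),
        (3, 20, "shirts", "2015-03", 50),
    ],
)


def rebuild_shop_statistics(connection):
    """Reset statistics_shop (S1313) and rewrite it from the cache (S1314)."""
    with connection:  # both statements commit together or roll back together
        connection.execute("DELETE FROM statistics_shop")
        connection.execute("""
            INSERT INTO statistics_shop (shop_id, month, sales)
            SELECT shop_id, month, SUM(sales)
            FROM statistics_cache_auction
            GROUP BY shop_id, month
        """)


rebuild_shop_statistics(conn)
rows = conn.execute(
    "SELECT shop_id, sales FROM statistics_shop ORDER BY shop_id"
).fetchall()
print(rows)
```

Running the reset and the insert in one transaction means readers of statistics_shop never see the table half-rebuilt, and running the function again gives the same result.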
Figure 14 is the topological diagram of the data exhibiting modules exhibit content according to the embodiment of the present invention.Take shopping website as Taobao for example is described.
As shown in Figure 14, the content presented by data exhibiting module 3 comprises: a login interface, a home page interface, an industry analysis interface, a shop analysis interface, a commodity analysis interface, an account interface, a Juhuasuan interface, and a system management interface.
Home page interface: monthly commodity sales ranking and monthly shop sales ranking.
Industry analysis interface: industry analysis and industry anecdotes.
Shop analysis interface: shop search, single-shop analysis, and shop comparison, where single-shop analysis includes shop statistics.
Commodity analysis interface: best-seller query and followed commodities.
Account interface: security modification.
Juhuasuan interface: Juhuasuan query and Juhuasuan follow list.
Double 11 interface: shop ranking and commodity ranking.
System management interface: user management.
The big data analysis system for a shopping website according to the embodiment of the present invention is an integrated data system that combines collection, analysis, and presentation: it collects data in real time, completes the data analysis, and presents the analysis results to the client. Because collection covers the whole network, the data volume is large and the collected objects are more comprehensive, which gives users comprehensive guidance for analyzing market conditions and for product research and development. The invention combines MySQL and Solr: Solr, used as a non-relational store, holds the large data volume, and MySQL performs the analysis and statistics. Combining the two avoids data blocking and improves system performance.
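The division of labour between the two stores can be illustrated with a small sketch. Nothing here is the patent's code: `DocumentStore` is a hypothetical in-memory stand-in for Solr, SQLite stands in for MySQL, and the record fields are assumed. The point is only that full records go to the bulk store while the relational side answers aggregate queries.

```python
import sqlite3


class DocumentStore:
    """In-memory stand-in for the bulk document store (Solr in the patent)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = doc

    def search(self, field, value):
        return [d for d in self.docs.values() if d.get(field) == value]


# Hypothetical collected records.
records = [
    {"id": 1, "shop": "A", "title": "red shirt", "sales": 10},
    {"id": 2, "shop": "A", "title": "blue shirt", "sales": 5},
    {"id": 3, "shop": "B", "title": "green hat", "sales": 7},
]

bulk = DocumentStore()
stats = sqlite3.connect(":memory:")  # stand-in for the MySQL statistics database
stats.execute("CREATE TABLE sales (shop TEXT, sales INTEGER)")

for rec in records:
    bulk.add(rec)  # the full record is kept in the bulk store
    # Only the fields needed for statistics go to the relational side.
    stats.execute("INSERT INTO sales VALUES (?, ?)", (rec["shop"], rec["sales"]))
stats.commit()

shop_a_docs = bulk.search("shop", "A")
totals = stats.execute(
    "SELECT shop, SUM(sales) FROM sales GROUP BY shop ORDER BY shop"
).fetchall()
```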
In this specification, a reference to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such references do not necessarily point to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention without departing from its principle and purpose. The scope of the present invention is defined by the claims and their equivalents.

Claims (10)

1. A big data analysis system for a shopping website, characterized in that it collects and analyzes data of the shopping website and comprises a data acquisition module, a data analysis module, and a data exhibiting module, wherein:
The data acquisition module is configured to collect raw data of the shopping website and store it in a raw database, wherein the raw data of the shopping website comprises: general category data, front-end/back-end category association data, and commodity information and shop information under the lowest-level categories;
The data analysis module is configured to receive a data analysis input stream formed from the raw data in the raw database, analyze the data analysis input stream to build indexes and statistics, synchronize the data analysis input stream with the corresponding data in the data acquisition module to form a data analysis output stream, and output that stream to a post-analysis database;
The data exhibiting module is configured to receive the demonstrating data of the data analysis output stream from the post-analysis database, present the demonstrating data to the user in different interfaces according to its type, and receive a follow instruction input by the user, wherein the follow instruction comprises the name of a followed commodity and the name of a followed shop; the data exhibiting module is further configured to generate presentation-system interaction data according to the follow instruction and send the presentation-system interaction data to the post-analysis database;
The data analysis module is further configured to receive a demonstrating data feedback input stream formed from the presentation-system interaction data in the post-analysis database, analyze the demonstrating data feedback input stream to obtain the name of the followed commodity and the name of the followed shop, build an index for the followed commodity, synchronize the followed commodity information and followed shop information with the corresponding data in the data exhibiting module to form a demonstrating data feedback output stream, and output the demonstrating data feedback output stream to the raw database;
The data acquisition module is further configured to receive the demonstrating data feedback output stream from the raw database, preferentially collect information on the followed commodities and followed shops according to that stream, and present this information to the user through the data analysis module and the data exhibiting module.
2. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module uses a MySQL database server, and the data analysis module and the data exhibiting module use a MySQL database server and a Solr database server.
3. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the general category data of the shopping website as follows:
The data acquisition module performs first-run detection according to configuration information; if a category table exists, it queries the first-level categories under each category, otherwise it queries the general categories;
The data acquisition module calls the back-end category API of the shopping website and updates the category table according to the return value.
4. The big data analysis system for a shopping website according to claim 3, characterized in that the data acquisition module obtains the front-end/back-end category association data of the shopping website as follows:
The data acquisition module adds one or more front-end first-level categories according to the updated category table, and splices the search page address from the front-end category ID of each front-end first-level category;
The data acquisition module determines from the page code whether the front-end first-level category has a subdirectory; if so, the category is judged to be a parent directory, and the subdirectory information is obtained from the page;
If there is no subdirectory, the category is judged to be a child directory; the corresponding back-end category ID is obtained from the commodities under this directory, and the other corresponding back-end category attributes are obtained through the back-end category ID.
5. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the commodity information under the lowest-level categories of the shopping website as follows:
For each category to be collected, the data acquisition module downloads the page by URL and determines whether attribute tags exist; if not, it collects the URL address and attribute information; if so, it determines whether sub-attributes exist, and if sub-attributes exist it collects the URL address and sub-attribute information; it then searches pages by the collected URL addresses and matches commodity information against the page source code.
6. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module obtains the shop information under the lowest-level categories of the shopping website and the information of followed shops as follows:
The data acquisition module preferentially obtains the shop data of the followed shops and then the shop data in the category table, and determines whether the page at the URL spliced from the shop ID can be downloaded; if so, it matches the page to obtain the shop information, otherwise it deletes the corresponding data in the category table;
The data acquisition module determines whether the new shop information is empty; if so, it adds new shop information: it obtains the name of the shop to be added, splices the URL address, and determines whether a page can be downloaded; if so, it matches the page to obtain the shop information.
7. The big data analysis system for a shopping website according to claim 1, characterized in that the data acquisition module collects followed commodity information as follows:
The data acquisition module obtains the followed commodity IDs and followed shop IDs, splices the commodity detail page URL addresses and the shop search page URL addresses, downloads each shop search page, matches all commodity URL addresses under that shop and combines them with the commodity detail page URL addresses, determines whether the commodity detail page source code can be downloaded, and if so parses out the commodity information.
8. The big data analysis system for a shopping website according to claim 1, characterized in that the data analysis module builds indexes for the data analysis input stream and the demonstrating data feedback input stream as follows:
The data analysis module first initializes the index service, then initializes all commodity categories in the data analysis input stream and the demonstrating data feedback input stream, adds the items that need indexing, and adds indexes for the original commodity data and the followed commodities respectively.
9. The big data analysis system for a shopping website according to claim 1, characterized in that the data analysis module synchronizes data as follows:
The data analysis module loads the commodity categories and obtains the original commodity data, the commodity follow data, and the shop follow data;
The data analysis module updates the commodity follow data and the shop follow data to the raw database;
The data analysis module assigns threads to the commodity data and performs data synchronization and attribute statistics.
10. The big data analysis system for a shopping website according to claim 1, characterized in that the data exhibiting module presents the following content: a login interface, a home page interface, an industry analysis interface, a shop analysis interface, a commodity analysis interface, an account interface, and a system management interface.
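The preferential collection of followed commodities and shops described in claim 1 can be illustrated with a priority queue. This is a sketch under stated assumptions rather than the claimed implementation: the class name `CollectionQueue`, the two priority levels, and the target strings are all hypothetical.

```python
import heapq
from itertools import count

FOLLOWED, REGULAR = 0, 1  # a lower value is collected first


class CollectionQueue:
    """Collection targets ordered so that followed items come out first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: keeps insertion order within a priority

    def add(self, target, followed=False):
        priority = FOLLOWED if followed else REGULAR
        heapq.heappush(self._heap, (priority, next(self._order), target))

    def drain(self):
        while self._heap:
            yield heapq.heappop(self._heap)[2]


# Hypothetical targets; the followed ones arrive via the feedback stream.
queue = CollectionQueue()
queue.add("shop:regular-1")
queue.add("item:followed-7", followed=True)
queue.add("shop:regular-2")
queue.add("shop:followed-3", followed=True)
order = list(queue.drain())
```

The counter as a second tuple element means targets with equal priority are never compared directly and are returned in the order they were added.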
CN201510203342.8A 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website Expired - Fee Related CN104778273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510203342.8A CN104778273B (en) 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website

Publications (2)

Publication Number Publication Date
CN104778273A true CN104778273A (en) 2015-07-15
CN104778273B CN104778273B (en) 2018-10-09

Family

ID=53619737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510203342.8A Expired - Fee Related CN104778273B (en) 2015-04-24 2015-04-24 A kind of big data analysis system for shopping website

Country Status (1)

Country Link
CN (1) CN104778273B (en)

Also Published As

Publication number Publication date
CN104778273B (en) 2018-10-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210115

Address after: 226000 room 304, building 3, yingnuoyuan Science Park, Chongchuan District, Nantong City, Jiangsu Province

Patentee after: Taojin hechuang e-commerce Jiangsu Co.,Ltd.

Address before: 226000 7th floor, 375 Century Avenue, Chongchuan District, Nantong City, Jiangsu Province

Patentee before: TAOJIN INFORMATION TECHNOLOGY JIANGSU Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181009
