CN104484367A - Data mining and analyzing system - Google Patents
- Publication number
- CN104484367A (application number CN201410736242.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- user
- data
- query
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data mining and analyzing system which comprises an input and output module, an interest information storage module, an inquiry analyzing module, a Web processing module, a result preprocessing module and an inquiry filtering module. The input and output module provides inquiry input and result output for users, the interest information storage module is used for storing interest data information of the users, the inquiry analyzing module analyzes according to inquiry requests of the users to form new inquiry requests, the Web processing module calls multiple webpage data in a parallel manner, the result preprocessing module integrates data information of the Web processing module and then sends the same to the inquiry filtering module, and the inquiry filtering module performs relevancy ranking on the data information in the result preprocessing module according to the data information in the interest information storage module and outputs inquiry results to the users through the input and output module. By the data mining and analyzing system, returned search results are analyzed and processed, and then targeted search results are returned to the users, so that retrieval efficiency is improved.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a data mining analysis system.
Background
With the explosive growth of information on the network, the problem is no longer that analysis retrieves too little information but that it retrieves too much, and most of it is irrelevant to the query request. Traditional analysis systems and general meta analysis systems are increasingly unable to meet users' needs, so data mining technology has become a hot topic in search research. Data mining generally refers to the process of using algorithms to search large amounts of data for the information hidden in it. It is usually associated with computer science and achieves this goal through many methods, such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb) and pattern recognition. However, the prior art cannot return effective search results in time for the keywords a user enters. After a user rates their satisfaction with the results returned by data mining analysis, existing systems cannot learn from that satisfaction feedback, so their search results are poorly targeted. In addition, the structural model of existing systems makes it difficult to ensure both the security of back-end data and the consistency of processing. Given these shortcomings, the prior art needs to be improved.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art and to provide a data mining analysis system that can return targeted search results to the user.
The invention is realized by the following technical scheme:
A data mining analysis system, comprising:
An input/output module, which provides the user with visual query input and result output;
An interest information storage module, which stores user interest data;
A query analysis module, which analyzes the user's query request according to the data in the interest information storage module and expands the query statement to form a new, longer and more accurate query request;
A Web processing module, which calls up data from multiple web pages in parallel to obtain the required web data and sends that data to the result preprocessing module;
A result preprocessing module, which integrates the data from the Web processing module and sends it to the query filter module;
A query filter module, which ranks the data from the result preprocessing module by relevance according to the data in the interest information storage module and outputs the query results to the user through the input/output module.
The user interest data in the interest information storage module is information extracted from the history of web pages the user has visited.
The result output is a linear list of documents.
The query filter module comprises a receiving processing module and a data analysis module. The receiving processing module receives the index file obtained from the user's query request, and the data analysis module analyzes that index file and provides the query results. The data analysis module derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file from the index file.
The query analysis module analyzes user behavior to obtain the user interest data.
The user behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
The user click rate comprises the number of times a page is visited or the number of times a page is searched.
The data mining analysis system further comprises a satisfaction evaluation module, which feeds the user's satisfaction with the query results back to the interest information storage module for use by the query filter module when it ranks the data from the result preprocessing module by relevance.
The data mining analysis system has a three-layer structure comprising a presentation layer, a business logic layer and a data persistence layer.
Compared with the prior art, the invention can return search results in time for the keywords the user enters, can learn from the user's feedback on those results, and can return targeted search results to the user, thereby realizing data mining analysis and improving its efficiency. The purpose of data mining analysis is to provide the user with the information they need according to the user's background, preferences, research direction, retrieval purpose and so on.
Brief description of the drawings
To illustrate the technical schemes of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the data mining analysis system of the invention;
Fig. 2 is a schematic diagram of the principle of the query analysis module of the data mining analysis system of the invention;
Fig. 3 is a schematic diagram of the three-layer structure of the data mining analysis system of the invention;
Fig. 4 is a schematic diagram of meta analysis in the data mining analysis system of the invention.
In the figures:
1. input/output module; 2. interest information storage module; 3. query analysis module; 4. Web processing module; 5. result preprocessing module; 6. query filter module; 7. receiving processing module; 8. data analysis module; 9. index file; 10. target index file; 11. knowledge base; 12. result processing module; 13. presentation layer; 14. business logic layer; 15. data persistence layer.
Detailed description of the embodiments
The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Figs. 1 to 4, a data mining analysis system comprises:
An input/output module 1, which provides the user with visual query input and result output. For query input, the user can enter a series of keywords, a series of Boolean operators and so on; the result output is a linear list of documents.
An interest information storage module 2, which stores user interest data. The user interest data in the interest information storage module 2 is information extracted from the history of web pages the user has visited. The interest data must not only represent the user's interests objectively and comprehensively but also be easy to evaluate later.
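The patent does not say how interest data is extracted from the browsing history. As a minimal sketch, assuming the history is a list of (URL, page title) pairs and that interest is approximated by normalized term frequency in the titles, the extraction could look like this. The function name `build_interest_profile`, the record layout and the weighting are all illustrative assumptions, not part of the patent.

```python
from collections import Counter
import re


def build_interest_profile(history, top_n=5):
    """Return normalized term weights extracted from visited-page titles.

    `history` is a list of (url, title) pairs. Both this record layout and
    the term-frequency weighting are assumptions made for illustration.
    """
    counts = Counter()
    for _url, title in history:
        # Lowercase and keep alphabetic tokens longer than two characters.
        counts.update(t for t in re.findall(r"[a-z]+", title.lower()) if len(t) > 2)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep only the most frequent terms; weights are shares of all tokens.
    return {term: n / total for term, n in counts.most_common(top_n)}


history = [
    ("http://example.com/a", "Python data mining tutorial"),
    ("http://example.com/b", "Data mining with Python"),
    ("http://example.com/c", "Cooking pasta"),
]
profile = build_interest_profile(history, top_n=3)
```

A flat term-to-weight dictionary is used here because it is easy to inspect and to adjust later, which matches the requirement that the interest data remain easy to evaluate.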
A query analysis module 3, which analyzes the user's query request according to the data in the interest information storage module 2 and expands the query statement to form a new, longer and more accurate query request. Setting the query request sensibly greatly reduces the invalid content in the search results and improves search efficiency. The query analysis module 3 analyzes user behavior to obtain the user interest data. User behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
Selectivity of web browsing: each search returns hundreds or thousands of query results. If the user clicks one of them, the user can be assumed to consider that result to be of higher quality; the pages the user clicks and browses are the pages the user considers to be of higher quality.
Locality of web browsing: the URLs that users click are highly concentrated, and most clicks fall on the first few pages. Clicks on the first page account for 47% of all clicks, and clicks on the first five pages account for more than 75%. Fewer than one third of the pages receive two thirds of all clicks, which shows that user clicks on URLs are strongly localized.
User click rate: the longer a web page has existed, the more visits it is likely to have accumulated, so the raw number of visits does not reflect the quality of a page's content well. The user click rate of the page should therefore be used to reflect its quality. The user click rate comprises the number of times a page is visited or the number of times a page is searched. Although every click is made under a particular query term, research shows that for most query terms the click frequency of a URL under that term is basically the same as its click frequency across all query terms. The click rate can therefore be calculated without regard to the query term under which each click was made.
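The two ideas in this paragraph, a click rate computed without regard to the query term and a query expanded with interest terms, can be sketched as follows. The click-log layout of (query, URL) pairs, the function names and the rule of appending the highest-weighted missing terms are assumptions for illustration; the patent does not specify an algorithm.

```python
from collections import Counter


def click_rates(click_log):
    """Share of all clicks received by each URL, ignoring the query term."""
    counts = Counter(url for _query, url in click_log)
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()} if total else {}


def top_share(rates, k):
    """Fraction of all clicks that falls on the k most-clicked URLs."""
    return sum(sorted(rates.values(), reverse=True)[:k])


def expand_query(query, interest_profile, max_extra=2):
    """Append the highest-weighted interest terms not already in the query."""
    present = set(query.lower().split())
    ordered = sorted(interest_profile.items(), key=lambda kv: (-kv[1], kv[0]))
    extras = [term for term, _weight in ordered if term not in present][:max_extra]
    return " ".join([query] + extras)


log = [("q1", "a"), ("q2", "a"), ("q1", "b"), ("q3", "a"), ("q3", "c")]
rates = click_rates(log)
expanded = expand_query("python tutorial", {"python": 0.5, "mining": 0.3, "data": 0.2})
```

`top_share(rates, k)` gives a simple measure of locality: the closer it is to 1 for a small `k`, the more the clicks concentrate on a few URLs.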
A Web processing module 4, which calls up data from multiple web pages in parallel to obtain the required web data and sends that data to the result preprocessing module 5.
A result preprocessing module 5, which integrates the data from the Web processing module 4 and sends it to the query filter module 6. Integration combines the results from the different web data sources, removes duplicates, unifies the format, checks link validity, classifies the results and so on.
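A minimal sketch of the parallel calls made by the Web processing module and the integration step of the result preprocessing module is given below. It assumes each web data source is a callable that takes the query and returns a list of result records; plain functions stand in for real HTTP requests. The names `fetch_all`, `normalize_url` and `integrate`, the record fields, and the purely syntactic link check are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def fetch_all(query, sources, max_workers=4):
    """Call every source with the query in parallel; failing sources yield []."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []  # one failing source does not discard the rest
    return results


def normalize_url(url):
    """Lowercase scheme and host and drop a trailing slash so duplicates match."""
    parts = urlsplit(url.strip())
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def integrate(results_by_source):
    """Merge per-source result lists into one list with a uniform record format."""
    merged, seen = [], set()
    for source, items in results_by_source.items():
        for item in items:
            url = item.get("url", "")
            if not url.startswith(("http://", "https://")):
                continue  # syntactic link check only; no network request is made
            key = normalize_url(url)
            if key in seen:
                continue  # remove duplicates returned by several sources
            seen.add(key)
            merged.append({"url": key, "title": item.get("title", "").strip(), "source": source})
    return merged


def source_a(q):
    return [{"url": "http://Example.com/page/", "title": " " + q + " intro "}]


def source_b(q):
    return [{"url": "http://example.com/page", "title": q + " intro"},
            {"url": "ftp://x", "title": "bad"}]


def source_c(q):
    raise RuntimeError("source unavailable")


merged = integrate(fetch_all("data mining", {"a": source_a, "b": source_b, "c": source_c}))
```

Threads are used here on the assumption that the calls are I/O-bound; classification of results, which the text also mentions, is left out of the sketch.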
A query filter module 6, which ranks the data from the result preprocessing module 5 by relevance according to the data in the interest information storage module 2 and outputs the query results to the user through the input/output module 1. The query filter module 6 comprises a receiving processing module 7 and a data analysis module 8. The receiving processing module 7 receives the index file 9 obtained from the user's query request, and the data analysis module 8 analyzes the index file 9 and provides the query results. The data analysis module 8 derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file 10 from the index file 9.
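The relevance ranking is not specified further in the patent. One simple reading, assuming the interest data is a term-to-weight dictionary and that a result's relevance is the sum of the weights of the interest terms found in its title, is sketched below. The scoring rule and the names `relevance` and `rank` are assumptions.

```python
def relevance(title, interest_profile):
    """Sum the interest weights of the profile terms that occur in the title."""
    tokens = set(title.lower().split())
    return sum(weight for term, weight in interest_profile.items() if term in tokens)


def rank(results, interest_profile):
    """Sort results by descending relevance; ties keep their original order."""
    return sorted(results, key=lambda r: relevance(r["title"], interest_profile), reverse=True)


interest = {"python": 0.5, "mining": 0.3}
ranked = rank(
    [
        {"url": "u1", "title": "Cooking pasta"},
        {"url": "u2", "title": "Data mining basics"},
        {"url": "u3", "title": "Python data mining"},
    ],
    interest,
)
```

Because Python's sort is stable, results with equal scores keep the order in which the preprocessing step delivered them.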
The data mining analysis system further comprises a satisfaction evaluation module, which feeds the user's satisfaction with the query results back to the interest information storage module 2 for use by the query filter module 6 when it ranks the data from the result preprocessing module 5 by relevance. The user is the direct user of analysis and the final judge of its service quality. Investigating how users use analysis is especially necessary for optimizing it, and it guides how analysis helps users find information. While analysis brings great convenience to network users, it also exposes many problems; solving them promptly and optimizing analysis requires a large amount of user information. The satisfied and dissatisfied ratings that users give during analysis supply that information.
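How a satisfaction rating changes the stored interest data is left open by the patent. A minimal sketch, assuming a binary satisfied/dissatisfied rating and a fixed adjustment step applied to the terms of the rated result's title, might be the following. The name `apply_feedback`, the step size and the update rule are all hypothetical.

```python
def apply_feedback(interest_profile, result_title, satisfied, step=0.1):
    """Return a new profile with weights nudged by one satisfaction rating."""
    updated = dict(interest_profile)
    delta = step if satisfied else -step
    for term in set(result_title.lower().split()):
        if term not in updated and not satisfied:
            continue  # a disliked result adds no new interest terms
        # Raise or lower the weight, never letting it drop below zero.
        updated[term] = max(0.0, updated.get(term, 0.0) + delta)
    return updated


stored = {"python": 0.5, "pasta": 0.05}
after_like = apply_feedback(stored, "Python tutorial", satisfied=True)
after_dislike = apply_feedback(stored, "Pasta recipes", satisfied=False)
```

The function returns a new dictionary rather than modifying the stored profile, so the caller decides when the update is written back to the interest information storage module.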
The data mining analysis system has a three-layer structure comprising a presentation layer 13, a business logic layer 14 and a data persistence layer 15. With three layers, a user's request never touches the back-end applications and data resources directly; it reaches the back-end data only through the middle layer. This ensures both the security of the back-end data and the consistency of processing.
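The layering can be illustrated with three small classes in which each layer holds a reference only to the layer directly beneath it. The class and method names are hypothetical, and the in-memory dictionary merely stands in for a real data store.

```python
class DataPersistenceLayer:
    """Holds back-end data; only the business logic layer talks to it."""

    def __init__(self):
        self._interests = {}  # user id -> list of interest terms

    def load_interests(self, user_id):
        return list(self._interests.get(user_id, []))

    def save_interests(self, user_id, terms):
        self._interests[user_id] = list(terms)


class BusinessLogicLayer:
    """Middle layer: validates requests and mediates all data access."""

    def __init__(self, store):
        self._store = store

    def search(self, user_id, query):
        if not query.strip():
            raise ValueError("empty query")
        return {"query": query.strip(), "interests": self._store.load_interests(user_id)}


class PresentationLayer:
    """Front end: sees only the business logic layer, never the store."""

    def __init__(self, logic):
        self._logic = logic

    def handle_request(self, user_id, query):
        result = self._logic.search(user_id, query)
        return f"{result['query']} | interests: {', '.join(result['interests']) or 'none'}"


store = DataPersistenceLayer()
store.save_interests("u1", ["python", "mining"])
ui = PresentationLayer(BusinessLogicLayer(store))
page = ui.handle_request("u1", " data ")
```

Because every request passes through `BusinessLogicLayer.search`, validation happens in one place, which is the consistency benefit the text describes.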
Data mining analysis means analyzing the user's search history and returning search results better suited to that user. The search history includes the keywords the user has searched for, the results the user clicked, the user's visits to each website, the user's bookmarks and so on. Once the analysis has these user data, it can return more targeted results when the user searches for a new keyword, which improves the user experience. Analysis itself means using certain techniques and strategies to collect and discover information on the Internet, to understand, extract and process that information, and to provide users with a Web search service.
Meta analysis treats several existing analyses as a whole and gives the user a single query interface. Using the information in the knowledge base 11, the meta analysis converts the user's query request into the formats that the individual analyses recognize and sends it to each independent analysis it calls; these analyses perform the actual information retrieval. Finally, through the result processing module 12, the meta analysis collects the results returned by each analysis, compares and analyzes them, eliminates redundant information, and returns them to the user in a given format. In other words, meta analysis is a system that provides information services to the user through a unified query interface and a unified feedback format by sharing the knowledge base 11 of multiple analyses.
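The meta analysis flow (translate the query per engine using the knowledge base, dispatch, then merge and remove redundancy) can be sketched as below. The knowledge-base contents, the engine names `engine_a` and `engine_b`, and the query syntaxes are invented for illustration; lambdas stand in for the independent analyses.

```python
KNOWLEDGE_BASE = {
    # Hypothetical per-engine query syntax: how each engine joins keywords.
    "engine_a": {"joiner": " AND "},
    "engine_b": {"joiner": "+"},
}


def translate(keywords, engine):
    """Rewrite a keyword list in the syntax the given engine understands."""
    return KNOWLEDGE_BASE[engine]["joiner"].join(keywords)


def meta_query(keywords, engines):
    """Send the translated query to every engine and merge unique URLs in order."""
    merged, seen = [], set()
    for name, search in engines.items():
        for url in search(translate(keywords, name)):
            if url not in seen:  # eliminate redundant results
                seen.add(url)
                merged.append(url)
    return merged


engines = {
    "engine_a": lambda q: ["u1", "u2"] if q == "data AND mining" else [],
    "engine_b": lambda q: ["u2", "u3"] if q == "data+mining" else [],
}
urls = meta_query(["data", "mining"], engines)
```

The engines are called one after another here to keep the sketch short; the comparison and ranking of merged results that the text mentions is not shown.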
The invention returns search results in time for the keywords the user enters and, at the same time, collects the user's search interest data so that later searches return more targeted results. After the user rates their satisfaction with the results returned by analysis, the invention learns from that satisfaction feedback, which improves retrieval efficiency. The invention optimizes search results according to the user interest data and returns the web content the user is interested in first.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (9)
1. A data mining analysis system, characterized by comprising:
An input/output module, which provides the user with visual query input and result output;
An interest information storage module, which stores user interest data;
A query analysis module, which analyzes the user's query request according to the data in the interest information storage module and expands the query statement to form a new, longer and more accurate query request;
A Web processing module, which calls up data from multiple web pages in parallel to obtain the required web data and sends that data to the result preprocessing module;
A result preprocessing module, which integrates the data from the Web processing module and sends it to the query filter module;
A query filter module, which ranks the data from the result preprocessing module by relevance according to the data in the interest information storage module and outputs the query results to the user through the input/output module.
2. The data mining analysis system according to claim 1, characterized in that: the user interest data in the interest information storage module is information extracted from the history of web pages the user has visited.
3. The data mining analysis system according to claim 1, characterized in that: the result output is a linear list of documents.
4. The data mining analysis system according to claim 1, characterized in that: the query filter module comprises a receiving processing module and a data analysis module; the receiving processing module receives the index file obtained from the user's query request, and the data analysis module analyzes the index file and provides the query results; the data analysis module derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file from the index file.
5. The data mining analysis system according to claim 1, characterized in that: the query analysis module analyzes user behavior to obtain the user interest data.
6. The data mining analysis system according to claim 5, characterized in that: the user behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
7. The data mining analysis system according to claim 6, characterized in that: the user click rate comprises the number of times a page is visited or the number of times a page is searched.
8. The data mining analysis system according to claim 1, characterized in that: the data mining analysis system further comprises a satisfaction evaluation module, which feeds the user's satisfaction with the query results back to the interest information storage module for use by the query filter module when it ranks the data from the result preprocessing module by relevance.
9. The data mining analysis system according to claim 1, characterized in that: the data mining analysis system has a three-layer structure comprising a presentation layer, a business logic layer and a data persistence layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410736242.7A CN104484367A (en) | 2014-12-05 | 2014-12-05 | Data mining and analyzing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410736242.7A CN104484367A (en) | 2014-12-05 | 2014-12-05 | Data mining and analyzing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104484367A (en) | 2015-04-01 |
Family
ID=52758908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410736242.7A Pending CN104484367A (en) | 2014-12-05 | 2014-12-05 | Data mining and analyzing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104484367A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294847A (en) * | 2016-08-22 | 2017-01-04 | 成都天地网络科技有限公司 | Business operation system based on data mining |
CN107103365A (en) * | 2017-04-12 | 2017-08-29 | 邹霞 | The perspective analysis method of machine learning model |
CN109214357A (en) * | 2018-09-30 | 2019-01-15 | 赵学义 | A kind of method and electronic equipment carrying out data mining based on face recognition algorithms |
CN112783294A (en) * | 2021-02-15 | 2021-05-11 | 北京泽桥传媒科技股份有限公司 | User retention data analysis method and device for mobile Internet system |
CN113392304A (en) * | 2020-03-11 | 2021-09-14 | 淄博职业学院 | Big data storage service method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144162A1 (en) * | 2003-12-29 | 2005-06-30 | Ping Liang | Advanced search, file system, and intelligent assistant agent |
CN101075239A (en) * | 2006-08-23 | 2007-11-21 | 腾讯科技(深圳)有限公司 | Composite searching method and system |
CN101127043A (en) * | 2007-08-03 | 2008-02-20 | 哈尔滨工程大学 | Lightweight individualized search engine and its searching method |
CN102033955A (en) * | 2010-12-24 | 2011-04-27 | 常华 | Method for expanding user search results and server |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294847A (en) * | 2016-08-22 | 2017-01-04 | 成都天地网络科技有限公司 | Business operation system based on data mining |
CN107103365A (en) * | 2017-04-12 | 2017-08-29 | 邹霞 | The perspective analysis method of machine learning model |
CN109214357A (en) * | 2018-09-30 | 2019-01-15 | 赵学义 | A kind of method and electronic equipment carrying out data mining based on face recognition algorithms |
CN113392304A (en) * | 2020-03-11 | 2021-09-14 | 淄博职业学院 | Big data storage service method |
CN113392304B (en) * | 2020-03-11 | 2023-05-12 | 淄博职业学院 | Big data storage service method |
CN112783294A (en) * | 2021-02-15 | 2021-05-11 | 北京泽桥传媒科技股份有限公司 | User retention data analysis method and device for mobile Internet system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11663254B2 (en) | System and engine for seeded clustering of news events | |
CN102760138B (en) | Classification method and device for user network behaviors and search method and device for user network behaviors | |
KR101463974B1 (en) | Big data analysis system for marketing and method thereof | |
US20170140038A1 (en) | Method and system for hybrid information query | |
CN105022827A (en) | Field subject-oriented Web news dynamic aggregation method | |
CN101727454A (en) | Method for automatic classification of objects and system | |
CN104077286A (en) | Commodity information search method and system | |
CN104484367A (en) | Data mining and analyzing system | |
CN102737021A (en) | Search engine and realization method thereof | |
CN102722499A (en) | Search engine and implementation method thereof | |
US10127617B2 (en) | System for analyzing social media data and method of analyzing social media data using the same | |
CA2956627A1 (en) | System and engine for seeded clustering of news events | |
Vijiyarani et al. | Research issues in web mining | |
Dias et al. | Automating the extraction of static content and dynamic behaviour from e-commerce websites | |
Romero-Frías | Googling companies-a webometric approach to business studies | |
Bhujbal et al. | News aggregation using web scraping news portals | |
Saikumar et al. | A Lite-SVM Based Semantic Search Model for Bigdata Analytics in Smart Cities | |
CN111723273A (en) | Smart cloud retrieval system and method | |
KR101208964B1 (en) | Method for providing data of user intention based on ontology and server | |
Wang et al. | Crawling ranked deep web data sources | |
CN103631779A (en) | Word recommending system based on socialized dictionary | |
KR102041915B1 (en) | Database module using artificial intelligence, economic data providing system and method using the same | |
KR20210037488A (en) | Big Data Analytics-Based Advertising Marketing System | |
CN105912584B (en) | Data indexing system based on webpage information data | |
Khurana et al. | Survey of techniques for deep web source selection and surfacing the hidden web content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150401 |
RJ01 | Rejection of invention patent application after publication |