CN106933864A - Search engine system and search method thereof - Google Patents

Search engine system and search method thereof

Info

Publication number
CN106933864A
Authority
CN
China
Prior art keywords
data
comment
search
unit
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511023304.0A
Other languages
Chinese (zh)
Inventor
李栋
李栋一
赵鹤
姜青山
陈会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201511023304.0A
Publication of CN106933864A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a search engine system and a search method thereof. The search engine system comprises a web crawler module, a database module, a sentiment analysis module, a backend server, and a client. The web crawler module crawls the web page data of a search target and transfers the crawled web page data to the database module for storage. The sentiment analysis module reads the user comment data corresponding to the search target from the web page data stored in the database module, performs sentiment analysis on the user comment data to generate a comment summary, and transfers the comment summary to the database module for storage. The client is connected to the backend server and sends search requests to it; the backend server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client. The retrieval results of the invention are more accurate and more user-friendly.

Description

Search engine system and search method thereof
Technical field
The invention belongs to the technical field of search engines, and in particular relates to a search engine system and a search method thereof.
Background
Since the beginning of the 21st century, Internet technology has developed by leaps and bounds, and people obtain massive amounts of information through the Internet. Faced with the large volume of content the Internet provides, how to pick out meaningful and valuable information quickly and accurately is a serious problem for the information society. With the spread of mobile communication devices such as smartphones and tablets and the great improvement in their performance, together with the rapid rollout of 3G/4G mobile networks and WLAN networks, search engines are gradually shifting from computers to mobile devices, and mobile search has entered public view as a new kind of service.
Mobile search refers to the search behavior in which a user on a mobile communication network, through a mobile terminal, obtains the needed information using specific search methods such as SMS, WAP, and IVR. Against the background of modern informatization, mobile search engines have the following four distinctive features compared with traditional Internet search engines:
(1) Convenience of search
Compared with Internet search, mobile search offers greater freedom and truly enables searching anytime and anywhere. In real life, many users do not carry a computer or have Internet access at hand, whereas mobile search requires only a network-connected mobile phone, so users can obtain the information they want whenever and wherever they are, unconstrained by time or place.
(2) Accuracy of search
Because mobile phone screens are small and network access is relatively slow, a mobile search engine system needs to give users more precise information, so mobile search technology places more emphasis on simplicity of use and timeliness of queries. In addition, mobile search needs stronger natural language analysis capability so as to provide users with more accurate vertical search results.
(3) Personalized service
Using data mining techniques, a mobile search engine system can analyze a user's personal search habits, search purposes, and preferences, and thereby provide search functions that better match individual needs. In addition, a mobile search engine system can be combined with positioning service technology to provide users with more targeted information.
(4) Huge number of user terminals
Mobile search has a huge user base, and the number of mobile terminals greatly exceeds the number of Internet user terminals. According to the "China Mobile Internet User Behavior Statistical Report 2015" issued by "Yi Guan Think Tank", China had about 729 million mobile Internet users in 2014. Fig. 1 shows the scale of China's mobile Internet user base from 2009 to 2014.
With the rapid growth in mobile Internet users, mobile search has become increasingly popular, but simply porting general-purpose Internet search engines to mobile terminals such as mobile phones is far from enough. Current general-purpose search engines mainly use a robot to download all or part of the content of web pages into a self-built index database. Because the volume of retrieval results is huge, many of the downloaded pages are useless or transient information. All information matching the keywords is returned to the user, including a large amount of duplicate information, and the user has to search laboriously through the results for the information actually wanted, which both slows retrieval and adds to the user's search burden. Meanwhile, the precision of general-purpose search results is low: information from any field or on any topic that contains the search keyword is returned, so the results span many topics, whereas the user is usually interested in only one field or topic and the rest is of no value. In addition, the result information returned by a general-purpose search engine usually has no fixed format, and the variety of formats inconveniences users.
In summary, the shortcomings of existing search engines are mainly as follows:
(1) Search engines offer a single retrieval mode
Search engines generally retrieve by keyword, but in many cases users find it difficult to express their real information need accurately with simple keywords or combinations of keywords. This difficulty of expression makes retrieval difficult or the retrieved results inaccurate.
(2) Search engine coverage of network information is declining overall
The sharp growth of network information makes it increasingly hard for comprehensive search engines, which aim to cover all subjects and all types of information, to cope. Even the upgrading and development of the search engines and processing software reputed to be the most powerful in network information retrieval cannot keep pace with the growth of network information.
(3) Search engines for specific fields such as wedding goods have limited functionality
For example, in the wedding goods field there are many wedding e-commerce search engine systems based on the "China Wedding Expo", but these systems are functionally simple: most merely display some wedding shops and product details, and the valuable information users can obtain from them is very limited.
Summary of the invention
The invention provides a search engine system and a search method thereof, intended to solve, at least to some extent, the above technical problems in the prior art.
The invention is implemented as follows: a search engine system comprises a web crawler module, a database module, a sentiment analysis module, and a backend server. The web crawler module crawls the web page data of a search target and transfers the crawled web page data to the database module for storage. The sentiment analysis module reads the user comment data corresponding to the search target from the web page data stored in the database module, performs sentiment analysis on the user comment data to generate a comment summary, and transfers the comment summary to the database module for storage. A client is connected to the backend server and sends search requests to it; the backend server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client.
The technical solution adopted by the embodiments of the invention further includes: the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
The seed setting unit sets the seed page addresses (seed URLs) of the web crawler and adds them to the "to-be-crawled URL list";
The feature extraction unit extracts the feature of each seed page address and stores the extracted feature in the "downloaded URL feature set";
The list judging unit judges whether the "to-be-crawled URL list" is empty: if the list is not empty, the data parsing unit parses the page addresses in it; if the list is empty, the web crawler module finishes its work;
The data parsing unit takes page addresses out of the "to-be-crawled URL list" and parses them, downloads the corresponding pages, extracts the relevant web page data, and stores the extracted web page data in the database module.
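The bookkeeping shared by these four units can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say what a URL "feature" is, so the sketch assumes it is a SHA-1 hash of a lightly normalized URL, and the names `url_feature` and `Frontier` are hypothetical.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def url_feature(url: str) -> str:
    """Assumed feature: SHA-1 of the URL with lowercased host and no fragment."""
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class Frontier:
    """Holds the list of URLs waiting to be crawled and the set of known URL features."""

    def __init__(self, seeds):
        self.to_crawl = deque()
        self.seen_features = set()
        for seed in seeds:
            self.add(seed)

    def add(self, url: str) -> bool:
        """Queue the URL unless its feature is already recorded; report whether it was queued."""
        feature = url_feature(url)
        if feature in self.seen_features:
            return False
        self.seen_features.add(feature)
        self.to_crawl.append(url)
        return True

    def is_empty(self) -> bool:
        return not self.to_crawl


frontier = Frontier(["http://Example.com/a#top"])
duplicate_queued = frontier.add("http://example.com/a")  # same page, different spelling
new_queued = frontier.add("http://example.com/b")
```

Storing a fixed-length hash rather than the full URL keeps the feature set compact, which is one plausible reason to separate the "feature" from the URL itself.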
The technical solution adopted by the embodiments of the invention further includes: the sentiment analysis module mines and analyzes comment data using sentiment analysis technology; specifically, the sentiment analysis module comprises a data acquisition unit, a data classification unit, and a data extraction unit;
The data acquisition unit obtains the original comment data corresponding to the search target from the database module and performs sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
The data classification unit classifies comment sentences as subjective or objective according to the tagging results, retaining subjective comment sentences and filtering out objective comment sentences;
The data extraction unit extracts, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or product described in the comment, generates a comment summary from the sentiment words and attribute information, and transfers the comment summary to the database module for storage.
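The classification and extraction steps can be illustrated with a small sketch that works on sentences that are already segmented and tagged. The rules here are assumptions chosen for illustration only: a sentence counts as subjective if it contains an adjective or adverb, and each adjective is paired with the closest preceding noun. The tag names ("n", "v", "a", "d") and function names are hypothetical.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, part-of-speech tag)

# Assumed rule: adjectives ("a") and adverbs ("d") mark opinion-bearing sentences.
SUBJECTIVE_TAGS = {"a", "d"}


def is_subjective(sentence: TaggedSentence) -> bool:
    return any(tag in SUBJECTIVE_TAGS for _, tag in sentence)


def keep_subjective(sentences: List[TaggedSentence]) -> List[TaggedSentence]:
    """Retain subjective sentences and drop objective ones."""
    return [s for s in sentences if is_subjective(s)]


def attribute_sentiment_pairs(sentence: TaggedSentence) -> List[Tuple[str, str]]:
    """Pair each adjective with the closest preceding noun (assumed heuristic)."""
    pairs, last_noun = [], None
    for word, tag in sentence:
        if tag == "n":
            last_noun = word
        elif tag == "a" and last_noun is not None:
            pairs.append((last_noun, word))
    return pairs


comments = [
    [("service", "n"), ("is", "v"), ("excellent", "a")],
    [("shop", "n"), ("opens", "v"), ("at", "p"), ("nine", "m")],
]
subjective = keep_subjective(comments)
summary = [pair for s in subjective for pair in attribute_sentiment_pairs(s)]
```

A real system would obtain the tags from a word segmenter and tagger; the sketch takes them as input so that it stays self-contained.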
The technical solution adopted by the embodiments of the invention further includes: the backend server comprises a first controller, a second controller, an application service layer, and a data access layer;
The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
The second controller parses the search request, extracts the search keyword or additional parameters from it, and passes the search keyword or additional parameters to the application service layer for business logic processing;
The application service layer receives the search keyword or additional parameters passed by the second controller and calls the data access layer to obtain web page data;
The data access layer accesses the database module, obtains the web page data and comment summary in the database module according to the search keyword or additional parameters, and returns the web page data and comment summary to the second controller through the application service layer; the second controller packages the web page data and comment summary and returns them to the client through the first controller.
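The delegation chain described above can be sketched with four small classes. This is a hedged illustration with an in-memory list standing in for the database; the class names, the request keys "q" and "sentiment", and the record fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    name: str
    page_data: str
    summary: str
    sentiment: str  # e.g. "positive" or "negative"


class DataAccess:
    """Data access layer: the only class that touches stored records."""

    def __init__(self, records: List[Record]):
        self._records = records

    def find(self, keyword: str, sentiment: Optional[str]) -> List[Record]:
        return [
            r for r in self._records
            if keyword in r.name and (sentiment is None or r.sentiment == sentiment)
        ]


class SearchService:
    """Application service layer: business logic on top of data access."""

    def __init__(self, dao: DataAccess):
        self._dao = dao

    def search(self, keyword: str, sentiment: Optional[str]) -> List[Record]:
        return self._dao.find(keyword, sentiment)


class SecondController:
    """Parses the request and packages the response."""

    def __init__(self, service: SearchService):
        self._service = service

    def handle(self, request: Dict[str, str]) -> List[Dict[str, str]]:
        keyword = request.get("q", "")
        sentiment = request.get("sentiment")  # optional additional parameter
        records = self._service.search(keyword, sentiment)
        return [{"name": r.name, "page": r.page_data, "summary": r.summary} for r in records]


class FirstController:
    """Entry point: receives the client request and delegates parsing."""

    def __init__(self, delegate: SecondController):
        self._delegate = delegate

    def receive(self, request: Dict[str, str]) -> List[Dict[str, str]]:
        return self._delegate.handle(request)


dao = DataAccess([
    Record("Rose Wedding Studio", "page A", "service: excellent", "positive"),
    Record("Rose Florist", "page B", "delivery: slow", "negative"),
])
front = FirstController(SecondController(SearchService(dao)))
positive_only = front.receive({"q": "Rose", "sentiment": "positive"})
everything = front.receive({"q": "Rose"})
```

Keeping request parsing in the second controller means the service and data access layers never see transport details, which is the separation the description appears to aim for.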
The technical solution adopted by the embodiments of the invention further includes: the client comprises a search unit, a data receiving unit, and a route planning unit;
The search unit encapsulates the search keyword or additional parameters entered by the user in an HTTP request and sends it to the backend server;
The data receiving unit receives the web page data and comment summary returned by the backend server, parses the received data, and displays the parsed result to the user;
The route planning unit uses location-based service technology to obtain the merchant's location from the user's search target or from the merchant information returned by the backend server, obtains the user's current location, and plans a route for the user.
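As an illustration of the search unit, the sketch below packs a keyword and an optional sentiment condition into the query string of an HTTP GET request using the Python standard library. The endpoint URL and the parameter names "q" and "sentiment" are hypothetical; the text does not specify the request format, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://backend.example/search"  # hypothetical backend endpoint


def build_search_request(keyword: str, sentiment: str = "") -> Request:
    """Encode the keyword and optional additional parameter into a GET request."""
    params = {"q": keyword}
    if sentiment:
        params["sentiment"] = sentiment
    return Request(f"{BASE_URL}?{urlencode(params)}", method="GET")


req = build_search_request("wedding dress", sentiment="positive")
```

`urlencode` takes care of escaping spaces and non-ASCII characters, which matters for Chinese-language keywords.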
Another technical solution adopted by the embodiments of the invention is a search method of a search engine, comprising the following steps:
Step a: crawl the web page data of the search target, and store the crawled web page data in a database;
Step b: read the user comment data corresponding to the search target from the web page data, perform sentiment analysis on the user comment data using sentiment analysis technology to generate a comment summary, and store the comment summary in the database;
Step c: enter a search keyword and additional parameters through the client, and send a search request to the backend server;
Step d: receive the search request at the backend server, obtain the corresponding web page data and comment summary from the database according to the search request, and return the obtained web page data and comment summary to the client.
The technical solution adopted by the embodiments of the invention further includes: in step a, the method of crawling the web page data of the search target comprises the following steps:
Step a1: set the seed page addresses of the web crawler, and add them to the "to-be-crawled URL list";
Step a2: extract the feature of each seed page address, and store the extracted feature in the "downloaded URL feature set";
Step a3: judge whether the "to-be-crawled URL list" is empty; if the list is not empty, take the page addresses out of it and parse them, download the corresponding pages, extract the relevant web page data, and store the extracted web page data in the database; if the list is empty, the web crawler finishes its work.
The technical solution adopted by the embodiments of the invention further includes: in step b, the method of performing sentiment analysis on the user comment data comprises the following steps:
Step b1: obtain the original comment data corresponding to the search target from the database, and perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
Step b2: classify comment sentences as subjective or objective according to the tagging results, retaining subjective comment sentences and filtering out objective comment sentences;
Step b3: extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or product described in the comment, generate a comment summary from the sentiment words and attribute information, and transfer the comment summary to the database for storage.
The technical solution adopted by the embodiments of the invention further includes: in step d, the method by which the backend server processes the search request specifically comprises the following steps:
Step d1: receive the search request sent by the client at the first controller, and delegate the search request to the second controller for parsing;
Step d2: parse the search request, extract the search keyword or additional parameters from it, and pass the search keyword or additional parameters to the application service layer for business logic processing;
Step d3: receive the search keyword or additional parameters passed by the second controller, and call the data access layer to obtain web page data;
Step d4: obtain the web page data and comment summary in the database according to the search keyword or additional parameters, package the web page data and comment summary, and return them to the client through the first controller.
The technical solution adopted by the embodiments of the invention further includes the following steps after step d:
Step d5: receive, at the client, the web page data and comment summary returned by the backend server, parse the received data, and display the result to the user;
Step d6: use location-based service technology to obtain the merchant's location from the user's search target or from the merchant information returned by the backend server, obtain the user's current location, and plan a route for the user.
The search engine system and search method of the embodiments of the invention mine and analyze the comment information of the search target and display the sentiment analysis results to consumers. Consumers can learn the public's sentiment toward a given merchant and thereby improve their own purchase decisions. Retailers, in turn, can learn consumers' feedback on their products and services, as well as consumers' evaluations of themselves and of their competitors, so as to improve their products and services and gain a competitive advantage. The invention therefore has positive significance for both consumers and retailers. Because the invention integrates sentiment analysis technology, its retrieval results are more accurate and more user-friendly than those of ordinary search engines. It also integrates location-based service technology on top of a mobile search engine, so that consumers can conveniently look up the route from their current location to a merchant, which saves considerable time. In addition, the invention fits the rapid development of the mobile Internet and satisfies users' need to search anytime and anywhere.
Brief description of the drawings
Fig. 1 is a chart of the scale of China's mobile Internet user base from 2009 to 2014;
Fig. 2 is a structural diagram of the search engine system of an embodiment of the invention;
Fig. 3 is a flowchart of the search method of the search engine of an embodiment of the invention;
Fig. 4 is a flowchart of the method of crawling web page data of an embodiment of the invention;
Fig. 5 is a flowchart of the method of performing sentiment analysis on user comment data of an embodiment of the invention;
Fig. 6 is a flowchart of the method by which the backend server processes a search request in an embodiment of the invention.
Detailed description of the embodiments
In order to make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention and not to limit it.
Fig. 2 is a structural diagram of the search engine system of an embodiment of the invention. The search engine system of the embodiment comprises a web crawler module, a database module, a sentiment analysis module, a backend server, and a client. The web crawler module crawls the web page data of the search target and transfers the crawled web page data to the database module. The database module stores the crawled web page data. The sentiment analysis module reads the user comment data corresponding to the search target from the web page data in the database module, performs sentiment analysis on the user comment data to generate a comment summary, and transfers the comment summary to the database module for storage. The client is connected to the backend server and sends search requests to it; the backend server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client for display. The client includes mobile terminals such as Android clients or iOS clients.
Specifically, the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
The seed setting unit sets the seed URLs (seed page addresses) of the web crawler and adds them to the "to-be-crawled URL list" (a web crawler is a program that automatically fetches web page content and is an important component of a search engine);
The feature extraction unit extracts the features of URLs and maintains a "downloaded URL feature set", in which the extracted URL features are stored;
The list judging unit judges whether the "to-be-crawled URL list" is empty: if the list is not empty, the data parsing unit parses the URLs in it; if the list is empty, the data parsing unit has finished parsing all URLs in the "to-be-crawled URL list", and the web crawler finishes its work;
The data parsing unit takes URLs out of the "to-be-crawled URL list" and parses them, downloads the web pages corresponding to the URLs, and extracts the relevant web page data. Specifically, the data parsing unit processes web page data as follows. It parses the content of the current page and uses regular expressions to extract the valid URLs in the page (a regular expression, often abbreviated regex, regexp, or RE in code, is a computer science concept in which a single string describes and matches a set of strings that conform to a syntactic rule). It extracts the feature of each such URL and matches it against the URL features in the "downloaded URL feature set" to judge whether the URL already exists; if it does not exist, the URL is added to the "to-be-crawled URL list"; otherwise the URL is filtered out, which avoids adding duplicate URLs. It also uses regular expressions to extract the web page data in the current page and stores the extracted web page data in the database module. The extracted web page data includes the product information, merchant information, and product comment information of the search target, among others.
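The link-extraction and filtering step can be sketched with a single regular expression. The pattern, the function name `extract_new_urls`, and the choice of the absolute URL itself as the "feature" are assumptions made for illustration; a pattern this simple will miss unquoted attributes and is not a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Assumed pattern: quoted href attributes only.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def extract_new_urls(html, page_url, seen_features, to_crawl):
    """Add links not seen before to the crawl list and return the ones added."""
    added = []
    for href in HREF_RE.findall(html):
        url = urljoin(page_url, href)  # resolve relative links against the page
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar links
        feature = url  # assumed feature: the absolute URL itself
        if feature in seen_features:
            continue  # already known, filter the duplicate out
        seen_features.add(feature)
        to_crawl.append(url)
        added.append(url)
    return added


html = (
    '<a href="/shop/1">A</a> <a href="/shop/1">dup</a> '
    '<a href="mailto:x@y.z">mail</a> <a href=\'http://other.example/p\'>B</a>'
)
seen, queue = set(), []
added = extract_new_urls(html, "http://site.example/index.html", seen, queue)
```

The same regular-expression approach could pull product, merchant, and comment fields out of a page, with one pattern per field tuned to the target site's markup.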
The database module builds index databases according to the topic of the search target and stores the extracted web page data under the index database of the corresponding topic.
The sentiment analysis module mines and analyzes comment data using sentiment analysis technology. Sentiment analysis, also known as comment mining or opinion mining, refers to automatically analyzing the text of product comments to discover consumers' positive or negative attitudes and opinions about the product. DoubleClick Inc once conducted a study of online customers in the U.S. apparel, computer hardware, sports and fitness products, and travel industries, and found that around half or more of consumers search the Internet for introductory information about the relevant products and for other consumers' comments before making a purchase decision. Online product comments thus play a large role in the purchasing process and have an important influence on consumers' purchase decisions. Sentiment analysis of online product comments has therefore become increasingly important.
Specifically, the sentiment analysis module comprises a data acquisition unit, a data classification unit, and a data extraction unit;
The data acquisition unit obtains the original comment data corresponding to the search target from the database module and performs processing such as sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
The data classification unit classifies comment sentences as subjective or objective according to the tagging results, retaining subjective comment sentences and filtering out objective comment sentences;
The data extraction unit extracts, from the subjective comment sentences, the sentiment words and the attribute information of the merchant (or product) described in the comment, generates a comment summary from the sentiment words and the merchant attribute information, and transfers the comment summary to the database module for storage. Specifically, the invention counts the word frequencies and weights of the positive and negative sentiment words appearing in the comment data, and judges whether the comment data is a positive or a negative comment according to the sign of its overall sentiment weight; positive sentiment words have positive weights and negative sentiment words have negative weights. For example, when extracting a sentiment summary from comment data about a wedding merchant or product, all candidate sentiment summary combinations appearing in the comment data are first screened out using an existing wedding-topic dictionary, and the final, accurate sentiment summary is then determined by matching against existing syntactic patterns.
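The weighted word-frequency rule can be sketched as follows. The lexicon and its weights are hypothetical, and the treatment of a zero total as "neutral" is an assumption, since the text only distinguishes positive from negative totals.

```python
from collections import Counter

# Hypothetical lexicon: positive words carry positive weights, negative words negative weights.
SENTIMENT_WEIGHTS = {"excellent": 2.0, "good": 1.0, "slow": -1.0, "terrible": -2.0}


def sentiment_score(tokens):
    """Sum of weight times frequency over the sentiment words in a comment."""
    counts = Counter(t for t in tokens if t in SENTIMENT_WEIGHTS)
    return sum(SENTIMENT_WEIGHTS[word] * n for word, n in counts.items())


def polarity(tokens):
    """Classify a comment by the sign of its overall sentiment weight."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: a zero total is not covered by the text


mixed = ["service", "excellent", "but", "delivery", "slow"]
bad = ["terrible", "slow", "good"]
```

For the `mixed` comment the total is 2.0 - 1.0 = 1.0, so it counts as positive even though it contains a complaint; this is the usual limitation of a bag-of-words score and one reason the description adds dictionary screening and syntactic pattern matching.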
The backend server is developed using the MVC pattern. Because the invention does not involve developing web front-end pages, the backend server contains no view (View) layer and exchanges data directly with the client. Specifically, the backend server comprises a first controller, a second controller, an application service layer, and a data access layer;
The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
The second controller parses the search request, extracts the search keyword or additional parameters from it, and passes the search keyword or additional parameters to the application service layer for business logic processing; the additional parameters include a sentiment condition, among others.
The application service layer receives the search keyword or additional parameters passed by the second controller and calls the data access layer to obtain web page data;
The data access layer accesses the database module, obtains the web page data and comment summary in the database module according to the search keyword or additional parameters, performs operations such as adding, deleting, modifying, and querying on the obtained web page data and comment summary, and returns the web page data and comment summary to the second controller through the application service layer; the second controller packages the web page data and comment summary and returns them to the client through the first controller.
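The add, delete, modify, and query operations of a data access layer can be sketched against an in-memory SQLite database. The table layout, column names, and the class name `SummaryStore` are hypothetical; the text does not say which database the system uses.

```python
import sqlite3


class SummaryStore:
    """Minimal data access object for comment summaries (illustrative schema)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS summary ("
            "merchant TEXT PRIMARY KEY, body TEXT NOT NULL, sentiment TEXT NOT NULL)"
        )

    def add(self, merchant, body, sentiment):
        self.conn.execute("INSERT INTO summary VALUES (?, ?, ?)", (merchant, body, sentiment))

    def update(self, merchant, body, sentiment):
        self.conn.execute(
            "UPDATE summary SET body = ?, sentiment = ? WHERE merchant = ?",
            (body, sentiment, merchant),
        )

    def delete(self, merchant):
        self.conn.execute("DELETE FROM summary WHERE merchant = ?", (merchant,))

    def query(self, keyword, sentiment=None):
        """Match the keyword against merchant names, optionally filtering by sentiment."""
        sql = "SELECT merchant, body, sentiment FROM summary WHERE merchant LIKE ?"
        args = [f"%{keyword}%"]
        if sentiment is not None:
            sql += " AND sentiment = ?"
            args.append(sentiment)
        return self.conn.execute(sql + " ORDER BY merchant", args).fetchall()


store = SummaryStore(sqlite3.connect(":memory:"))
store.add("Rose Wedding Studio", "service: excellent", "positive")
store.add("Rose Florist", "delivery: slow", "negative")
store.update("Rose Florist", "delivery: fast", "positive")
store.delete("Rose Wedding Studio")
rows = store.query("Rose", "positive")
```

Parameterized queries (the `?` placeholders) keep user-supplied keywords from being interpreted as SQL.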
Specifically, the client comprises a search unit, a data receiving unit, and a route planning unit;
The search unit encapsulates the search keyword or other additional parameters entered by the user in an HTTP (HyperText Transfer Protocol) request and sends it to the backend server; the additional parameters include additional conditions such as sentiment orientation;
The data receiving unit receives the web page data and comment summary returned by the backend server, parses the received data, and displays the parsed result to the user. The invention uses the sentiment analysis results in the comment summary and displays them to the user, so that consumers can learn the public's main sentiment toward a given merchant or product and thereby improve their own purchase decisions. Merchants, in turn, can learn consumers' feedback on their products and services, as well as consumers' evaluations of themselves and of their competitors, so as to improve their products and services and gain a competitive advantage.
The route planning unit uses location-based service technology to obtain the merchant's location from the user's search target or from the merchant information returned by the backend server, obtains the user's current location, and plans a route for the user; the route planning modes include walking, public transport, and driving route planning, among others. With the rapid development of mobile positioning technology, wireless communication networks, geographic information systems, and Internet technology, applications based on location-based services (Location-Based Services, LBS) have also developed quickly. A location-based information service is a value-added service provided according to the user's location. It obtains the user's current position mainly through mobile positioning technology and, with the support of electronic maps and a service platform, provides the user with location-related information services. Its main feature is that it provides location-related information at the time, place, and environment in which the user needs it, and so comes closer to the user's needs and usage scenarios.
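Actual route planning depends on a map or location service that the text does not name. As a stand-in, the sketch below only computes the great-circle (haversine) distance between the user's position and each merchant and picks the nearest one; the function names, merchant records, and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_merchant(user, merchants):
    """Return the merchant closest to the user's (lat, lon) position."""
    return min(merchants, key=lambda m: haversine_km(user[0], user[1], m["lat"], m["lon"]))


user_position = (22.54, 114.05)
merchants = [
    {"name": "Merchant A", "lat": 22.55, "lon": 114.06},
    {"name": "Merchant B", "lat": 23.13, "lon": 113.26},
]
closest = nearest_merchant(user_position, merchants)
```

A straight-line distance ignores roads and transit, so it can rank candidates but cannot replace the walking, public transport, or driving routes a map service would return.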
Refer to Fig. 3, which is a flowchart of the search method of the search engine according to an embodiment of the present invention. The search method comprises the following steps:
Step 100: Crawl the web data of the search target with a web crawler, and store the crawled web data in a database;
To explain step 100 clearly, refer also to Fig. 4, which is a flowchart of the method for crawling web data according to an embodiment of the present invention. The method comprises the following steps:
Step 101: Set the seed URL of the web crawler, and add the seed URL to the "URL list to be crawled";
Step 102: Extract the feature of the URL, maintain a "downloaded URL feature set", and store the extracted URL feature in the "downloaded URL feature set";
Step 103: Determine whether the "URL list to be crawled" is empty: if it is not empty, perform step 104; if it is empty, perform step 105;
Step 104: Take a URL from the "URL list to be crawled" and parse it, download the page corresponding to the URL, extract the relevant web data, and store the extracted web data in the database;
In step 104, the web data is processed as follows. The content of the current page is parsed, regular expressions are used to extract the valid URLs in the page, and the feature of each URL is extracted. That feature is matched against the URL features in the "downloaded URL feature set" to determine whether the URL already exists. If it does not exist, the URL is added to the "URL list to be crawled"; otherwise the URL is filtered out, so that duplicate URLs are not added. Regular expressions are also used to extract the web data in the current page, and the extracted web data is stored in the database. The extracted web data includes the product information, merchant information, and product comment information of the search target.
Step 105: The web crawler finishes its work.
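The loop formed by steps 101 to 105 can be sketched as follows. This is a simplified illustration rather than the implementation of the invention: the in-memory `PAGES` dictionary replaces real HTTP downloads, an MD5 digest is assumed as the URL feature (the text does not define the feature), and a URL's feature is recorded when the URL is queued.

```python
import hashlib
import re
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://site.example/": '<a href="http://site.example/a">A</a> <a href="http://site.example/b">B</a>',
    "http://site.example/a": '<a href="http://site.example/b">B</a> <a href="http://site.example/">home</a>',
    "http://site.example/b": "no links here",
}
LINK_RE = re.compile(r'href="(http[^"]+)"')


def url_feature(url):
    """URL feature; an MD5 digest is an assumption, the text does not define it."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def crawl(seed_url, fetch=PAGES.get):
    to_crawl = deque([seed_url])          # "URL list to be crawled" (step 101)
    downloaded = {url_feature(seed_url)}  # "downloaded URL feature set" (step 102)
    database = {}                         # stands in for the database
    while to_crawl:                       # step 103: stop when the list is empty
        url = to_crawl.popleft()          # step 104: take a URL and download it
        html = fetch(url)
        if html is None:
            continue
        database[url] = html
        for link in LINK_RE.findall(html):
            feature = url_feature(link)
            if feature not in downloaded:  # filter URLs that were already seen
                downloaded.add(feature)
                to_crawl.append(link)
    return database                       # step 105: the crawler finishes


stored = crawl("http://site.example/")
```

Comparing fixed-length features instead of full URLs keeps the membership check cheap even when the set of visited pages grows large.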
Step 200: Read the user comment data corresponding to the search target from the web data, generate a comment summary after performing sentiment analysis on the user comment data with sentiment analysis technology, and store the comment summary in the database;
To explain step 200 clearly, refer also to Fig. 5, which is a flowchart of the method for performing sentiment analysis on user comment data according to an embodiment of the present invention. The method comprises the following steps:
Step 201: Obtain the original comment data corresponding to the search target from the database, and perform processing such as sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
Step 202: Classify the comment sentences as subjective or objective according to the tagging results, keep the subjective comment sentences, and filter out the objective comment sentences;
Step 203: Extract the sentiment words and the merchant (or product) attribute information described in the comments from the subjective comment sentences, generate a comment summary from the sentiment words and attribute information, and transmit the comment summary to the database for storage.
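Steps 201 to 203 can be illustrated with a deliberately small sketch. The word lists, the rule that a sentence is subjective when it contains a sentiment word, and the per-attribute score are all assumptions chosen to keep the example self-contained; the source does not specify the classifier or the summary format, and a real system for Chinese text would use a word segmenter and a part-of-speech tagger.

```python
import re

# Toy lexicons; a real system would use a segmenter, a tagger, and trained models.
SENTIMENT_WORDS = {"great": 1, "friendly": 1, "slow": -1, "expensive": -1}
ATTRIBUTES = {"service", "price", "photos", "staff"}


def split_sentences(text):
    """Step 201 (simplified): split a comment into sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Step 201 (simplified): lowercase word tokens; tagging is omitted."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(tokens):
    """Step 202 (simplified): subjective if any sentiment word is present."""
    return any(t in SENTIMENT_WORDS for t in tokens)


def summarize(comments):
    """Step 203 (simplified): aggregate sentiment scores per attribute."""
    summary = {}
    for comment in comments:
        for sentence in split_sentences(comment):
            tokens = tokenize(sentence)
            if not is_subjective(tokens):
                continue  # objective sentences are filtered out
            score = sum(SENTIMENT_WORDS.get(t, 0) for t in tokens)
            for attribute in ATTRIBUTES.intersection(tokens):
                summary[attribute] = summary.get(attribute, 0) + score
    return summary


comments = [
    "The staff were friendly. The shop opens at nine.",
    "Great photos! The service was slow.",
    "The price is expensive.",
]
summary = summarize(comments)
```

In this sketch the sentence "The shop opens at nine" contributes nothing, because it contains no sentiment word and is therefore treated as objective.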
Step 300: The database builds index libraries according to the topic of the search target, and stores the extracted web data and comment summaries under the index library of the corresponding topic;
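One simple way to picture the topic-based index libraries of step 300 is a mapping from topic to the records stored under it. The record fields below (`topic`, `merchant`, `summary`) are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict


def build_topic_index(records):
    """Group stored records by the topic of their search target."""
    index = defaultdict(list)
    for record in records:
        index[record["topic"]].append(record)
    return dict(index)


# Hypothetical records; the field names are illustrative only.
records = [
    {"topic": "wedding", "merchant": "Studio A", "summary": {"photos": 1}},
    {"topic": "digital", "merchant": "Shop B", "summary": {"price": -1}},
    {"topic": "wedding", "merchant": "Hotel C", "summary": {"service": 1}},
]
index = build_topic_index(records)
```

A query restricted to one topic then only needs to scan `index["wedding"]` instead of every stored record.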
Step 400: Enter the search keyword and other additional parameters through the client, and send a search request to the background server;
In step 400, the additional parameters include conditions such as sentiment orientation.
Step 500: The background server receives the search request, obtains the corresponding web data and comment summaries from the database according to the search request, and returns the obtained web data and comment summaries to the client;
To explain step 500 clearly, refer also to Fig. 6, which is a flowchart of the method by which the background server processes a search request according to an embodiment of the present invention. The method comprises the following steps:
Step 501: The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
Step 502: Parse the search request, extract the search keyword or additional parameters from it, and pass them to the application service layer for business logic processing;
Step 503: Receive the search keyword or additional parameters passed by the second controller, and call the data access layer to obtain the web data;
Step 504: Obtain the web data and comment summaries in the database according to the search keyword or additional parameters, perform operations such as adding, deleting, modifying, and querying on the obtained web data and comment summaries, package the web data and comment summaries, and return them to the client through the first controller.
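The layered handling in steps 501 to 504 can be sketched with four small classes. The class names, the query-string format, and the in-memory `DATABASE` list are assumptions for illustration; the source describes the roles of the layers but not their interfaces.

```python
from urllib.parse import parse_qs

# Stand-in for the database; the field names are hypothetical.
DATABASE = [
    {"merchant": "Studio A", "text": "wedding photography", "sentiment": "positive"},
    {"merchant": "Studio B", "text": "wedding photography", "sentiment": "negative"},
]


class DataAccessLayer:
    def find(self, keyword, sentiment=None):
        # Match the keyword, and the sentiment condition only when it is given.
        return [
            row for row in DATABASE
            if keyword in row["text"] and sentiment in (None, row["sentiment"])
        ]


class ApplicationService:
    def __init__(self, dao):
        self.dao = dao

    def search(self, keyword, sentiment=None):   # step 503: call the data layer
        return self.dao.find(keyword, sentiment)


class SecondController:
    def __init__(self, service):
        self.service = service

    def handle(self, query_string):              # step 502: parse the request
        params = parse_qs(query_string)
        keyword = params.get("q", [""])[0]
        sentiment = params.get("sentiment", [None])[0]
        rows = self.service.search(keyword, sentiment)
        return {"results": rows}                 # step 504: package the data


class FirstController:
    def __init__(self, delegate):
        self.delegate = delegate

    def receive(self, query_string):             # step 501: receive and delegate
        return self.delegate.handle(query_string)


server = FirstController(SecondController(ApplicationService(DataAccessLayer())))
response = server.receive("q=wedding&sentiment=positive")
```

Keeping request parsing, business logic, and data access in separate layers means that, for example, the storage backend could change without touching the controllers.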
Step 600: The client receives the web data and comment summaries returned by the background server, parses the received data, and displays the result to the user;
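A minimal sketch of the client-side parsing in step 600 follows. It assumes that the server returns JSON with a `results` list whose items carry a `merchant` name and a per-attribute `summary` score; this response format is hypothetical and is not stated in the source.

```python
import json


def render_results(payload):
    """Parse a JSON response and return one display line per merchant."""
    data = json.loads(payload)
    lines = []
    for item in data["results"]:
        # Show "+" for a positive score and "-" otherwise (assumed convention).
        summary = ", ".join(
            f"{attr}: {'+' if score > 0 else '-'}"
            for attr, score in sorted(item["summary"].items())
        )
        lines.append(f"{item['merchant']} ({summary})")
    return lines


payload = json.dumps(
    {"results": [{"merchant": "Studio A", "summary": {"service": -1, "photos": 2}}]}
)
lines = render_results(payload)
```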
Step 700: Using location-based service technology, obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user;
In step 700, the route planning modes include walking, public transit, and driving.
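Actual route planning would be delegated to a map or location service, which the source does not name. As a small illustration of one input such a step might use, the sketch below computes the great-circle (haversine) distance between the user and the merchant and suggests a default mode. The distance thresholds are assumptions for the example, not values from the source.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suggest_mode(distance_km):
    """Pick a default planning mode; thresholds are illustrative assumptions."""
    if distance_km < 1.5:
        return "walking"
    if distance_km < 15:
        return "public transit"
    return "driving"
```

The straight-line distance is only a rough guide, because the real travel distance depends on the road or transit network.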
Present invention can apply to multiple fields or the internet search engine of theme, such as wedding celebration electric business search engine or number Code product search engine etc.;By taking wedding celebration electric business search engine as an example, with " meeting is won in China's wedding " and other wedding celebration websites as data are come Source, businessman or merchandise news in crawl website, and carries out sentiment analysis to the comment data in website, excavate it is popular to businessman or The Sentiment orientation of commodity, provides the user reference value, it is ensured that user can get when using wedding celebration subject search function Accurate Search Results, for contemporary people's marriage celebration provides highly effective help.
The search engine system and search method of the embodiments of the present invention mine and analyze the comment information of the search target and show the sentiment analysis results to consumers, so that consumers can understand the public's sentiment orientation toward a given merchant and optimize their own purchase decisions. At the same time, retailers can learn how consumers respond to their products and services and how consumers evaluate them and their competitors, so that they can improve their products and services and gain a competitive advantage. This is of positive significance to both consumers and retailers. Because the invention integrates sentiment analysis technology, its retrieval results are more accurate and more user-friendly than those of an ordinary search engine. It also integrates location-based service technology on top of a mobile search engine, so that consumers can more easily look up a route from their current location to the merchant, which saves them time. The invention also fits the rapid development of the mobile Internet and meets users' need to search anytime and anywhere.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A search engine system, characterized by comprising a web crawler module, a database module, a sentiment analysis module, a background server, and a client; the web crawler module is configured to crawl the web data of a search target and transmit the crawled web data to the database module for storage; the sentiment analysis module is configured to read the user comment data corresponding to the search target from the web data stored in the database module, generate a comment summary after performing sentiment analysis on the user comment data, and transmit the comment summary to the database module for storage; the client is connected to the background server and is configured to send a search request to the background server; and the background server accesses the database module according to the search request, obtains the corresponding web data and comment summary, and returns them to the client.
2. The search engine system according to claim 1, characterized in that the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
the seed setting unit is configured to set the seed page address of the web crawler and add the seed page address to the "URL list to be crawled";
the feature extraction unit is configured to extract the feature of the seed page address and store the extracted feature in the "downloaded URL feature set";
the list judging unit is configured to determine whether the "URL list to be crawled" is empty: if it is not empty, the data parsing unit parses the seed page address in the "URL list to be crawled"; if it is empty, the web crawler module finishes its work;
the data parsing unit is configured to take the seed page address from the "URL list to be crawled" and parse it, download the page corresponding to the seed page address, extract the relevant web data, and store the extracted web data in the database module.
3. The search engine system according to claim 1, characterized in that the sentiment analysis module mines and analyzes the comment data by means of sentiment analysis technology; the sentiment analysis module specifically comprises a data acquisition unit, a data classification unit, and a data extraction unit;
the data acquisition unit is configured to obtain the original comment data corresponding to the search target from the database module and perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
the data classification unit is configured to classify the comment sentences as subjective or objective according to the tagging results, keep the subjective comment sentences, and filter out the objective comment sentences;
the data extraction unit is configured to extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or product described in the comments, generate a comment summary from the sentiment words and attribute information, and transmit the comment summary to the database module for storage.
4. The search engine system according to claim 3, characterized in that the background server comprises a first controller, a second controller, an application service layer, and a data access layer;
the first controller is configured to receive the search request sent by the client and delegate the search request to the second controller for parsing;
the second controller is configured to parse the search request, extract the search keyword or additional parameters from it, and pass them to the application service layer for business logic processing;
the application service layer is configured to receive the search keyword or additional parameters passed by the second controller and call the data access layer to obtain the web data;
the data access layer is configured to access the database module, obtain the web data and comment summary in the database module according to the search keyword or additional parameters, and return the web data and comment summary to the second controller through the application service layer; the second controller packages the web data and comment summary and returns them to the client through the first controller.
5. The search engine system according to claim 4, characterized in that the client comprises a search unit, a data receiving unit, and a route planning unit;
the search unit is configured to encapsulate the search keyword or additional parameters entered by the user in an HTTP request and send it to the background server;
the data receiving unit is configured to receive the web data and comment summary returned by the background server, parse the received data, and display the parsed result to the user;
the route planning unit is configured to use location-based service technology to obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user.
6. A search method of a search engine, comprising the following steps:
Step a: Crawl the web data of the search target, and store the crawled web data in a database;
Step b: Read the user comment data corresponding to the search target from the web data, generate a comment summary after performing sentiment analysis on the user comment data with sentiment analysis technology, and store the comment summary in the database;
Step c: Enter the search keyword and additional parameters through the client, and send a search request to the background server;
Step d: The background server receives the search request, obtains the corresponding web data and comment summary from the database according to the search request, and returns the obtained web data and comment summary to the client.
7. The search method of a search engine according to claim 6, characterized in that, in step a, the method for crawling the web data of the search target comprises the following steps:
Step a1: Set the seed page address of the web crawler, and add the seed page address to the "URL list to be crawled";
Step a2: Extract the feature of the seed page address, and store the extracted feature in the "downloaded URL feature set";
Step a3: Determine whether the "URL list to be crawled" is empty; if it is not empty, take the seed page address from the "URL list to be crawled" and parse it, download the page corresponding to the seed page address, extract the relevant web data, and store the extracted web data in the database; if it is empty, the web crawler finishes its work.
8. The search method of a search engine according to claim 7, characterized in that, in step b, the method for performing sentiment analysis on the user comment data comprises the following steps:
Step b1: Obtain the original comment data corresponding to the search target from the database, and perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
Step b2: Classify the comment sentences as subjective or objective according to the tagging results, keep the subjective comment sentences, and filter out the objective comment sentences;
Step b3: Extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or product described in the comments, generate a comment summary from the sentiment words and attribute information, and transmit the comment summary to the database for storage.
9. The search method of a search engine according to claim 8, characterized in that, in step d, the method by which the background server processes the search request specifically comprises the following steps:
Step d1: The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
Step d2: Parse the search request, extract the search keyword or additional parameters from it, and pass them to the application service layer for business logic processing;
Step d3: Receive the search keyword or additional parameters passed by the second controller, and call the data access layer to obtain the web data;
Step d4: Obtain the web data and comment summary in the database according to the search keyword or additional parameters, package the web data and comment summary, and return them to the client through the first controller.
10. The search method of a search engine according to claim 9, characterized by further comprising the following steps after step d:
Step d5: The client receives the web data and comment summary returned by the background server, parses the received data, and displays the result to the user;
Step d6: Using location-based service technology, obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user.
CN201511023304.0A 2015-12-30 2015-12-30 A kind of search engine system and its searching method Pending CN106933864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511023304.0A CN106933864A (en) 2015-12-30 2015-12-30 A kind of search engine system and its searching method

Publications (1)

Publication Number Publication Date
CN106933864A true CN106933864A (en) 2017-07-07

Family

ID=59441819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511023304.0A Pending CN106933864A (en) 2015-12-30 2015-12-30 A kind of search engine system and its searching method

Country Status (1)

Country Link
CN (1) CN106933864A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101667194A (en) * 2009-09-29 2010-03-10 北京大学 Automatic abstracting method and system based on user comment text feature
CN102509229A (en) * 2011-09-29 2012-06-20 四川长虹电器股份有限公司 Group purchase system based on position service and group purchase service realizing method
CN103123633A (en) * 2011-11-21 2013-05-29 阿里巴巴集团控股有限公司 Generation method of evaluation parameters and information searching method based on evaluation parameters
US20130275043A1 (en) * 2012-04-12 2013-10-17 Mitac Research (Shanghai) Ltd. Location-Based Service System and Wishing Service Method Thereof
CN103823893A (en) * 2014-03-11 2014-05-28 北京大学 User comment-based product search method and system
US20140317089A1 (en) * 2013-04-18 2014-10-23 International Business Machines Corporation Context aware dynamic sentiment analysis

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463630A (en) * 2017-07-14 2017-12-12 太仓诚泽网络科技有限公司 Multiterminal webpage control system
CN107423072A (en) * 2017-08-11 2017-12-01 郑州云海信息技术有限公司 The method and apparatus of web page dynamic column filter and search list data
CN107423072B (en) * 2017-08-11 2020-10-30 苏州浪潮智能科技有限公司 Method and device for screening dynamic columns of web page and searching table data
CN108197106A (en) * 2017-12-29 2018-06-22 深圳市中易科技有限责任公司 A kind of product competition analysis method based on deep learning, apparatus and system
CN108197106B (en) * 2017-12-29 2021-07-13 深圳市中易科技有限责任公司 Product competition analysis method, device and system based on deep learning
WO2021093821A1 (en) * 2019-11-14 2021-05-20 中兴通讯股份有限公司 Intelligent assistant evaluation and recommendation methods, system, terminal, and readable storage medium
CN113553490A (en) * 2021-08-11 2021-10-26 长沙学院 Data management platform and data management method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170707