CN106933864A - A search engine system and its searching method - Google Patents
A search engine system and its searching method
- Publication number
- CN106933864A (application CN201511023304.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- comment
- search
- unit
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a search engine system and a searching method thereof. The search engine system comprises a web crawler module, a database module, a sentiment analysis module, a background server, and a client. The web crawler module crawls the web page data of a search target and transmits the crawled data to the database module for storage. The sentiment analysis module reads the user comment data corresponding to the search target from the web page data stored in the database module, performs sentiment analysis on the comment data to generate a comment summary, and transmits the comment summary to the database module for storage. The client is connected to the background server and sends search requests to it; the background server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client. The retrieval results of the invention are more accurate and more user-friendly.
Description
Technical field
The invention belongs to the technical field of search engines, and in particular relates to a search engine system and a searching method thereof.
Background
Since the beginning of the 21st century, Internet technology has developed by leaps and bounds, and people obtain massive amounts of information through the Internet. Faced with the large volume of content the Internet provides, how to quickly and accurately pick out meaningful and valuable information is a serious problem for the information society. With the spread of mobile communication devices such as smartphones and tablets and the great improvement in their performance, together with the rapid rollout of 3G/4G mobile networks and WLAN networks, search engines are gradually shifting from computers to mobile devices, and mobile search has come into public view as a new kind of service.
Mobile search refers to the search behavior in which a user on a mobile communication network uses a mobile terminal to obtain the required information through specific search channels such as SMS, WAP, and IVR. In today's information age, mobile search engines have the following four distinctive features compared with traditional Internet search engines:
(1) Convenience of search
Compared with Internet search, mobile search offers greater freedom and truly enables searching anytime and anywhere. In real life, many users do not carry a computer or have Internet access with them, whereas mobile search requires only a phone connected to the network, so users can obtain the information they want regardless of time or place.
(2) Accuracy of search
Because mobile terminals have small screens and relatively slow network access, a mobile search engine system needs to provide users with more precise information. Mobile search technology therefore places more emphasis on simplicity of use and timeliness of queries. Mobile search also needs stronger natural language analysis capability so as to provide users with more accurate vertical search results.
(3) Personalized service
A mobile search engine system can use data mining to analyze a user's search habits, search purposes, and other personal preferences, and thereby provide search functions that better match individual needs. In addition, a mobile search engine system can be combined with location service technology to provide users with more targeted information.
(4) Huge number of user terminals
Mobile search has a huge user base, and the number of mobile terminals greatly exceeds the number of Internet user terminals. According to data in the "China Mobile Internet User Behavior Statistical Report 2015" issued by Yi Guan Think Tank, China had about 729 million mobile Internet users in 2014. Fig. 1 shows the scale of China's mobile Internet user base from 2009 to 2014.
With the rapid growth in mobile Internet users, mobile search has become increasingly popular, but simply porting general-purpose Internet search engines to mobile terminals such as phones is far from sufficient. Existing general-purpose search engines mainly use a robot (crawler) to download all or part of the content of web pages into a self-built index database. Because a general-purpose search engine returns a very large number of results and many of the downloaded pages are useless or transient, all information matching the keywords is returned to the user, including a large amount of duplicate information. The user must laboriously hunt through the returned information for what is actually wanted, which slows retrieval and increases the user's search burden. At the same time, the precision of general-purpose search results is low: information from any field or on any topic that contains the search keywords is returned, so the returned information covers many topics, while the user is usually interested in only one field or topic and the rest is of no value. In addition, the result information returned by a general-purpose search engine usually has no fixed format, and the variety of formats inconveniences the user.
In summary, the shortcomings of existing search engines are mainly as follows:
(1) The retrieval mode of search engines is limited
Search engines generally retrieve by keyword, but in many cases it is difficult for users to express the information they actually need accurately with simple keywords or combinations of keywords. This difficulty of expression makes retrieval difficult or the results inaccurate.
(2) The coverage of network information by search engines is declining overall
The sharp growth of network information makes it increasingly difficult for comprehensive search engines, which aim to cover all subjects and all types of information, to cope. Even the search engines regarded as the most powerful in network information retrieval cannot upgrade and develop their processing software fast enough to keep up with the growth of network information.
(3) Search engines for specific fields such as wedding goods offer only basic functions
For example, in the wedding goods field there are many wedding e-commerce search engine systems based on the "China Wedding Expo", but their functions are basic: most only list some wedding shops and product details, and the valuable information users can obtain from them is extremely limited.
Summary of the invention
The invention provides a search engine system and a searching method thereof, intended to solve the above technical problems in the prior art at least to some extent.
The invention is implemented as follows. A search engine system comprises a web crawler module, a database module, a sentiment analysis module, and a background server. The web crawler module is used to crawl the web page data of a search target and transmit the crawled web page data to the database module for storage. The sentiment analysis module is used to read the user comment data corresponding to the search target from the web page data stored in the database module, perform sentiment analysis on the user comment data to generate a comment summary, and transmit the comment summary to the database module for storage. A client is connected to the background server and is used to send search requests to the background server; the background server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client.
The technical solution adopted by the embodiment of the invention further includes: the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
The seed setting unit is used to set the seed page address of the web crawler and add the seed page address to the "URL list to be crawled";
The feature extraction unit is used to extract the feature of the seed page address and store the extracted feature in the "downloaded URL feature set";
The list judging unit is used to judge whether the "URL list to be crawled" is empty: if it is not empty, the seed page addresses in the list are parsed by the data parsing unit; if it is empty, the web crawler module finishes its work;
The data parsing unit is used to take the seed page addresses from the "URL list to be crawled" and parse them, download the pages corresponding to the seed page addresses, extract the relevant web page data, and store the extracted web page data in the database module.
The technical solution adopted by the embodiment of the invention further includes: the sentiment analysis module mines and analyzes the comment data using sentiment analysis technology; the sentiment analysis module specifically comprises a data acquisition unit, a data classification unit, and a data extraction unit;
The data acquisition unit is used to obtain the original comment data corresponding to the search target from the database module and to perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
The data classification unit is used to classify comment sentences as subjective or objective according to the tagging results, retaining the subjective comment sentences and filtering out the objective ones;
The data extraction unit is used to extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or commodity described in the comment, generate a comment summary from the sentiment words and the attribute information, and transmit the comment summary to the database module for storage.
The technical solution adopted by the embodiment of the invention further includes: the background server comprises a first controller, a second controller, an application service layer, and a data access layer;
The first controller is used to receive the search request sent by the client and delegate the search request to the second controller for parsing;
The second controller is used to parse the search request, extract the search keyword or additional parameters from it, and pass the search keyword or additional parameters to the application service layer for business logic processing;
The application service layer is used to receive the search keyword or additional parameters passed by the second controller and call the data access layer to obtain web page data;
The data access layer is used to access the database module, obtain the web page data and comment summary from the database module according to the search keyword or additional parameters, and return them to the second controller through the application service layer; the second controller packages the web page data and comment summary and returns them to the client through the first controller.
The technical solution adopted by the embodiment of the invention further includes: the client comprises a search unit, a data receiving unit, and a route planning unit;
The search unit is used to encapsulate the search keyword or additional parameters entered by the user in an HTTP request and send it to the background server;
The data receiving unit is used to receive the web page data and comment summary returned by the background server, parse the received data, and display the parsed result to the user;
The route planning unit is used to obtain, by means of location service technology, the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user.
Another technical solution adopted by the embodiment of the invention is a searching method of a search engine, comprising the following steps:
Step a: crawl the web page data of a search target and store the crawled web page data in a database;
Step b: read the user comment data corresponding to the search target from the web page data, perform sentiment analysis on the user comment data using sentiment analysis technology to generate a comment summary, and store the comment summary in the database;
Step c: enter a search keyword and additional parameters through a client and send a search request to a background server;
Step d: receive the search request at the background server, obtain the corresponding web page data and comment summary from the database according to the search request, and return them to the client.
The technical solution adopted by the embodiment of the invention further includes: in step a, the method of crawling the web page data of the search target comprises the following steps:
Step a1: set the seed page address of the web crawler and add the seed page address to the "URL list to be crawled";
Step a2: extract the feature of the seed page address and store the extracted feature in the "downloaded URL feature set";
Step a3: judge whether the "URL list to be crawled" is empty; if it is not empty, take the seed page addresses from the list and parse them, download the corresponding pages, extract the relevant web page data, and store the extracted web page data in the database; if it is empty, the web crawler finishes its work.
The technical solution adopted by the embodiment of the invention further includes: in step b, the method of performing sentiment analysis on the user comment data comprises the following steps:
Step b1: obtain the original comment data corresponding to the search target from the database and perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
Step b2: classify comment sentences as subjective or objective according to the tagging results, retaining the subjective comment sentences and filtering out the objective ones;
Step b3: extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant or commodity described in the comment, generate a comment summary from the sentiment words and the attribute information, and transmit the comment summary to the database for storage.
The technical solution adopted by the embodiment of the invention further includes: in step d, the method by which the background server processes the search request specifically comprises the following steps:
Step d1: receive the search request sent by the client at the first controller and delegate the search request to the second controller for parsing;
Step d2: parse the search request, extract the search keyword or additional parameters from it, and pass the search keyword or additional parameters to the application service layer for business logic processing;
Step d3: receive the search keyword or additional parameters passed by the second controller and call the data access layer to obtain web page data;
Step d4: obtain the web page data and comment summary from the database according to the search keyword or additional parameters, package them, and return them to the client through the first controller.
The technical solution adopted by the embodiment of the invention further includes the following steps after step d:
Step d5: receive, at the client, the web page data and comment summary returned by the background server, parse the received data, and display the result to the user;
Step d6: by means of location service technology, obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user.
The search engine system and searching method of the embodiment of the invention mine and analyze the comment information of the search target and display the sentiment analysis results to consumers. Consumers can learn the public's sentiment toward a given merchant and thereby optimize their purchase decisions; retailers can learn consumers' feedback on their goods and services, as well as consumers' evaluations of themselves and of their competitors, and can thereby improve their products and services and gain a competitive advantage. This is of positive significance to both consumers and retailers. Because the invention integrates sentiment analysis technology, its retrieval results are more accurate and more user-friendly than those of an ordinary search engine. Location service technology is also integrated on top of the mobile search engine, so that consumers can more easily look up a route from their current location to a merchant, which greatly saves their time. The invention also fits the trend of rapid mobile Internet development and meets users' need to search anytime and anywhere.
Brief description of the drawings
Fig. 1 is a chart of the scale of China's mobile Internet user base from 2009 to 2014;
Fig. 2 is a structural diagram of the search engine system of the embodiment of the invention;
Fig. 3 is a flowchart of the searching method of the search engine of the embodiment of the invention;
Fig. 4 is a flowchart of the method for crawling web page data of the embodiment of the invention;
Fig. 5 is a flowchart of the method for performing sentiment analysis on user comment data of the embodiment of the invention;
Fig. 6 is a flowchart of the method by which the background server processes a search request in the embodiment of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
Refer to Fig. 2, which is a structural diagram of the search engine system of the embodiment of the invention. The search engine system of the embodiment comprises a web crawler module, a database module, a sentiment analysis module, a background server, and a client. The web crawler module is used to crawl the web page data of a search target and transmit the crawled web page data to the database module. The database module is used to store the crawled web page data. The sentiment analysis module is used to read the user comment data corresponding to the search target from the web page data in the database module, perform sentiment analysis on the user comment data to generate a comment summary, and transmit the comment summary to the database module for storage. The client is connected to the background server and is used to send search requests to it; the background server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client for display. The client includes mobile terminals such as an Android client or an iOS client.
Specifically, the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
The seed setting unit is used to set the seed URL (seed page address) of the web crawler and add the URL to the "URL list to be crawled". A web crawler is a program that automatically fetches web page content and is an important component of a search engine;
The feature extraction unit is used to extract the feature of the URL and maintain a "downloaded URL feature set", storing the extracted URL feature in that set;
The list judging unit is used to judge whether the "URL list to be crawled" is empty: if it is not empty, the URLs in the list are parsed by the data parsing unit; if it is empty, the data parsing unit has finished parsing all URLs in the "URL list to be crawled", and the web crawler finishes its work;
The data parsing unit is used to take the URLs from the "URL list to be crawled", parse them, download the corresponding pages, and extract the relevant web page data. Specifically, the data parsing unit processes web page data as follows. It parses the content of the current page and uses regular expressions to extract the valid URLs in the page (a regular expression, often abbreviated regex, regexp, or RE in code, is a single string that describes and matches a series of strings conforming to a syntactic rule). It extracts the feature of each URL and matches it against the URL features in the "downloaded URL feature set" to judge whether the URL already exists: if it does not exist, the URL is added to the "URL list to be crawled"; otherwise the URL is filtered out, which avoids adding duplicate URLs. It also uses regular expressions to extract the web page data in the current page and stores the extracted data in the database module. The extracted web page data includes the commodity information, merchant information, and commodity comment information of the search target.
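The crawl loop described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the `HREF_RE` pattern, the `url_feature` normalization (the source does not say what a URL "feature" is), the injected `fetch` function, and the in-memory `pages` dictionary standing in for the database module are all hypothetical.

```python
import re
from collections import deque
from typing import Callable, Dict, List

# Hypothetical pattern for absolute links; the source only says "regular expression".
HREF_RE = re.compile(r'href="(https?://[^"#]+)"')


def url_feature(url: str) -> str:
    """Assumed URL 'feature': a crude normalized form used for duplicate checks."""
    return url.strip().lower().rstrip("/")


def crawl(seeds: List[str], fetch: Callable[[str], str], limit: int = 100) -> Dict[str, str]:
    to_crawl = deque()           # list of URLs waiting to be crawled
    seen = set()                 # set of URL features already queued or downloaded
    for seed in seeds:
        feature = url_feature(seed)
        if feature not in seen:
            seen.add(feature)
            to_crawl.append(seed)

    pages: Dict[str, str] = {}   # stands in for the database module
    while to_crawl and len(pages) < limit:   # stop when the list is empty
        url = to_crawl.popleft()
        html = fetch(url)        # download the page for this URL
        pages[url] = html
        for link in HREF_RE.findall(html):
            feature = url_feature(link)
            if feature not in seen:          # filter out duplicate URLs
                seen.add(feature)
                to_crawl.append(link)
    return pages
```

Passing `fetch` as a parameter keeps the sketch free of network access; in practice it would wrap an HTTP client, and the extraction of commodity, merchant, and comment fields would replace the raw `html` stored here.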
The database module builds index databases according to the topic of the search target and stores the extracted web page data under the index database of the corresponding topic.
The sentiment analysis module mines and analyzes comment data using sentiment analysis technology. Sentiment analysis, also known as review mining or opinion mining, refers to automatically analyzing the text of comments on a commodity to discover consumers' attitudes and opinions, positive or negative, toward that commodity. DoubleClick Inc. once conducted a study of online customers in the U.S. apparel, computer hardware, sports and fitness products, and travel industries and found that about half or more of consumers search the web for introductions to the relevant commodities and for other consumers' comments before making a purchase decision. It follows that online commodity comments play a large role in the consumer purchasing process and have an important influence on purchase decisions. Sentiment analysis of online commodity comments has therefore become increasingly important.
Specifically, the sentiment analysis module comprises a data acquisition unit, a data classification unit, and a data extraction unit;
The data acquisition unit is used to obtain the original comment data corresponding to the search target from the database module and to perform processing such as sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
The data classification unit is used to classify comment sentences as subjective or objective according to the tagging results, retaining the subjective comment sentences and filtering out the objective ones;
The data extraction unit is used to extract, from the subjective comment sentences, the sentiment words and the attribute information of the merchant (or commodity) described in the comment, generate a comment summary from the sentiment words and the merchant attribute information, and transmit the comment summary to the database module for storage. Specifically, the invention counts the word frequencies and weights of the positive and negative sentiment words that occur in the comment data and judges whether the comment data is a positive or a negative comment from the sign of its overall sentiment weight, where positive sentiment words have positive weights and negative sentiment words have negative weights. For example, when extracting a sentiment summary from comment data about a wedding merchant or commodity, all candidate sentiment summary combinations that occur in the comment data are first screened using an existing wedding-topic dictionary, and the final, accurate sentiment summary is then determined by matching against existing syntactic patterns.
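The weighting rule described above (sum the weights of the sentiment words that occur, then read the polarity from the sign of the total) can be sketched as follows. The lexicon entries and their weights are invented for illustration, whitespace tokens stand in for real word segmentation, and the "neutral" result for a zero total is an assumption, since the source only mentions positive and negative comments.

```python
from typing import Dict, List

# Hypothetical lexicon: positive words carry positive weights, negative words negative ones.
LEXICON: Dict[str, float] = {
    "great": 2.0,
    "friendly": 1.0,
    "slow": -1.0,
    "rude": -2.0,
}


def overall_weight(tokens: List[str], lexicon: Dict[str, float] = LEXICON) -> float:
    """Sum the weight of every sentiment-word occurrence (weight times frequency)."""
    return sum(lexicon.get(token.lower(), 0.0) for token in tokens)


def polarity(tokens: List[str], lexicon: Dict[str, float] = LEXICON) -> str:
    """Classify a comment by the sign of its overall sentiment weight."""
    score = overall_weight(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: the source does not define the zero case
```

Because each occurrence contributes its weight once, a word that appears twice counts twice, which is how word frequency enters the total in this sketch.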
The background server is developed using the MVC pattern. Because the invention does not include development of web front-end pages, the background server contains no view (View) layer and exchanges data directly with the client. The background server specifically comprises a first controller, a second controller, an application service layer, and a data access layer;
The first controller is used to receive the search request sent by the client and delegate the search request to the second controller for parsing;
The second controller is used to parse the search request, extract the search keyword or additional parameters from it, and pass the search keyword or additional parameters to the application service layer for business logic processing; the additional parameters include sentiment conditions and the like.
The application service layer is used to receive the search keyword or additional parameters passed by the second controller and call the data access layer to obtain web page data;
The data access layer is used to access the database module, obtain the web page data and comment summary from the database module according to the search keyword or additional parameters, perform operations such as adding, deleting, modifying, and querying on the obtained web page data and comment summary, and return them to the second controller through the application service layer; the second controller packages the web page data and comment summary and returns them to the client through the first controller.
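The layering described above can be sketched as plain Python classes. This is an illustrative sketch only: the `Record` schema, the class and method names, the request keys `keyword` and `sentiment`, and the in-memory list standing in for the database module are all assumptions, and a real server would sit behind a web framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """One stored row: merchant page data plus its comment summary (hypothetical schema)."""
    merchant: str
    page_data: str
    summary: str
    sentiment: str  # "positive" or "negative"


class DataAccessLayer:
    """Stands in for access to the database module."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def find(self, keyword: str, sentiment: Optional[str] = None) -> List[Record]:
        return [
            row for row in self._rows
            if keyword.lower() in row.merchant.lower()
            and (sentiment is None or row.sentiment == sentiment)
        ]


class ApplicationService:
    """Business logic layer; here it only forwards to the data access layer."""

    def __init__(self, dao: DataAccessLayer) -> None:
        self._dao = dao

    def search(self, keyword: str, sentiment: Optional[str] = None) -> List[Record]:
        return self._dao.find(keyword, sentiment)


class SecondController:
    """Parses the request, calls the service layer, and packages the result."""

    def __init__(self, service: ApplicationService) -> None:
        self._service = service

    def handle(self, request: Dict[str, str]) -> Dict[str, list]:
        keyword = request.get("keyword", "")
        sentiment = request.get("sentiment")  # optional additional parameter
        records = self._service.search(keyword, sentiment)
        return {"results": [
            {"merchant": r.merchant, "page_data": r.page_data, "summary": r.summary}
            for r in records
        ]}


class FirstController:
    """Entry point: receives the client's request and delegates it."""

    def __init__(self, delegate: SecondController) -> None:
        self._delegate = delegate

    def receive(self, request: Dict[str, str]) -> Dict[str, list]:
        return self._delegate.handle(request)
```

Each layer only knows the one directly below it, so the storage backend could be swapped by replacing `DataAccessLayer` alone.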
The client specifically comprises a search unit, a data receiving unit, and a route planning unit;
The search unit is used to encapsulate the search keyword or other additional parameters entered by the user in an HTTP (HyperText Transfer Protocol) request and send it to the background server; the additional parameters include additional conditions such as sentiment orientation;
The data receiving unit is used to receive the web page data and comment summary returned by the background server, parse the received data, and display the parsed result to the user. The invention uses the sentiment analysis result of the comment summary and displays it to the user, so that consumers can learn the public's main sentiment toward a merchant or commodity and thereby optimize their purchase decisions. Merchants, in turn, can learn consumers' feedback on their goods and services, as well as consumers' evaluations of themselves and of their competitors, and can thereby improve their products and services and gain a competitive advantage.
The route planning unit is used to obtain, by means of location service technology, the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user; the route planning modes include walking, public transit, and driving. With the rapid development of mobile positioning technology, wireless communication networks, geographic information systems, and Internet technology, applications based on location-based services (LBS) have also developed quickly. A location-based information service is a value-added service provided according to the user's location: it obtains the user's position mainly through mobile positioning technology and, supported by electronic maps and a service platform, provides the user with location-related information services. Its main characteristic is that it provides location-related information at the time, place, and environment in which the user needs it, thereby coming closer to the user's needs and usage scenarios.
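As a small illustration of the search unit described above, the following sketch encodes a keyword and an optional sentiment condition into the query string of an HTTP GET URL, and decodes it again as the server side might. The parameter names `keyword` and `sentiment` and the example base URL are assumptions; the source does not specify the request format.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url: str, keyword: str, sentiment: Optional[str] = None) -> str:
    """Encode the search keyword and optional sentiment condition as a query string."""
    params = {"keyword": keyword}
    if sentiment is not None:
        params["sentiment"] = sentiment
    return f"{base_url}?{urlencode(params)}"


def parse_search_query(url: str) -> Dict[str, str]:
    """Recover the parameters from the URL (first value of each name)."""
    query = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in query.items()}


# Hypothetical endpoint, used only to show the shape of the request.
url = build_search_url("http://example.test/search", "wedding dress", "positive")
```

`urlencode` percent-encodes non-ASCII keywords as UTF-8, so Chinese search terms survive the round trip through `parse_search_query`.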
Refer to Fig. 3, which is a flowchart of the searching method of the search engine of the embodiment of the invention. The searching method of the embodiment comprises the following steps:
Step 100: crawl the web page data of a search target with a web crawler and store the crawled web page data in a database;
It is the method for the crawl web data of the embodiment of the present invention also referring to Fig. 4 for clear explanation step 100
Flow chart.The method of the crawl web data of the embodiment of the present invention is comprised the following steps:
Step 101: Set the seed URL of the web crawler, and add the seed URL to the "to-be-crawled URL list";
Step 102: Extract the feature of the URL, maintain a "downloaded URL feature set", and store the extracted URL feature in the "downloaded URL feature set";
Step 103: Determine whether the "to-be-crawled URL list" is empty: if it is not empty, perform step 104; if it is empty, perform step 105;
Step 104: Take a URL from the "to-be-crawled URL list" and parse it, download the page corresponding to the URL, extract the relevant web page data, and store the extracted web page data in the database;
In step 104, the web page data is processed as follows: parse the content of the current page, use a regular expression to extract the valid URLs in the page, and extract the feature of each URL; match that feature against the URL features in the "downloaded URL feature set" to determine whether the URL already exists; if it does not exist, add the URL to the "to-be-crawled URL list", otherwise filter the URL out, so that duplicate URLs are not added. A regular expression is also used to extract the web page data from the current page, and the extracted web page data is stored in the database. The extracted web page data includes the commodity information, merchant information, and commodity comment information of the search target.
Step 105: The web crawler finishes its work.
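Steps 101 to 105 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the in-memory `PAGES` dictionary, the `LINK_RE` pattern, and the use of an MD5 digest as the "URL feature" are all assumptions made so that the example runs without network access. The sketch also records a URL's feature at the moment the URL is queued, which the text does not state explicitly.

```python
import hashlib
import re
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.com/": '<a href="http://example.com/a">A</a> '
                           '<a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/b">B</a> '
                            '<a href="http://example.com/">home</a>',
    "http://example.com/b": "no links here",
}

# Assumed link pattern; the patent only says "a regular expression".
LINK_RE = re.compile(r'href="(https?://[^"]+)"')


def url_feature(url):
    """Reduce a URL to a fixed-size feature (an MD5 digest, by assumption)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def crawl(seed_url, fetch):
    to_crawl = deque([seed_url])          # "to-be-crawled URL list" (step 101)
    downloaded = {url_feature(seed_url)}  # "downloaded URL feature set" (step 102)
    stored = {}                           # stands in for the database
    while to_crawl:                       # step 103: loop until the list is empty
        url = to_crawl.popleft()          # step 104: take a URL and download it
        html = fetch(url)
        if html is None:
            continue
        stored[url] = html
        for link in LINK_RE.findall(html):
            feature = url_feature(link)
            if feature not in downloaded:  # filter out URLs seen before
                downloaded.add(feature)
                to_crawl.append(link)
    return stored                          # step 105: the crawler finishes


result = crawl("http://example.com/", PAGES.get)
```

Comparing fixed-size features instead of raw URL strings keeps the duplicate check cheap as the set grows; a production crawler might use a Bloom filter for the same purpose.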
Step 200: Read the user comment data corresponding to the search target from the web page data, perform sentiment analysis on the user comment data using sentiment analysis technology to generate a comment summary, and store the comment summary in the database;
To explain step 200 clearly, refer also to Fig. 5, which is a flow chart of the method for performing sentiment analysis on user comment data according to an embodiment of the present invention. The method comprises the following steps:
Step 201: Obtain the original comment data corresponding to the search target from the database, and perform sentence splitting, word segmentation, part-of-speech tagging, and similar processing on the original comment data;
Step 202: Classify the comment sentences as subjective or objective according to the tagging results, keep the subjective comment sentences, and filter out the objective comment sentences;
Step 203: Extract from the subjective comment sentences the sentiment words and the merchant (commodity) attribute information described in the comments, generate a comment summary from the sentiment words and the attribute information, and transfer the comment summary to the database for storage.
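Steps 201 to 203 can be illustrated with a toy, lexicon-based sketch. Everything here is an assumption for illustration: the English lexicons `SENTIMENT_WORDS` and `ATTRIBUTE_WORDS`, the regex-based sentence splitter and tokenizer (standing in for word segmentation and part-of-speech tagging), and the rule that a sentence is subjective when it contains a sentiment word. The patent does not specify which classifier or scoring scheme is used.

```python
import re

# Hypothetical toy lexicons; a real system would rely on a word segmenter,
# a part-of-speech tagger, and a trained subjectivity classifier.
SENTIMENT_WORDS = {"great": 1, "friendly": 1, "beautiful": 1,
                   "slow": -1, "expensive": -1}
ATTRIBUTE_WORDS = {"service", "price", "venue", "staff", "photos"}


def split_sentences(text):
    """Step 201 (simplified): split a comment into sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Step 201 (simplified): lowercase word tokens stand in for segmentation."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_subjective(tokens):
    """Step 202 (assumed rule): subjective if any sentiment word is present."""
    return any(t in SENTIMENT_WORDS for t in tokens)


def summarize(comment):
    """Step 203: attach the sentence's sentiment score to each attribute in it."""
    summary = {}
    for sentence in split_sentences(comment):
        tokens = tokenize(sentence)
        if not is_subjective(tokens):
            continue  # objective sentence: filtered out
        score = sum(SENTIMENT_WORDS.get(t, 0) for t in tokens)
        for t in tokens:
            if t in ATTRIBUTE_WORDS:
                summary[t] = summary.get(t, 0) + score
    return summary


comment = ("The venue is beautiful. We booked in May. "
           "The service was slow and the price was expensive!")
summary = summarize(comment)
```

The sentence-level score is deliberately crude: every attribute in a sentence receives the whole sentence's score. Pairing each sentiment word with its nearest attribute would be a natural refinement.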
Step 300: The database builds index libraries according to the theme of the search target, and stores the extracted web page data and comment summaries under the index library of the corresponding theme;
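One way to picture step 300 is a per-theme inverted index held in memory. The `ThemeIndex` class, its method names, and the whitespace tokenization below are hypothetical; the patent does not describe how the index library is structured.

```python
from collections import defaultdict


class ThemeIndex:
    """Hypothetical per-theme inverted index: theme -> token -> document ids."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))
        self._docs = {}

    def add(self, theme, doc_id, text, summary):
        # Store the page text and its comment summary under the given theme.
        self._docs[doc_id] = {"text": text, "summary": summary}
        for token in text.lower().split():
            self._index[theme][token].add(doc_id)

    def lookup(self, theme, keyword):
        # Only documents filed under the requested theme can match.
        ids = self._index.get(theme, {}).get(keyword.lower(), set())
        return [self._docs[i] for i in sorted(ids)]


index = ThemeIndex()
index.add("wedding", 1, "Lakeside wedding venue", {"venue": 1})
index.add("digital", 2, "Compact camera", {"price": -1})
hits = index.lookup("wedding", "venue")
```

Keeping a separate index per theme means a query against the wedding theme never scans digital-product pages.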
Step 400: Enter a search keyword and other additional parameters through the client, and send a search request to the background server;
In step 400, the additional parameters include additional conditions such as sentiment orientation.
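A client-side sketch of step 400, assuming the search request is carried as an HTTP GET query string. The endpoint `BASE_URL` and the parameter names `q` and `sentiment` are hypothetical; the patent names neither.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://backend.example/search"  # hypothetical endpoint


def build_search_url(keyword, **extra):
    """Encode the keyword and any additional parameters into a query string."""
    params = {"q": keyword}
    # Skip additional parameters the user left unset.
    params.update({k: v for k, v in extra.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("wedding photography", sentiment="positive")
query = parse_qs(urlsplit(url).query)  # what the server would decode
```

`urlencode` handles escaping of spaces and non-ASCII keywords, so the keyword can be passed through unchanged.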
Step 500: The background server receives the search request, obtains the corresponding web page data and comment summary from the database according to the search request, and returns the obtained web page data and comment summary to the client;
To explain step 500 clearly, refer also to Fig. 6, which is a flow chart of the method by which the background server processes a search request according to an embodiment of the present invention. The method comprises the following steps:
Step 501: The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
Step 502: The second controller parses the search request, extracts the search keyword or additional parameters from it, and passes the search keyword or additional parameters to the application service layer for business logic processing;
Step 503: The application service layer receives the search keyword or additional parameters passed by the second controller and calls the data access layer to obtain the web page data;
Step 504: The data access layer obtains the web page data and comment summary from the database according to the search keyword or additional parameters, and performs operations such as adding, deleting, modifying, and querying on the obtained web page data and comment summary; the web page data and comment summary are then encapsulated and returned to the client through the first controller.
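The layering in steps 501 to 504 can be sketched as four small classes. The class and method names, the dictionary-based request, and the in-memory `ROWS` list standing in for the database are illustrative assumptions, not the patent's code.

```python
class DataAccessLayer:
    """Step 504: queries the store by keyword and optional sentiment."""

    def __init__(self, rows):
        self.rows = rows  # stands in for the database

    def find(self, keyword, sentiment=None):
        hits = [r for r in self.rows if keyword.lower() in r["title"].lower()]
        if sentiment is not None:
            hits = [r for r in hits if r["sentiment"] == sentiment]
        return hits


class ApplicationService:
    """Step 503: business logic; here it simply calls the data access layer."""

    def __init__(self, dal):
        self.dal = dal

    def search(self, keyword, sentiment=None):
        return self.dal.find(keyword, sentiment)


class SecondController:
    """Step 502: parses the request, then encapsulates the results."""

    def __init__(self, service):
        self.service = service

    def handle(self, request):
        keyword = request.get("q", "")
        sentiment = request.get("sentiment")
        results = self.service.search(keyword, sentiment)
        return {"count": len(results), "results": results}


class FirstController:
    """Step 501: single entry point that delegates to the second controller."""

    def __init__(self, second):
        self.second = second

    def dispatch(self, request):
        return self.second.handle(request)


ROWS = [
    {"title": "Lakeside Wedding Venue", "sentiment": "positive"},
    {"title": "City Wedding Studio", "sentiment": "negative"},
    {"title": "Compact Camera", "sentiment": "positive"},
]
server = FirstController(SecondController(ApplicationService(DataAccessLayer(ROWS))))
response = server.dispatch({"q": "wedding", "sentiment": "positive"})
```

Because each layer only talks to the one below it, the storage backend can be swapped without touching the controllers.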
Step 600: The client receives the web page data and comment summary returned by the background server, parses the received data, and displays it to the user;
Step 700: Using location-based service technology, obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user;
In step 700, the route planning modes include walking, public transit, and driving.
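Real route planning needs a map service, which the patent does not name. As a rough stand-in, the sketch below computes the great-circle distance between the user and the merchant with the haversine formula and picks a travel mode from it. The distance thresholds in `suggest_mode` are arbitrary assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suggest_mode(distance_km):
    """Pick a route planning mode; the thresholds are assumptions."""
    if distance_km < 1.5:
        return "walking"
    if distance_km < 10:
        return "public transit"
    return "driving"


# Hypothetical coordinates for a user and a merchant.
user = (22.5431, 114.0579)
merchant = (22.5500, 114.1000)
mode = suggest_mode(haversine_km(*user, *merchant))
```

The straight-line distance underestimates the travelled distance, so a real client would pass both coordinates to a routing service and use the route length it returns.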
The present invention can be applied to Internet search engines for many fields or themes, such as a wedding e-commerce search engine or a digital product search engine. Taking a wedding e-commerce search engine as an example, the "China Wedding Expo" site and other wedding websites serve as data sources: the merchant or commodity information on these sites is crawled, sentiment analysis is performed on the comment data on the sites, and the public's sentiment orientation toward each merchant or commodity is mined. This gives users a useful reference, helps them obtain accurate search results when using the wedding-themed search function, and provides effective help to people preparing a wedding.
The search engine system and searching method of the embodiments of the present invention mine and analyze the comment information of the search target and show the sentiment analysis results to consumers. Consumers can learn the public's sentiment orientation toward a given merchant and thereby improve their own purchase decisions. Meanwhile, retailers can learn consumers' feedback on their commodities and services, as well as consumers' evaluations of themselves and of their competitors, so that they can improve their products and services and gain a competitive advantage. This is of positive significance to both consumers and retailers. Because the present invention integrates sentiment analysis technology, its retrieval results are more accurate and more user-friendly than those of an ordinary search engine. It also integrates location-based service technology on top of a mobile search engine, so that consumers can more easily look up the route from their current location to a merchant, which saves consumers time. The present invention also fits the rapid development of the contemporary mobile Internet and meets users' need to search anytime and anywhere.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A search engine system, characterized by comprising a web crawler module, a database module, a sentiment analysis module, a background server, and a client; the web crawler module is used to crawl the web page data of a search target and to transfer the crawled web page data to the database module for storage; the sentiment analysis module is used to read the user comment data corresponding to the search target from the web page data stored in the database module, to generate a comment summary after performing sentiment analysis on the user comment data, and to transfer the comment summary to the database module for storage; the client is connected to the background server and is used to send a search request to the background server; the background server accesses the database module according to the search request, obtains the corresponding web page data and comment summary, and returns them to the client.
2. The search engine system according to claim 1, characterized in that the web crawler module comprises a seed setting unit, a feature extraction unit, a list judging unit, and a data parsing unit;
the seed setting unit is used to set the seed page address of the web crawler and to add the seed page address to the "to-be-crawled URL list";
the feature extraction unit is used to extract the feature of the seed page address and to store the extracted seed page address feature in the "downloaded URL feature set";
the list judging unit is used to determine whether the "to-be-crawled URL list" is empty: if the "to-be-crawled URL list" is not empty, the seed page address in the "to-be-crawled URL list" is parsed by the data parsing unit; if the "to-be-crawled URL list" is empty, the web crawler module finishes its work;
the data parsing unit is used to take the seed page address from the "to-be-crawled URL list" and parse it, to download the page corresponding to the seed page address, to extract the relevant web page data, and to store the extracted web page data in the database module.
3. The search engine system according to claim 1, characterized in that the sentiment analysis module mines and analyzes the comment data using sentiment analysis technology; the sentiment analysis module specifically comprises a data acquisition unit, a data classification unit, and a data extraction unit;
the data acquisition unit is used to obtain the original comment data corresponding to the search target from the database module, and to perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
the data classification unit is used to classify the comment sentences as subjective or objective according to the tagging results, to keep the subjective comment sentences, and to filter out the objective comment sentences;
the data extraction unit is used to extract from the subjective comment sentences the sentiment words and the attribute information of the merchant or commodity described in the comments, to generate a comment summary from the sentiment words and the attribute information, and to transfer the comment summary to the database module for storage.
4. The search engine system according to claim 3, characterized in that the background server comprises a first controller, a second controller, an application service layer, and a data access layer;
the first controller is used to receive the search request sent by the client and to delegate the search request to the second controller for parsing;
the second controller is used to parse the search request, to extract the search keyword or additional parameters from the search request, and to pass the search keyword or additional parameters to the application service layer for business logic processing;
the application service layer is used to receive the search keyword or additional parameters passed by the second controller and to call the data access layer to obtain the web page data;
the data access layer is used to access the database module and to obtain the web page data and comment summary in the database module according to the search keyword or additional parameters; the web page data and comment summary are returned to the second controller through the application service layer, and the second controller encapsulates the web page data and comment summary and returns them to the client through the first controller.
5. The search engine system according to claim 4, characterized in that the client comprises a search unit, a data receiving unit, and a route planning unit;
the search unit is used to encapsulate the search keyword or additional parameters entered by the user in an HTTP request and to send the request to the background server;
the data receiving unit is used to receive the web page data and comment summary returned by the background server, to parse the received data, and to display the parsing result to the user;
the route planning unit is used to obtain, by location-based service technology, the merchant's location from the user's search target or from the merchant information returned by the background server, to obtain the user's current location, and to plan a route for the user.
6. A searching method of a search engine, comprising the following steps:
Step a: Crawl the web page data of a search target, and store the crawled web page data in a database;
Step b: Read the user comment data corresponding to the search target from the web page data, perform sentiment analysis on the user comment data using sentiment analysis technology to generate a comment summary, and store the comment summary in the database;
Step c: Enter a search keyword and additional parameters through a client, and send a search request to a background server;
Step d: The background server receives the search request, obtains the corresponding web page data and comment summary from the database according to the search request, and returns the obtained web page data and comment summary to the client.
7. The searching method of a search engine according to claim 6, characterized in that, in step a, the method for crawling the web page data of the search target comprises the following steps:
Step a1: Set the seed page address of the web crawler, and add the seed page address to the "to-be-crawled URL list";
Step a2: Extract the feature of the seed page address, and store the extracted seed page address feature in the "downloaded URL feature set";
Step a3: Determine whether the "to-be-crawled URL list" is empty; if the "to-be-crawled URL list" is not empty, take the seed page address from the "to-be-crawled URL list" and parse it, download the page corresponding to the seed page address, extract the relevant web page data, and store the extracted web page data in the database; if the "to-be-crawled URL list" is empty, the web crawler finishes its work.
8. The searching method of a search engine according to claim 7, characterized in that, in step b, the method for performing sentiment analysis on the user comment data comprises the following steps:
Step b1: Obtain the original comment data corresponding to the search target from the database, and perform sentence splitting, word segmentation, and part-of-speech tagging on the original comment data;
Step b2: Classify the comment sentences as subjective or objective according to the tagging results, keep the subjective comment sentences, and filter out the objective comment sentences;
Step b3: Extract from the subjective comment sentences the sentiment words and the attribute information of the merchant or commodity described in the comments, generate a comment summary from the sentiment words and the attribute information, and transfer the comment summary to the database for storage.
9. The searching method of a search engine according to claim 8, characterized in that, in step d, the method by which the background server processes the search request specifically comprises the following steps:
Step d1: The first controller receives the search request sent by the client and delegates the search request to the second controller for parsing;
Step d2: Parse the search request, extract the search keyword or additional parameters from the search request, and pass the search keyword or additional parameters to the application service layer for business logic processing;
Step d3: Receive the search keyword or additional parameters passed by the second controller, and call the data access layer to obtain the web page data;
Step d4: Obtain the web page data and comment summary in the database according to the search keyword or additional parameters, encapsulate the web page data and comment summary, and return them to the client through the first controller.
10. The searching method of a search engine according to claim 9, characterized in that the following steps are further included after step d:
Step d5: The client receives the web page data and comment summary returned by the background server, parses the received data, and displays it to the user;
Step d6: Using location-based service technology, obtain the merchant's location from the user's search target or from the merchant information returned by the background server, obtain the user's current location, and plan a route for the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511023304.0A CN106933864A (en) | 2015-12-30 | 2015-12-30 | A kind of search engine system and its searching method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106933864A true CN106933864A (en) | 2017-07-07 |
Family
ID=59441819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511023304.0A Pending CN106933864A (en) | 2015-12-30 | 2015-12-30 | A kind of search engine system and its searching method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106933864A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423072A (en) * | 2017-08-11 | 2017-12-01 | 郑州云海信息技术有限公司 | The method and apparatus of web page dynamic column filter and search list data |
CN107463630A (en) * | 2017-07-14 | 2017-12-12 | 太仓诚泽网络科技有限公司 | Multiterminal webpage control system |
CN108197106A (en) * | 2017-12-29 | 2018-06-22 | 深圳市中易科技有限责任公司 | A kind of product competition analysis method based on deep learning, apparatus and system |
WO2021093821A1 (en) * | 2019-11-14 | 2021-05-20 | 中兴通讯股份有限公司 | Intelligent assistant evaluation and recommendation methods, system, terminal, and readable storage medium |
CN113553490A (en) * | 2021-08-11 | 2021-10-26 | 长沙学院 | Data management platform and data management method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101667194A (en) * | 2009-09-29 | 2010-03-10 | 北京大学 | Automatic abstracting method and system based on user comment text feature |
CN102509229A (en) * | 2011-09-29 | 2012-06-20 | 四川长虹电器股份有限公司 | Group purchase system based on position service and group purchase service realizing method |
CN103123633A (en) * | 2011-11-21 | 2013-05-29 | 阿里巴巴集团控股有限公司 | Generation method of evaluation parameters and information searching method based on evaluation parameters |
US20130275043A1 (en) * | 2012-04-12 | 2013-10-17 | Mitac Research (Shanghai) Ltd. | Location-Based Service System and Wishing Service Method Thereof |
CN103823893A (en) * | 2014-03-11 | 2014-05-28 | 北京大学 | User comment-based product search method and system |
US20140317089A1 (en) * | 2013-04-18 | 2014-10-23 | International Business Machines Corporation | Context aware dynamic sentiment analysis |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463630A (en) * | 2017-07-14 | 2017-12-12 | 太仓诚泽网络科技有限公司 | Multiterminal webpage control system |
CN107423072A (en) * | 2017-08-11 | 2017-12-01 | 郑州云海信息技术有限公司 | The method and apparatus of web page dynamic column filter and search list data |
CN107423072B (en) * | 2017-08-11 | 2020-10-30 | 苏州浪潮智能科技有限公司 | Method and device for screening dynamic columns of web page and searching table data |
CN108197106A (en) * | 2017-12-29 | 2018-06-22 | 深圳市中易科技有限责任公司 | A kind of product competition analysis method based on deep learning, apparatus and system |
CN108197106B (en) * | 2017-12-29 | 2021-07-13 | 深圳市中易科技有限责任公司 | Product competition analysis method, device and system based on deep learning |
WO2021093821A1 (en) * | 2019-11-14 | 2021-05-20 | 中兴通讯股份有限公司 | Intelligent assistant evaluation and recommendation methods, system, terminal, and readable storage medium |
CN113553490A (en) * | 2021-08-11 | 2021-10-26 | 长沙学院 | Data management platform and data management method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103577416B (en) | Expanding query method and system | |
CN103218431B (en) | A kind ofly can identify the system that info web gathers automatically | |
CN103902535B (en) | Obtain the method, apparatus and system of associational word | |
CN101901241B (en) | Index generating system, information retrieval system, and index generating method | |
CN103631794B (en) | A kind of method, apparatus and equipment for being ranked up to search result | |
CN106933864A (en) | A kind of search engine system and its searching method | |
CN103714119B (en) | A kind for the treatment of method and apparatus of browser data | |
CN101000623A (en) | Method for image identification search by mobile phone photographing and device using the method | |
WO2011063035A1 (en) | A method and system to contextualize information being displayed to a user | |
CN103076892A (en) | Method and equipment for providing input candidate items corresponding to input character string | |
CN101464897A (en) | Word matching and information query method and device | |
CN103455524A (en) | Method and device for displaying and acquiring entry information | |
CN102708174A (en) | Method and device for displaying rich media information in browser | |
CN103210387B (en) | Conjunctive word calling mechanism, information processor, conjunctive word register method and conjunctive word register system | |
CN103150663A (en) | Method and device for placing network placement data | |
CN103530339A (en) | Mobile application information push method and device | |
CN106709073A (en) | Browser notification pushing method and browser terminal | |
TW201401088A (en) | Search method and apparatus | |
CN103034680A (en) | Data interaction method and device for terminal device | |
CN107491465A (en) | For searching for the method and apparatus and data handling system of content | |
CN103338260A (en) | Distributed analytical system and analytical method for URL logs in network auditing | |
CN108027820A (en) | For producing phrase blacklist to prevent some contents from appearing in the method and system in search result in response to search inquiry | |
CN107463592A (en) | For by the method, equipment and data handling system of content item and images match | |
CN110245289A (en) | A kind of information search method and relevant device | |
CN101959178A (en) | Method and equipment for identifying terminal attribute of wireless terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170707 |