Summary of the invention
The technical problem to be solved by the present application is to provide an intelligent vertical search method and system that overcome the shortcoming that users cannot obtain relevant information comprehensively, accurately and quickly when searching for information.
To solve the above problem, the present application discloses an intelligent vertical search method, comprising:
obtaining a query word input by a user within an industry selected at the user side;
matching the query word against the classified entries of that industry in a classification database, obtaining the entries of every category that are relevant to the query word, and presenting the entries of each category, together with their related information, to the user category by category; wherein an entry is one or more items of business content in an e-commerce web page;
wherein the classified entries of the industry are obtained by the following steps:
obtaining all entries, and their related information, of all web pages of the industry;
determining the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry;
and/or determining the category of each entry according to the similarity between the occurrence ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in the classified entries, obtained in advance by statistics.
Preferably, the query word is obtained by:
using the keyword input by the user as the query word;
or using, as the query word, one of the suggestion words returned for the user's input word and selected by the user; wherein the suggestion words are extracted from pre-computed statistics of the click relationships between users' input words and the corresponding results.
Preferably, the method further comprises the following safety detection steps:
Step A: for the link address of each acquired entry to be classified, checking whether the link address is safe by means of a safety inspection engine and a Trojan scanning engine, and classifying the entry if it is safe;
and/or Step B: for the link addresses of the classified entries in the classification database, continuously traversing each link address with the safety inspection engine and, if a link address is unsafe, deleting its related data from the classification database;
and/or Step C: for the link address of an entry clicked by the user, checking whether the link address is safe with the safety inspection engine and, if it is unsafe, warning the user and deleting its related data from the classification database.
Preferably, checking whether the link address is safe by means of the safety inspection engine and the Trojan scanning engine comprises the following steps:
Step P1: for the link address of each acquired entry to be classified, submitting the link address to the safety inspection engine, which checks whether it exists in a safety-level library;
Step P2: if it exists and is safe, classifying the entry;
Step P3: if it exists but is unsafe, issuing a warning and filtering out the data related to the link address;
Step P4: if it does not exist, checking the link address with the Trojan scanning engine to judge whether the link is safe and, if it is safe, storing the link address in the safety-level library and returning to Step P1.
Preferably, the method further comprises the following safety detection step:
performing safety detection on the entry by means of the ICP filing information of the link address containing the entry and/or a website real-name authentication system.
Preferably, obtaining all entries and related information of all web pages of the industry comprises:
an automatic crawling step of automatically crawling all entries and related information of all web pages of the industry;
a supplementary entry step of manually supplementing the entries and related information of a web page.
Preferably, presenting the entries of each category and their related information to the user side category by category comprises:
providing the user behavior entry points related to an entry (for example, download or registration) directly to the user side.
Preferably, the method further comprises:
an intelligent correction step of correcting, by an intelligent correction engine, a query word input incorrectly by the user.
Preferably, the differences between the occurrence ratio of each keyword i in the entry and the occurrence probability of the same keyword in the classified entries of category c are weighted logarithmically to compute the deviation G(c) of the entry for category c; the smaller the deviation, the higher the similarity, and the entry is assigned to the category with the smallest deviation.
Correspondingly, the present application also discloses an intelligent vertical search system, comprising:
a search engine, configured to obtain a query word input by a user within an industry selected at the user side, match the query word against the classified entries of that industry in a classification database, obtain the entries of every category that are relevant to the query word, and present the entries of each category, together with their related information, to the user category by category; wherein an entry is one or more items of business content in an e-commerce web page;
a classification database, configured to store the classified data of each industry;
a data acquisition module, configured to obtain all entries and related information of all web pages of the industry;
a classifier, configured to determine the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry; and/or to determine the category of each entry according to the similarity between the occurrence ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in the classified entries, obtained in advance by statistics.
Preferably, the system further comprises:
a safety check module, configured to check, by means of a safety inspection engine and a Trojan scanning engine, whether the link address of each acquired entry to be classified is safe, and to classify the entry if it is safe;
and/or to continuously traverse, by means of the safety inspection engine, the link addresses of the classified entries in the classification database and, if a link address is unsafe, to delete its related data from the classification database;
and/or to check, by means of the safety inspection engine, whether the link address of an entry clicked by the user is safe and, if it is unsafe, to warn the user and delete its related data from the classification database.
Preferably, the system further comprises:
a keyword suggestion engine, configured to return suggestion words according to the user's input word; the suggestion words are extracted from pre-computed statistics of the click relationships between users' input words and the corresponding results.
Preferably, the system further comprises:
an intelligent correction engine, configured to correct a query word input incorrectly by the user.
Preferably, the data acquisition module comprises:
a data crawler, configured to automatically crawl all entries and related information of all web pages of the industry;
a supplementary entry module, configured to supplement the entries and related information of a web page.
Preferably, the system further comprises:
an interface providing module, configured to provide the user behavior entry points related to an entry directly to the user side when the entries of each category and their related information are presented to the user side category by category.
Compared with the prior art, the present application has the following advantages:
The application takes industry classification as the starting point of search and obtains the entries of all websites of each industry on the network. The category of each entry is determined by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry, and/or by the similarity between the keyword occurrence ratios of an unclassified entry and the keyword occurrence probabilities of the classified entries obtained in advance by statistics. Once all entries of an industry have been classified, the search engine matches the user's query word against the classified entries of that industry in the classification database, obtains the relevant entries of every category, and presents them with their related information to the user category by category. By building an automatic classification system, the application classifies and filters data automatically and can present the information relevant to the user's focus more accurately, more comprehensively and more efficiently.
Embodiment
To make the above objectives, features and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a schematic flow chart of an intelligent vertical search method of the present application is shown, comprising:
Step 110: obtain a query word input by the user within an industry selected at the user side.
Step 120: match the query word against the classified entries of that industry in the classification database, obtain the entries of every category that are relevant to the query word, and present the entries of each category and their related information to the user category by category; an entry is one or more items of business content in an e-commerce web page. Business content excludes non-core information such as news, advertisements and Q&A, and excludes the page header, the left sidebar, advertisements and the ICP (Internet Content Provider) filing notice at the top or bottom of the page.
The classified entries of the industry are obtained by the following steps:
Step 210: obtain all entries and related information of all web pages of the industry;
Step 220: determine the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry;
and/or determine the category of each entry according to the similarity between the occurrence ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in the classified entries, obtained in advance by statistics.
Through steps 210 and 220, the application classifies each industry in advance, offline. For example, the education industry can be divided into major categories such as children's education, secondary education, higher education and vocational education. Children's education can be further divided into preschool, kindergarten-to-primary transition, grade one, grade two, grade three, grade four, grade five, grade six, primary-to-junior-middle-school transition and so on; secondary education into junior one, junior two, junior three, senior preparatory course, senior one, senior two, senior three, college preparatory course and so on; higher education into CET-4/6 English, New Concept English, other English, minority foreign languages, postgraduate-entrance-exam English, postgraduate-entrance-exam mathematics, postgraduate-entrance-exam politics, postgraduate-entrance-exam specialized courses, TOEFL, IELTS and so on; and vocational education into TOEIC, vocational minority languages, finance and accounting, self-study examinations, computing, driving schools, construction, economics and trade/finance, medicine, in-service postgraduate study, human resources, civil service and so on. Each category can contain several keywords; for example, the children's education category contains keywords such as preschool, kindergarten-to-primary, grade one through grade six, primary-to-junior-middle-school, elementary English, mathematics, Chinese, preschool class and special-interest class.
As another example, the game industry can be divided into categories such as game type, game theme and game graphics. Game type can be further divided into role-playing, turn-based, action, FPS shooting, TPS shooting, racing, sports, music and dance, fighting, strategy and so on; game theme into martial arts, fantasy, magic, science fiction, cartoon, history and so on; and game graphics into 3D, 2D, 2.5D and so on. Each category can contain several corresponding keywords.
During classification, step 210 first obtains, industry by industry, all entries of all websites of that industry on the network. An entry is one or more items of business content in an e-commerce web page; business content excludes non-core information such as news, advertisements and Q&A, and excludes the header, sidebar, advertisement and ICP filing regions of the page. In the education industry, for example, the entries are the course titles and course content of the various education websites, such as "Postgraduate-entrance-exam English sprint class" and its content.
Obtaining all entries and related information of all web pages of the industry comprises:
An automatic crawling step of automatically crawling all entries and related information of all web pages of the industry.
The automatic crawling step can be performed as follows:
Step m1: automatically search for the link addresses of all websites of the industry, generate a crawl list, and record the time and status of each crawl.
Step m2: using distributed deployment, crawl different websites periodically according to their regional distribution. If a website's data change, the website can use an active notification mechanism by calling the notification interface provided by the application, so that its data are updated in real time: the website only needs to access the notification address provided by the application, and accessing this address triggers a crawl.
Step m3: perform safety and validity checks on the crawled data. The safety check submits the address to the safety check module, which checks whether a Trojan or virus is present at the address and returns whether the address should be included. The validity check verifies that the address can be opened normally; if the address returns a not-found or other error, it is not included in this crawl.
Step m4: submit the data to the classifier for classification while the crawler continues to crawl new data. After a crawl finishes, the data crawler starts crawling again and checks whether each file has been updated, to decide whether to skip it and crawl the next record.
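The bookkeeping of steps m1 to m4 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the application's crawler: `CrawlList`, `CrawlRecord`, the `fetch` and `is_safe` callables, and the SHA-256 content fingerprint used to detect updates are all assumptions, and the distributed, region-based deployment is not modeled.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CrawlRecord:
    url: str
    last_crawl: float = float("-inf")  # time of the last crawl in seconds; -inf means never crawled
    status: str = "pending"            # "ok", "unchanged", "invalid" or "unsafe"
    fingerprint: str = ""              # hash of the last accepted content, used to detect updates


class CrawlList:
    """Hypothetical crawl list for steps m1 to m4."""

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s                      # periodic re-crawl interval (step m2)
        self.records: Dict[str, CrawlRecord] = {}

    def add(self, url: str) -> None:
        """Step m1: register a discovered site address in the crawl list."""
        self.records.setdefault(url, CrawlRecord(url))

    def notify(self, url: str) -> None:
        """Step m2: the site called the notification address, so it becomes due at once."""
        self.add(url)
        self.records[url].last_crawl = float("-inf")

    def crawl_due(
        self,
        now: float,
        fetch: Callable[[str], Optional[str]],
        is_safe: Callable[[str], bool],
    ) -> List[Tuple[str, str]]:
        """Crawl every due address and return (url, content) pairs for the classifier."""
        to_classify: List[Tuple[str, str]] = []
        for rec in self.records.values():
            if now - rec.last_crawl < self.period_s:
                continue                              # not due yet
            rec.last_crawl = now                      # record the crawl time
            content = fetch(rec.url)
            if content is None:                       # step m3: validity check failed
                rec.status = "invalid"
                continue
            if not is_safe(rec.url):                  # step m3: safety check failed
                rec.status = "unsafe"
                continue
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if digest == rec.fingerprint:             # unchanged since the last crawl: skip it
                rec.status = "unchanged"
                continue
            rec.fingerprint, rec.status = digest, "ok"
            to_classify.append((rec.url, content))    # step m4: hand over to the classifier
        return to_classify
```

A scheduler would call `crawl_due` repeatedly; `notify` models the website-side active notification of step m2 by making a site due immediately.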
A supplementary entry step of manually supplementing the entries and related information of a web page; it can be used to add content that the automatic crawling step failed to capture.
Ordinary web data are unstructured. Traditional web page crawling requires structural analysis of the data, and because the intelligent recognition capability of the system is limited, part of the information is lost in this process. To solve this problem, the application establishes a cooperation mechanism with data source providers: the application formulates and publishes an industry data format standard, and the data source providers fill in the data content according to this standard, which achieves much better results with much less effort. The format standard can be established as follows: select an industry that requires vertical search, such as education or games, analyze the data characteristics of the industry, and formulate the format standard. Taking games as an example, people who mention a game think of information such as its name, type, profile and address; this information is summarized and organized into a standard interface. The resulting game interface fields are as follows:
Field | Description
Game name | No more than 50 characters
Game features | No more than 50 characters
Mission description | No more than 500 characters
Game status | In testing or released
Game type | Client game, web game, etc.
Game operator | For online games
Game start address | For online games
Game download address | Download link address
Getting started | No more than 500 characters
Operating guide | No more than 500 characters
Game picture | Picture link address
Registration address | Game registration address
Depending on the data volume, the application defines a full-data interface and an incremental interface; the data source provider generates interface data according to the interface, and the application calls it. The full-data interface returns all data at once, whereas the incremental interface returns part of the data on each call. The application recommends transmitting data in XML format. When an organization provides relatively little data (generally, when the whole XML file is within 50 MB), the full-data interface can be used directly; when the data volume is larger, an incremental interface is generally needed so that the data can be obtained in batches. One address describes the change status of each record by its id, and another address returns the corresponding data for a given id. The format is as follows:
<id do='insert'>1000</id>
<id do='update'>1001</id>
<id do='delete'>1002</id>
<id do='insert'>1003</id>
When the value of do is insert, the record corresponding to the id is a newly added record;
when the value of do is update, the record corresponding to the id is a modified record;
when the value of do is delete, the record corresponding to the id is a deleted record.
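A minimal sketch of how a consumer might read the incremental interface is shown below. It assumes, for illustration only, that the `id` records are wrapped in a single root element named `changes` and that a hypothetical `fetch_record(id)` callable retrieves a record from the second address; only the 50 MB threshold and the insert/update/delete semantics come from the text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

FULL_INTERFACE_LIMIT = 50 * 1024 * 1024   # about 50 MB, per the text
VALID_OPS = {"insert", "update", "delete"}


def choose_interface(xml_size_bytes: int) -> str:
    """Use the full-data interface for small feeds, the incremental one otherwise."""
    return "full" if xml_size_bytes <= FULL_INTERFACE_LIMIT else "incremental"


def parse_changes(xml_text: str) -> List[Tuple[str, str]]:
    """Return (operation, id) pairs in document order."""
    root = ET.fromstring(xml_text)
    changes = []
    for element in root.iter("id"):
        op = element.get("do")
        if op not in VALID_OPS:
            raise ValueError(f"unknown do value: {op!r}")
        changes.append((op, (element.text or "").strip()))
    return changes


def apply_changes(store: Dict[str, str], changes: List[Tuple[str, str]],
                  fetch_record: Callable[[str], str]) -> None:
    """Apply changes in order: insert/update fetch the record by id, delete removes it."""
    for op, record_id in changes:
        if op == "delete":
            store.pop(record_id, None)
        else:
            store[record_id] = fetch_record(record_id)
```

Applying the changes in document order (rather than grouping them by operation) keeps the result correct when the same id is inserted and later deleted within one batch.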
Because the classification information obtained from each data source differs, the application needs to reclassify all data according to a unified standard so that users can filter them conveniently when they are displayed. This is done in step 220, by rule-matching classification: determining the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry;
and/or by statistical classification: determining the category of each entry according to the similarity between the occurrence ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in the classified entries, obtained in advance by statistics.
Rule-matching classification:
Consider, for example, the following title:
"Autumn class, junior middle school grade three physics, Feng's systematic top-student class"
From this title, the application can obtain the classification information autumn class, junior three and physics.
Each of these three pieces of classification information has a corresponding phrase in the title, so the application can establish three rules to obtain it:
Rule one: if the title contains "autumn class", classify the entry as "autumn class";
Rule two: if the title contains "junior middle school grade three", classify the entry as "junior three";
Rule three: if the title contains "physics", classify the entry as "physics".
In the same way, classification information can be obtained for similar titles such as the following:
"Autumn class, junior middle school grade three physics improvement class"
"Autumn class, senior middle school grade three physics top-student class"
"Autumn class, junior middle school grade one mathematics improvement class"
"Autumn class, junior middle school grade three English improvement class"
Rule matching is built on a lexical matching mechanism: it determines which category an entry belongs to from the keywords that the entry shares with each category, and makes the final decision by weighted analysis according to the following formula:
$$P = x_1 r_1 + x_2 r_2 + x_3 r_3 + \cdots + x_n r_n$$
Here $P$ is the classification score that the $n$ rules of a category give an entry, and the entry belongs to the category whose score $P$ is highest; $x_i$ is the lexical coefficient of rule $i$; and $r_i$ is the lexical matching result of rule $i$. The lexical matching result is the number of times the word occurs in this match. The lexical coefficient is the weight of this matching result among all matching results: the higher the weight, the closer the value is to 1, and the lower the weight, the closer it is to 0. The coefficients are set manually; for example, a match in the title has a relatively high coefficient, while a match in the description or content has a lower one.
Table one lists the coefficients used when matching the English category, taking the classification of "Senior One English Band 4 training class" into "English" as an example:
Lexical match | Matching result | Lexical coefficient
Title contains "English" | 1 | 0.8
Title contains "Band 4" | 1 | 0.7
Title contains "IELTS" | 0 | 0.7
Description contains "English" | 3 | 0.3
Description contains "Band 4" | 1 | 0.2
Table one
The resulting score for the English category is 1 × 0.8 + 1 × 0.7 + 0 × 0.7 + 3 × 0.3 + 1 × 0.2 = 2.6.
The scores of the other categories are then computed in the same way, and the entry is assigned to the category with the highest score.
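The weighted rule matching and the Table one arithmetic can be sketched as follows. This is an illustrative Python sketch, not the application's classifier: the `Rule` structure, counting case-insensitive substring occurrences as the matching result $r_i$, and the English keywords (stand-ins for the Chinese phrases matched in practice) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    field: str          # entry field to search, e.g. "title" or "description"
    keyword: str        # keyword of the category
    coefficient: float  # lexical coefficient x_i, set manually in [0, 1]


def category_score(entry: Dict[str, str], rules: List[Rule]) -> float:
    """P = x_1*r_1 + ... + x_n*r_n, with r_i = occurrences of the keyword in the field."""
    return sum(
        rule.coefficient * entry.get(rule.field, "").lower().count(rule.keyword.lower())
        for rule in rules
    )


def best_category(entry: Dict[str, str],
                  rules_by_category: Dict[str, List[Rule]]) -> Tuple[str, Dict[str, float]]:
    """Score every category and return the highest-scoring one together with all scores."""
    scores = {cat: category_score(entry, rules) for cat, rules in rules_by_category.items()}
    return max(scores, key=scores.get), scores


ENGLISH_RULES = [  # the rows of Table one
    Rule("title", "English", 0.8),
    Rule("title", "Band 4", 0.7),
    Rule("title", "IELTS", 0.7),
    Rule("description", "English", 0.3),
    Rule("description", "Band 4", 0.2),
]
```

A single title rule with coefficient 1 reduces to a plain phrase rule such as "if the title contains 'physics', classify as physics", so the same structure covers both the simple rules and the weighted analysis.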
Statistical classification:
The application first collects the keywords that affect classification and then computes statistics over the classified data to see whether each keyword occurs in each entry. Next, it uses these keywords to analyze the entries that are not yet classified, checking how the keywords occur in them. Finally, it compares the occurrence probability of each keyword in the classified entries with the occurrence ratio of each keyword in the unclassified entry; if the two are close, the unclassified entry can be considered to belong to that category.
Based on the above analysis, the application establishes the following formula:
Wherein, c is classification, and G (c) is classification deviate, and 1 is constant, and for guaranteeing that log value is effective, i is keyword, T
cifor the keyword of the class entry probability of occurrence having counted, t
cifor entry keyword occurrence number ratio to be sorted.G (c) is less, illustrates that similarity is higher, judges this entry and belongs to c classification.Wherein, class entry keyword probability of occurrence equals the geometric mean of the keyword occurrence number ratio of all entries; The number of times that number of times/all keywords occur in entry that entry keyword occurrence number ratio=keyword to be sorted occurs in entry.
According to the occurrence number ratio of each keyword of unfiled entry with according to the similarity of the probability of occurrence of each keyword of the class entry that statistics obtains in advance, determine classification under each entry.
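The statistical classification can be sketched as below. The exact formula for $G(c)$ is not reproduced in this text, so the sketch ASSUMES one form consistent with the description (logarithmic weighting with the constant 1 inside the logarithm): $G(c) = \sum_i \lvert \log(1 + T_{ci}) - \log(1 + t_{ci}) \rvert$. That form, the feature-word list and the pre-segmented token lists are illustrative assumptions; the occurrence ratio and geometric-mean definitions follow the text.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

KEYWORDS = ("english", "mathematics", "chinese")  # illustrative feature words


def occurrence_ratios(tokens: Sequence[str],
                      keywords: Sequence[str] = KEYWORDS) -> Dict[str, float]:
    """t_ci: occurrences of each keyword divided by occurrences of all keywords in the entry."""
    counts = Counter(t for t in tokens if t in keywords)
    total = sum(counts.values())
    return {k: counts[k] / total if total else 0.0 for k in keywords}


def category_profile(entries: List[Sequence[str]],
                     keywords: Sequence[str] = KEYWORDS) -> Dict[str, float]:
    """T_ci: geometric mean of the keyword ratios over a category's classified entries."""
    ratios = [occurrence_ratios(tokens, keywords) for tokens in entries]
    return {k: math.prod(r[k] for r in ratios) ** (1 / len(ratios)) for k in keywords}


def deviation(t: Dict[str, float], T: Dict[str, float]) -> float:
    """ASSUMED form of G(c): sum of absolute log differences, with 1 added inside each log."""
    return sum(abs(math.log(1 + T[k]) - math.log(1 + t[k])) for k in T)


def classify_statistically(tokens: Sequence[str],
                           profiles: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Return the category with the smallest deviation, plus all deviations."""
    t = occurrence_ratios(tokens)
    g = {c: deviation(t, T) for c, T in profiles.items()}
    return min(g, key=g.get), g
```

Under the geometric-mean definition, a keyword that is absent from any single classified entry of a category gets $T_{ci} = 0$ for that category.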
Taking the feature words English, mathematics and Chinese as an example, feature-word occurrence probabilities were computed over previously classified entries, giving Table two:
Table two
The application computes the feature-word statistics of four unclassified entries, giving Table three:
Table three
Finally, applying the above formula gives the classification results shown in Table four:
Table four
Fig. 2 shows how the classification quality of this statistical classification method varies with the amount of statistical data. As the amount of statistical data increases, the accuracy of the method keeps improving, and with larger samples the classification accuracy approaches 1, which shows that the statistical classification method is effective.
In practice, Fig. 3 shows the preferred classification flow of the application.
To simplify computation and reduce system load, the application first applies the rule-matching classification described above, that is, it determines the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords of each category of the industry. When an entry cannot be classified by rule matching within a threshold time, the statistical classification method is used instead, determining the category of the entry from the similarity between the keyword occurrence ratios of the unclassified entry and the keyword occurrence probabilities of the classified entries obtained in advance by statistics.
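The two-stage flow of Fig. 3 can be sketched as follows. The sketch is hypothetical: the two classifier callables stand for any rule-matching and statistical implementations, `None` is assumed to mean "no rule matched", the 0.05 s budget is an invented placeholder for the unspecified threshold time, and a late rule result is simply discarded rather than interrupted.

```python
import time
from typing import Any, Callable, Optional, Tuple


def classify_entry(entry: Any,
                   rule_classifier: Callable[[Any], Optional[str]],
                   statistical_classifier: Callable[[Any], str],
                   time_budget_s: float = 0.05,
                   clock: Callable[[], float] = time.monotonic) -> Tuple[str, str]:
    """Try cheap rule matching first; fall back to statistics if it fails or runs over budget."""
    start = clock()
    label = rule_classifier(entry)
    if label is not None and clock() - start <= time_budget_s:
        return label, "rule"
    return statistical_classifier(entry), "statistical"
```

Injecting `clock` keeps the function deterministic under test; in production the default `time.monotonic` is used because it is unaffected by system clock changes.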
After every industry on the network has been classified, when a user uses the system of the application, the search engine matches the user's query word against the classified entries of the industry in the classification database, obtains the relevant entries of every category, and presents the entries of each category and their related information to the user category by category.
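As a simplified illustration of matching a query against one industry's classified entries and grouping the hits by category, the following hypothetical sketch uses a linear substring scan in place of the full-text index and intelligent ranking described later in step q3; the entry fields `title`, `description` and `category` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def search_by_category(query: str, entries: List[dict]) -> Dict[str, List[dict]]:
    """Return the entries matching every query term, grouped by their category."""
    terms = query.lower().split()
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for entry in entries:
        text = f"{entry['title']} {entry.get('description', '')}".lower()
        if all(term in text for term in terms):
            grouped[entry["category"]].append(entry)
    return dict(grouped)
```

Categories appear in the order in which their first matching entry is found; a real system would instead order them by the ranking of step q3.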
The query word is obtained in either of the following ways:
the keyword input by the user is used as the query word;
or one of the suggestion words returned for the user's input word, as selected by the user, is used as the query word; the suggestion words are extracted from pre-computed statistics of the click relationships between users' input words and the corresponding results.
Suggestion words are needed because the keywords users choose when searching are often very general words with many meanings that match many kinds of web pages, whereas the user may only be looking for specific content. For example, countless web pages match the search "English", but the user may actually be looking for content such as "expert English tutoring" or "English exams". To better match user needs, the application analyzes the user's input keyword with an intelligent suggestion dictionary and offers suggestion words for the user to choose from and search again. This refines the user's request, so that the user's intent is understood more accurately and more accurate search results are provided.
The intelligent suggestion dictionary can be built as follows; the keyword suggestion engine then returns suggestion words according to the user's input word.
Step n1: collect statistics of the click relationships between users' input words and the corresponding results. For example, users who search for "English" click courses on expert English tutoring, and users who search for "mobile phone" click web pages for buying mobile phones.
Step n2: sort according to the statistics, and perform word segmentation on the popular clicked titles corresponding to each input word.
Step n3: from the segmentation results, extract the click words related to the input word and build the correspondence between input words and click words. For example, "English" corresponds to "expert English tutoring", "English Band 4/6", "postgraduate-entrance-exam English" and so on. When the user enters a keyword, the click words are offered as prompts; if the user selects one, the search is narrowed, which refines the user's search intent and yields more accurate search results.
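Steps n1 to n3 can be sketched as follows. This is a hypothetical illustration, not the application's suggestion engine: whitespace tokenization stands in for Chinese word segmentation, and ranking the words that co-occur with the input word in popular clicked titles by click count is an assumed stand-in for the unspecified click-word extraction rule; `top_titles` and `max_words` are invented parameters.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def build_suggestion_dict(click_log: Iterable[Tuple[str, str]],
                          segment: Callable[[str], List[str]] = lambda s: s.lower().split(),
                          top_titles: int = 10,
                          max_words: int = 3) -> Dict[str, List[str]]:
    """Map each input word to refined queries built from frequently clicked words."""
    # Step n1: count how often each (input word, clicked title) pair occurs.
    pair_clicks = Counter(click_log)
    titles_by_query: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for (query, title), clicks in pair_clicks.items():
        titles_by_query[query].append((clicks, title))

    suggestions: Dict[str, List[str]] = {}
    for query, titles in titles_by_query.items():
        # Step n2: sort titles by clicks and segment the most popular ones.
        titles.sort(key=lambda ct: (-ct[0], ct[1]))
        query_tokens = set(segment(query))
        word_clicks: Counter = Counter()
        for clicks, title in titles[:top_titles]:
            for word in set(segment(title)) - query_tokens:
                word_clicks[word] += clicks
        # Step n3: the most-clicked co-occurring words become the click words.
        ranked = sorted(word_clicks.items(), key=lambda wc: (-wc[1], wc[0]))
        suggestions[query] = [f"{query} {word}" for word, _ in ranked[:max_words]]
    return suggestions
```

Ties are broken alphabetically so that the dictionary is reproducible from the same log.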
Fig. 4 compares the click-through rate of search results with and without suggestion words over six consecutive days. The click-through rate with suggestion words is clearly higher than without, which shows that the suggestion-word scheme of the application is effective.
In addition, the application can correct the user's query word with the intelligent correction engine; for example, when the user enters "test English", the engine can correct it to "postgraduate-entrance-exam English".
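The text does not specify how the correction engine works beyond the dictionaries listed in step q2. Purely as a stand-in technique, the sketch below corrects each query token to its closest vocabulary word using the standard library's `difflib.get_close_matches` (similarity matching on character sequences); the vocabulary and the 0.8 cutoff are assumptions, and pinyin-based correction is not modeled.

```python
import difflib
from typing import Iterable


def correct_query(query: str, vocabulary: Iterable[str], cutoff: float = 0.8) -> str:
    """Replace each unknown token with its closest vocabulary word, if one is close enough."""
    vocab = list(vocabulary)
    corrected = []
    for token in query.lower().split():
        if token in vocab:
            corrected.append(token)           # already a known word
            continue
        match = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)  # keep the token if nothing is close
    return " ".join(corrected)
```

For example, `correct_query("Postgraduate Englsh", ["postgraduate", "english", "exam"])` corrects the misspelled token and returns `"postgraduate english"`.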
In addition, when presenting the entries of each category and their related information to the user side category by category, the application provides the user behavior entry points related to an entry directly to the user side; for example, a game has a download entry point and a course has a registration entry point, and these interfaces are offered directly to the user in the results.
Preferably, the presentation process of the application is as follows:
Step q1: obtain the query word searched by the user.
Step q2: pass the query word to the intelligent correction engine. The engine performs fuzzy word segmentation on the user's query word and, using the generated natural segmentation dictionary, a standard segmentation dictionary, a pinyin error-correction dictionary and the like, produces a segmented text that the search engine can recognize.
Step q3: the search engine looks up this text in a composite full-text index generated in advance from the classification database, ranks the results intelligently, and returns the ranked and optimized results. The intelligent ranking is based on the popularity and relevance of the generated segmented text. Because the industry data have been analyzed and formatted in advance, the addresses the user may need have already been obtained through the defined interfaces, so the user behavior entry points are offered directly in the front-end display (for example, instant download for games, and registration or trial lessons for courses). When the user clicks download, the download starts directly without visiting the other party's web page; when the user clicks register, registration is completed directly without entering the course introduction page. The user thus reaches the desired page with one click.
Through this process, the user's query word is corrected intelligently, results matching the user's query intent are presented precisely, and user behavior entry points are reached with one click.
In addition, the application also comprises following safety detection step:
Steps A, for the chained address at described each the entry place to be sorted obtaining, by safety inspection engine and wooden horse killing engine, check whether safety of described chained address, if safety is classified to described entry.
Referring to Fig. 5, checking whether the link address is safe by means of the safety inspection engine and the Trojan scanning engine comprises the following steps:
Step P1: for the link address of each acquired entry to be classified, submit the link address to the safety inspection engine, which checks whether it exists in the safety-level library;
Step P2: if it exists and is marked safe, classify the entry;
Step P3: if it exists but is marked unsafe, issue a warning and filter out the data associated with the link address;
Step P4: if it does not exist, check the link address with the Trojan scanning engine to determine whether the link is safe; if it is safe, store the link address in the safety-level library and return to step P1.
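Steps P1 to P4 can be sketched as a small loop. In this sketch the safety-level library is a plain dictionary from link address to a safe/unsafe flag and the Trojan scanning engine is a caller-supplied function; both are hypothetical stand-ins, not the actual engines. The text only says what happens when the scan finds the link safe, so recording an unsafe verdict as well is an assumption flagged in the code.

```python
from typing import Callable, Dict, List


def check_before_classify(url: str, safety_levels: Dict[str, bool],
                          trojan_scan: Callable[[str], bool],
                          alerts: List[str]) -> bool:
    """Return True if the entry at `url` may be classified (steps P1-P4).

    `safety_levels` stands in for the safety-level library (url -> safe?) and
    `trojan_scan` for the Trojan scanning engine; both are hypothetical.
    """
    while True:
        status = safety_levels.get(url)           # P1: look up the library
        if status is True:                        # P2: known safe -> classify
            return True
        if status is False:                       # P3: known unsafe -> warn and filter
            alerts.append(f"unsafe link filtered: {url}")
            return False
        # P4: unknown -> scan, record the verdict, then return to P1.
        # Assumption: an unsafe verdict is also recorded; the source only
        # specifies storing safe addresses.
        safety_levels[url] = bool(trojan_scan(url))
```

Returning to P1 after the scan means every decision is made from the library, so later lookups of the same address skip the comparatively expensive scan.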
And/or, step B: for the link addresses of the classified entries in the taxonomy database, the safety inspection engine continuously traverses each link address; if a link address is unsafe, the data associated with it is deleted from the taxonomy database. The safety inspection engine traverses every address in the safety-level library without interruption; as soon as unsafe content is found in a link, the address is immediately marked as unsafe, the data engine is notified, and the data associated with the link is deleted.
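One pass of the step B traversal might look like the following sketch. It assumes the taxonomy database can be modeled as a dictionary from link address to category and that `is_safe` stands in for the safety inspection engine; the continuous traversal described above would come from a scheduler calling this function repeatedly.

```python
from typing import Callable, Dict, List


def sweep_taxonomy(taxonomy: Dict[str, str], is_safe: Callable[[str], bool],
                   safety_levels: Dict[str, bool]) -> List[str]:
    """One pass of step B over the taxonomy database (link address -> category).

    `is_safe` is a hypothetical stand-in for the safety inspection engine.
    """
    removed: List[str] = []
    for url in list(taxonomy):          # copy keys so entries can be deleted while iterating
        if not is_safe(url):
            safety_levels[url] = False  # mark the address as unsafe
            del taxonomy[url]           # delete the link's related data
            removed.append(url)
    return removed
```

Returning the removed addresses gives the caller something to pass on when notifying the data engine.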
And/or, step C: for the link address of an entry clicked by the user, check whether the link address is safe by means of the safety inspection engine; if it is unsafe, warn the user and delete the data associated with the link address from the taxonomy database. Every clicked link address among those displayed to the user is recorded and submitted to the safety inspection engine for checking. If an address is found to be unsafe, the user is immediately prompted with "This web address was found to be unsafe. Continue to visit?", and the server is notified at the same time to delete the data associated with the link.
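The click-time check of step C reduces to a short handler. This is a sketch only: `is_safe` is a hypothetical stand-in for the safety inspection engine, the taxonomy database is again modeled as a dictionary from link address to category, and the prompt list stands in for whatever user interface displays the warning.

```python
from typing import Callable, Dict, List

WARNING = "This web address was found to be unsafe. Continue to visit?"


def on_user_click(url: str, is_safe: Callable[[str], bool],
                  taxonomy: Dict[str, str], prompts: List[str]) -> bool:
    """Step C: return True if the click can proceed without a warning."""
    if is_safe(url):
        return True
    prompts.append(WARNING)   # shown to the user, who may still choose to continue
    taxonomy.pop(url, None)   # server side: delete the link's related data
    return False
```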
Preferably, the application uses steps A, B and C together to establish a real-time web address safety checking mechanism. For example, the 360 Trojan cloud scanning engine may be used: through uninterrupted cyclic checking and click-tracking investigation, every acquired link is checked promptly and a safety-level library is built. The data is thus checked by a triple safety mechanism. First safety mechanism: when data enters the store of entries to be classified, the web address of each entry is checked for safety for the first time and matched against the safety-level database; once a problem is found, the record is deleted immediately. Second safety mechanism: after the data is classified, it is checked cyclically without interruption. Third safety mechanism: when the user clicks the web address of a search result, the system submits the address to the safety inspection mechanism to determine whether it is safe.
The application further comprises the following safety detection step:
performing safety detection on the entry by means of the ICP filing information of the link address containing the entry and/or the website real-name authentication system.
In practice, some web pages look normal and carry no Trojan or other virus but are nonetheless dishonest, such as phishing pages. Through the ICP filing information of the website at the entry's link address, the application can look up basic information about the website, such as its owner, and compare it with the actual information to confirm whether the entry is safe. The application can also verify whether the entry's link address is safe through the recently introduced website real-name authentication system. If it is safe, the entry and its related information are stored in the taxonomy database.
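The ICP filing comparison can be pictured as checking the owner and filing number shown on a site against a filing record. The sketch assumes a local `registry` dictionary of filing records keyed by domain; it does not call any real ICP lookup service, and the record fields, domains and filing numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FilingRecord:
    owner: str
    filing_number: str


def passes_filing_check(domain: str, shown_owner: str, shown_number: str,
                        registry: Dict[str, FilingRecord]) -> bool:
    """Compare the owner and filing number a site displays with its filing record."""
    record: Optional[FilingRecord] = registry.get(domain)
    if record is None:
        return False  # no filing found: treat the site as not verified
    return record.owner == shown_owner and record.filing_number == shown_number


registry = {"shop.example": FilingRecord("Example Shop Co.", "ICP-0001")}
# A look-alike phishing domain copies the displayed details but has no filing of its own.
print(passes_filing_check("shop-example.test", "Example Shop Co.", "ICP-0001", registry))
```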
The above safety inspection process ensures the safety of the web pages the user visits.
Referring to Fig. 6, a schematic structural diagram of an intelligent direct-access search system of the application is shown.
Search engine 310, configured to obtain the query word entered by the user within an industry selected on the user side; to obtain, from the results of matching the query word against each classified entry of that industry in the taxonomy database, the entries of each category that are relevant to the query word; and to present the entries of each category and their related information to the user, grouped by category. Each entry is one or more items of business content on an e-commerce web page.
Here, business content is the content of a web page excluding non-core business information such as news, advertisements and Q&A, and it excludes regions such as the page header, advertisements and ICP filing information located on the left, top or bottom of the page.
Taxonomy database 320, configured to store the classified data of every industry.
Data acquisition module 410, configured to obtain all entries and related information of all web pages of the industry.
Classifier 420, configured to determine the category of each entry by a weighted analysis of the results of lexically matching the keywords in each entry against the keywords of each category of the industry; and/or to determine the category of each entry from the similarity between the occurrence ratios of the keywords of an unclassified entry and the occurrence probabilities of the keywords of each category's entries, obtained in advance by statistics.
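The two classification methods of classifier 420 can be sketched as follows. Method 1 sums per-category keyword weights for the keywords found in an entry; method 2 compares the entry's keyword occurrence ratios with each category's pre-computed keyword probabilities. The text does not name a similarity measure, so cosine similarity is used here as an assumption; the weights and probabilities in any call are hypothetical inputs.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def classify_by_weighted_match(entry_keywords: List[str],
                               category_keywords: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Method 1: sum the weights of category keywords that appear in the entry."""
    scores = {cat: sum(weights.get(k, 0.0) for k in entry_keywords)
              for cat, weights in category_keywords.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] > 0 else None


def classify_by_distribution(entry_keywords: List[str],
                             category_probs: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Method 2: cosine similarity (an assumed measure) between occurrence ratios and category probabilities."""
    if not entry_keywords:
        return None
    counts = Counter(entry_keywords)
    total = sum(counts.values())
    ratios = {k: c / total for k, c in counts.items()}

    def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
        dot = sum(v * q.get(k, 0.0) for k, v in p.items())
        norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    scores = {cat: cosine(ratios, probs) for cat, probs in category_probs.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] > 0 else None
```

Both functions return None when no category shares any keyword with the entry, leaving such entries unclassified rather than forcing a guess.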
The system further comprises a safety check module, configured to: check, for the link address of each acquired entry to be classified, whether the link address is safe by means of the safety inspection engine and the Trojan scanning engine, and classify the entry if it is safe;
and/or continuously traverse, by means of the safety inspection engine, the link addresses of the classified entries in the taxonomy database, and delete the data associated with any unsafe link address from the taxonomy database;
and/or check, by means of the safety inspection engine, whether the link address of an entry clicked by the user is safe, and if it is unsafe, warn the user and delete the data associated with that link address from the taxonomy database.
The system further comprises a keyword suggestion engine, configured to return suggestion words according to the user's input word. The suggestion words are extracted from pre-aggregated statistics on the click relationships between the words users enter and the corresponding results.
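A minimal sketch of the keyword suggestion engine, assuming the pre-aggregated click statistics are available as (input word, clicked-result keyword) pairs and that suggestions are matched by prefix; both the log format and the prefix rule are assumptions, not details given in the text.

```python
from collections import Counter
from typing import List, Tuple


def suggest(click_log: List[Tuple[str, str]], typed: str, k: int = 3) -> List[str]:
    """Return up to k suggestion words for the text typed so far.

    Pairs whose input word starts with `typed` are counted, and the most-clicked
    result keywords are returned (ties keep first-seen order).
    """
    counts = Counter(clicked for word, clicked in click_log if word.startswith(typed))
    return [word for word, _ in counts.most_common(k)]


log = [("game", "online game download")] * 3 + [("game", "game guide"),
                                                 ("course", "course registration")]
print(suggest(log, "ga"))
```

Counting clicks rather than raw queries favors suggestions that actually led users to results they chose.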
The system further comprises an intelligent correction engine, configured to correct query words that the user has entered incorrectly. The intelligent correction engine may be included in the search engine.
Further, the data acquisition module comprises:
a data grabber, configured to automatically crawl all entries and related information of all web pages of the industry;
a supplementary entry module, configured to manually enter additional entries and related information for a web page.
The system further comprises an interface providing module, configured to offer the user action entry points associated with each entry directly to the user side when the entries of each category and their related information are presented to the user side by category.
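The interface providing module can be illustrated by grouping entries by category and attaching each entry's action entry points to its display line. The `Entry` type, the `actions` mapping and the example URLs are hypothetical; the text only states that action entry points such as download or registration are extracted in advance and shown alongside the entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    title: str
    category: str
    # Hypothetical pre-extracted action entry points, e.g. {"download": url}.
    actions: Dict[str, str] = field(default_factory=dict)


def render_by_category(entries: List[Entry]) -> Dict[str, List[str]]:
    """Group entries by category and append each entry's action entry points."""
    grouped: Dict[str, List[str]] = {}
    for entry in entries:
        links = "".join(f" [{name}: {url}]" for name, url in entry.actions.items())
        grouped.setdefault(entry.category, []).append(entry.title + links)
    return grouped


entries = [Entry("Game A", "game", {"download": "https://dl.example/a.exe"}),
           Entry("Course B", "course", {"register": "https://edu.example/b/register"}),
           Entry("Game C", "game")]
print(render_by_category(entries)["game"])
```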
Referring to Fig. 7, a schematic diagram of a preferred structure of the intelligent direct-access search system of the application is shown.
The user enters an input word through the browser. The keyword suggestion engine returns suggestion words related to the input word according to its intelligent suggestion dictionary, and the user may choose either their own input word or one of the system's suggestion words as the query word. Once the user has confirmed the query word, the search engine combines the query word with the classified entries and related information in the taxonomy database and returns retrieval results relevant to the query word for display. The entries of each category and their related information are displayed by category, and the action entry points associated with each entry may be displayed to the user at the same time. The search engine may also correct the user's query word through its intelligent correction engine.
The data grabber crawls web data from the network, and the supplementary entry module adds data that the data grabber cannot crawl. This data is passed to the classifier, which classifies the entries and their related information and then stores the classified data in the taxonomy database.
Meanwhile, the safety check module performs the first safety check during data acquisition: it checks the link addresses of the data crawled by the data grabber and of the data added through the supplementary entry module, and only safe link addresses are passed to the classifier for classification. The second safety check is performed on the taxonomy database: the safety check module continuously traverses the link addresses of the stored data to check whether they are safe, and only safe data is retained. When the user clicks the link address of an entry through the browser, the safety inspection engine performs the third safety check, a real-time check of the page the user clicked; for an unsafe link address, the safety inspection engine warns the user of the potential risk and notifies the system to delete the data associated with that link address.
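The acquisition flow, with the first safety check applied before classification, can be sketched as a single ingest function. Each item is simplified to a (link address, keywords) pair, `is_safe` and `classify` are caller-supplied stand-ins for the safety check module and classifier, and dropping entries the classifier cannot place is an assumption rather than something the text specifies.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Item = Tuple[str, List[str]]  # (link address, entry keywords): a simplified record


def ingest(crawled: Iterable[Item], supplemented: Iterable[Item],
           is_safe: Callable[[str], bool],
           classify: Callable[[List[str]], Optional[str]],
           taxonomy: Dict[str, str]) -> int:
    """Merge both sources, apply the first safety check, classify, and store."""
    stored = 0
    for url, keywords in [*crawled, *supplemented]:
        if not is_safe(url):          # first safety check, before classification
            continue
        category = classify(keywords)
        if category is None:          # assumption: unclassifiable entries are not stored
            continue
        taxonomy[url] = category      # taxonomy database: link address -> category
        stored += 1
    return stored
```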
By establishing a real-time web address safety checking mechanism, the application reduces the risk of users being infected through search. By establishing a crawling and manual-entry management platform, it overcomes the problem of a single data source and acquires data through multiple channels, making the data more comprehensive and richer. By establishing an automatic classification system, it classifies and screens data automatically. By establishing intelligent keyword suggestion, it refines the user's search needs and provides more accurate search results. By establishing an industry data mining mechanism, it gives one-click direct access to common functions. As a result, searching is safe, so users need not worry about harmful web addresses in the results; coverage is comprehensive, so a single search on one site obtains the information of the whole industry; results are precise, returning genuinely valuable results instead of leaving users to pick through a mass of results; and access is fast and direct, giving users immediate entry points without having to visit a page and make a selection.
Since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to mutually.
The vertical intelligent direct-access search method and system provided by the application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application; the description of the above embodiments is intended only to help understand the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementations and the scope of application. In summary, this description should not be construed as limiting the application.