Summary of the invention
The technical problem to be solved by the present application is to provide an intelligent vertical search method and system that overcome the shortcoming that a user cannot obtain relevant information comprehensively, accurately, and quickly when searching.
To solve the above problem, the present application discloses an intelligent vertical search method, comprising:
Obtaining a query word entered within an industry selected at the user side;
Matching the query word against the classified entries of that industry in a classification database, obtaining the entries of each category that are relevant to the query word, and presenting the entries of each category and their related information to the user by category; wherein an entry is one or more items of business content in an e-commerce webpage;
Wherein the classified entries of the industry are obtained through the following steps:
Obtaining all entries and related information from all webpages of the industry;
Performing weighted analysis on the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry to determine the category to which each entry belongs;
And/or determining the category to which each entry belongs according to the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
Preferably, the query word comprises:
A keyword entered by the user, used as the query word;
Or one of the suggested words returned in response to the user's input word and selected by the user, used as the query word; wherein the suggested words are extracted from click relationships, collected in advance by statistics, between the input words entered by users and the corresponding results.
Preferably, the method further comprises the following security detection steps:
Step A: for the link address of each obtained entry to be classified, checking whether the link address is safe by means of a security check engine and a Trojan scanning engine, and classifying the entry if it is safe;
And/or Step B: for the link addresses of the classified entries in the classification database, continuously traversing each link address by means of the security check engine, and deleting the data related to a link address from the classification database if the address is unsafe;
And/or Step C: for the link address of an entry clicked by the user, checking whether the link address is safe by means of the security check engine and, if it is unsafe, warning the user and deleting the data related to the link address from the classification database.
Preferably, the process of checking whether a link address is safe by means of the security check engine and the Trojan scanning engine is carried out through the following steps:
Step P1: for the link address of each obtained entry to be classified, submitting the link address to the security check engine to check whether it exists in a security level library;
Step P2: if it exists and is safe, classifying the entry;
Step P3: if it exists but is unsafe, issuing a warning and filtering out the data related to the link address;
Step P4: if it does not exist, checking the link address with the Trojan scanning engine to determine whether the link is safe; if it is safe, storing the link address in the security level library and returning to Step P1.
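Steps P1 to P4 can be sketched as a small lookup-then-scan loop. The following is a minimal illustration only: the names `check_link`, `Verdict`, `safety_library`, and `trojan_scan` are hypothetical, the security level library is modelled as a plain dictionary, and the text does not say what happens when the scan in Step P4 finds a link unsafe, so the sketch assumes such a link is simply filtered out.

```python
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    CLASSIFY = "classify"   # link is safe, entry may be classified (P2)
    FILTERED = "filtered"   # link is unsafe, related data is dropped (P3)


def check_link(url: str,
               safety_library: Dict[str, bool],
               trojan_scan: Callable[[str], bool]) -> Verdict:
    """Hypothetical sketch of steps P1-P4 for one link address."""
    while True:
        # P1: look the address up in the security level library.
        if url in safety_library:
            if safety_library[url]:
                return Verdict.CLASSIFY      # P2: known and safe
            # P3: known but unsafe; a real system would also raise a warning.
            return Verdict.FILTERED
        # P4: unknown address, ask the Trojan scanning engine.
        if trojan_scan(url):
            safety_library[url] = True       # remember it, then go back to P1
            continue
        # Assumption: an unknown address that fails the scan is filtered.
        return Verdict.FILTERED
```

Passing the scanner in as a callable keeps the sketch independent of any particular scanning product.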
Preferably, the method further comprises the following security detection step:
Performing security detection on an entry by means of the ICP filing information of the link address containing the entry and/or a website real-name authentication system.
Preferably, obtaining all entries and related information from all webpages of the industry comprises:
An automatic crawling step, for automatically crawling all entries and related information from all webpages of the industry;
A supplementary entry step, for supplementarily entering the entries and related information of a webpage.
Preferably, presenting the entries of each category and their related information to the user side by category comprises:
Providing the user-action entry points related to an entry directly to the user side.
Preferably, the method further comprises:
An intelligent error correction step, in which an intelligent error correction engine corrects a query word that the user has entered incorrectly.
Preferably, the differences between the occurrence-count ratio of each keyword i in the entry and the occurrence probability of each keyword i in the classified entries are logarithmically weighted to calculate a deviation value G(c) of the entry for each category c; the smaller the deviation value, the higher the similarity, and the category with the smallest deviation value is taken as the category to which the entry belongs.
Correspondingly, the present application also discloses an intelligent vertical search system, comprising:
A search engine, for obtaining a query word entered within an industry selected at the user side; matching the query word against the classified entries of that industry in a classification database, obtaining the entries of each category that are relevant to the query word, and presenting the entries of each category and their related information to the user by category; wherein an entry is one or more items of business content in an e-commerce webpage;
A classification database, for storing the classification data of each industry;
A data acquisition module, for obtaining all entries and related information from all webpages of the industry;
A classifier, for performing weighted analysis on the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry to determine the category to which each entry belongs; and/or determining the category to which each entry belongs according to the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
Preferably, the system further comprises:
A security check module, for checking, for the link address of each obtained entry to be classified, whether the link address is safe by means of a security check engine and a Trojan scanning engine, and classifying the entry if it is safe;
And/or, for the link addresses of the classified entries in the classification database, continuously traversing each link address by means of the security check engine, and deleting the data related to a link address from the classification database if the address is unsafe;
And/or, for the link address of an entry clicked by the user, checking whether the link address is safe by means of the security check engine and, if it is unsafe, warning the user and deleting the data related to the link address from the classification database.
Preferably, the system further comprises:
A keyword suggestion engine, for returning suggested words according to the user's input word; the suggested words are extracted from click relationships, collected in advance by statistics, between the input words entered by users and the corresponding results.
Preferably, the system further comprises:
An intelligent error correction engine, for correcting a query word that the user has entered incorrectly.
Preferably, the data acquisition module comprises:
A data crawler, for automatically crawling all entries and related information from all webpages of the industry;
A supplementary entry module, for supplementarily entering the entries and related information of a webpage.
Preferably, the system further comprises:
An interface providing module, for providing the user-action entry points related to an entry directly to the user side when the entries of each category and their related information are presented to the user side by category.
Compared with the prior art, the present application has the following advantages:
The present application takes industry classification as the starting point of a search. It obtains the entries of all websites of each industry on the network and determines the category of each entry by weighted analysis of the lexical matching results between the keywords in the entry and the keywords corresponding to each category of the industry, and/or by the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics. After all entries of an industry have been classified, when a user issues a query, the search engine matches the query word against the classified entries of that industry in the classification database, obtains the entries of each category that are relevant to the query word, and presents the entries of each category and their related information to the user by category. By establishing an automatic classification system, the present application achieves automatic classification and screening of data and can present information relevant to the user's interests more accurately, more comprehensively, and more efficiently.
Embodiment
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a schematic flow chart of an intelligent vertical search method of the present application is shown, comprising:
Step 110: obtaining a query word entered within an industry selected at the user side.
Step 120: matching the query word against the classified entries of that industry in the classification database, obtaining the entries of each category that are relevant to the query word, and presenting the entries of each category and their related information to the user by category; wherein an entry is one or more items of business content in an e-commerce webpage. Business content excludes non-core information such as news, advertisements, and Q&A, and also excludes information located in regions such as the page header, the upper left, or the bottom of the webpage, such as advertisements or ICP (Internet Content Provider) filing information.
Wherein the classified entries of the industry are obtained through the following steps:
Step 210: obtaining all entries and related information from all webpages of the industry;
Step 220: performing weighted analysis on the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry to determine the category to which each entry belongs;
And/or determining the category to which each entry belongs according to the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
The present application can classify each industry offline, in advance, through step 210 and step 220. For example, the education industry can be divided into major categories such as children's education, secondary education, higher education, and vocational education. Children's education can be further divided into categories such as preschool, kindergarten-to-primary transition, grade one, grade two, grade three, grade four, grade five, grade six, and primary-to-junior-high transition. Secondary education can be further divided into categories such as junior grade one, junior grade two, junior grade three, senior high school preparatory course, senior grade one, senior grade two, senior grade three, and preparatory course. Higher education can be further divided into categories such as English Test (Band 4 and 6), New Concept, English-related, minority foreign languages, postgraduate entrance exam English, postgraduate entrance exam mathematics, postgraduate entrance exam politics, postgraduate entrance exam specialized courses, TOEFL, and IELTS. Vocational education can be further divided into categories such as TOEIC, professional minority foreign languages, financial accounting, self-study examination, computer, driving school, construction engineering, economics and trade/finance, medicine, in-service postgraduate, human resources, and civil servant. Each category can contain a number of corresponding keywords; for example, the children's education category contains keywords such as preschool, kindergarten-to-primary transition, grade one, grade two, grade three, grade four, grade five, grade six, primary-to-junior-high transition, primary school English, mathematics, Chinese, preschool class, and special-skill class.
As another example, the game industry can be divided into categories such as game type, game theme, and game graphics. Game type can be further divided into categories such as role playing, turn-based, action, FPS shooting, TPS shooting, racing, sports, music and dance, fighting, and strategy. Game theme can be further divided into categories such as martial arts, fantasy, magic, science fiction, cartoon, and history. Game graphics can be further divided into categories such as 3D, 2D, and 2.5D. Each category can contain a number of corresponding keywords.
When classifying, step 210 first obtains, by industry, all entries of all websites of the industry on the network. An entry is one or more items of business content in an e-commerce webpage; business content excludes non-core information such as news, advertisements, and Q&A, and also excludes information located in regions such as the page header, the upper left, or the bottom of the webpage, such as advertisements or ICP filing information. For the education industry, for example, the entries are the course titles and course content of the various education websites, such as "Postgraduate entrance exam English sprint class" and its content.
Obtaining all entries and related information from all webpages of the industry comprises:
An automatic crawling step, for automatically crawling all entries and related information from all webpages of the industry.
The automatic crawling step can be carried out through the following steps:
Step m1: automatically searching for the link addresses of all websites of the industry, generating a crawl list, and recording the time and status of each crawl.
Step m2: using a distributed deployment to crawl the different websites periodically according to their regional distribution. If a website's data changes, the website can call the notification interface provided by the present application, so that an active notification mechanism updates the data of that website in real time. When its data changes, the website only needs to access the notification address provided by the present application, and this address triggers a crawl.
Step m3: performing a security check and a validity check on the crawled data. The security check submits the address to the security check module to check whether a Trojan or virus is present, and the module returns an indication of whether the address should be included. The validity check verifies that the address can be opened normally; if the address returns a not-found or other error, this crawl does not include the address.
Step m4: submitting the data to the classifier for classification while the crawler continues to crawl new data. After a crawl finishes, the data crawler starts crawling again and determines whether a file has been updated, thereby deciding whether to skip it and move on to the next record.
A supplementary entry step, for supplementarily entering the entries and related information of a webpage. This step can be used to add content that the automatic crawling step failed to capture.
Ordinary web data is unstructured. If web pages are crawled in the traditional way, the data must then undergo structural analysis, and because the system's intelligent recognition capability is limited, part of the information is lost in this process. To solve this problem, the present application, taking the characteristics of the problem into account, establishes a cooperation mechanism with data source providers: the present application formulates and publishes an industry data format standard, and the data source providers fill in their data according to this standard, achieving twice the result with half the effort. The format standard can be established as follows: select the industry that needs vertical search, such as education or games; analyze the data characteristics of the industry; and formulate the format standard. Taking games as an example, when people mention a game they think of related information such as the game name, game type, game introduction, and game address; this information is summarized and organized to form a standard interface. The resulting game interface fields are as follows:
Field | Field description
Game name | No more than 50 characters
Game features | No more than 50 characters
Mission description | No more than 500 characters
Game status | Testing or released
Game type | Client game / web game, etc.
Game operator | For online games
Game start address | For online games
Game download address | Provide the download link address
How to start | No more than 500 characters
Operating guide | No more than 500 characters
Game pictures | Provide the picture link address
Registration address | Provide the game registration address
Depending on the scale of the data volume, the present application defines a full interface and an incremental interface. The data source provider generates the corresponding interface data according to these interfaces, and the present application implements the corresponding calls. The full interface returns all data at once, whereas the incremental interface returns part of the data on each call. The present application recommends transmitting data in XML format. When an organization provides relatively little data (generally, when the whole XML file is within 50Mb), the full data interface can be used directly; when the data volume is larger, an incremental interface generally needs to be provided so that the data can be obtained in batches. That is, one address describes the change status of each record in the form of ids, and another address returns the corresponding data for a given id. The format is as follows:
<id do="insert">1000</id>
<id do="update">1001</id>
<id do="delete">1002</id>
<id do="insert">1003</id>
When the value of do is insert, the record corresponding to the id is a newly added record;
When the value of do is update, the record corresponding to the id is a modified record;
When the value of do is delete, the record corresponding to the id is a deleted record.
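A consumer of the incremental interface could process such a change list roughly as follows. This is a sketch under stated assumptions: the `<changes>` wrapper element, the function name `apply_changes`, the in-memory `store` dictionary, and the `fetch_record` callback (standing in for the second address that returns a record by id) are all hypothetical and are not specified in the text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Example change list; the <changes> root element is an assumption made so
# that the fragment is well-formed XML.
FEED = """<changes>
  <id do="insert">1000</id>
  <id do="update">1001</id>
  <id do="delete">1002</id>
  <id do="insert">1003</id>
</changes>"""


def apply_changes(feed_xml: str,
                  store: Dict[str, dict],
                  fetch_record: Callable[[str], dict]) -> None:
    """Apply insert/update/delete changes from the id list to a local store."""
    root = ET.fromstring(feed_xml)
    for node in root.iter("id"):
        record_id = (node.text or "").strip()
        action = node.get("do")
        if action in ("insert", "update"):
            # New or modified record: fetch its data by id and (re)store it.
            store[record_id] = fetch_record(record_id)
        elif action == "delete":
            # Deleted record: drop it if present.
            store.pop(record_id, None)
        else:
            raise ValueError(f"unknown change type {action!r} for id {record_id}")
```

Treating insert and update identically keeps the consumer idempotent if the same change list is processed twice.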
Because the classification information in the data obtained from each data source differs, and so that users can filter conveniently when the data is presented, the present application needs to reclassify all data according to a unified standard. That is, the method proceeds to step 220, classification by rule matching: performing weighted analysis on the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry to determine the category to which each entry belongs;
And/or classification by statistical matching: determining the category to which each entry belongs according to the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
For rule matching classification:
Take the following title as an example:
"Autumn class, junior grade three physics, Feng's system top-student class"
From this title, the present application can obtain the following classification information: autumn class, junior three, and physics.
First, each of these three items of classification information has a corresponding word in the title, so the present application can set up three rules to obtain them.
Rule one: if the title contains "autumn class", classify it as "autumn class";
Rule two: if the title contains "junior grade three", classify it as "junior three";
Rule three: if the title contains "physics", classify it as "physics".
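The three rules above amount to a substring lookup over the title. A minimal sketch follows, using English renderings of the example titles; the names `TITLE_RULES` and `labels_for` are hypothetical, and case-insensitive matching is an assumption added for the English text.

```python
from typing import List, Tuple

# (substring to look for in the title, classification label it yields)
TITLE_RULES: List[Tuple[str, str]] = [
    ("autumn class", "Autumn class"),        # rule one
    ("junior grade three", "Junior three"),  # rule two
    ("physics", "Physics"),                  # rule three
]


def labels_for(title: str, rules: List[Tuple[str, str]] = TITLE_RULES) -> List[str]:
    """Return every label whose rule substring occurs in the title."""
    lowered = title.lower()
    return [label for needle, label in rules if needle.lower() in lowered]


title = "Autumn class, junior grade three physics, Feng's system top-student class"
print(labels_for(title))
```

A title that matches only some rules simply yields fewer labels, so the same rule set can be reused across titles.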
In this way, the classification information of similar titles such as the following can be obtained.
"Autumn class, junior grade three physics, 'East Toward High' system improvement class"
"Autumn class, senior grade three physics top-student class"
"Autumn class, junior grade one mathematics improvement class"
"Autumn class, junior grade three English improvement class"
The rule matching method is built on a lexical matching mechanism. It decides which category an entry belongs to according to the keywords that appear both in the entry and in the keyword set of each category, and determines the final classification through weighted analysis. The formula is as follows:
$$P_1 = x_1 r_1 + x_2 r_2 + x_3 r_3 + \dots + x_n r_n$$
where $P_i$ is the classification result obtained through a single rule, and the entry belongs to the category for which $P_i$ is highest; $x_i$ is the lexical coefficient; $r_i$ is the lexical matching result. The lexical matching result is the number of times the word occurs in this match. The lexical coefficient is the weight that this lexical matching result carries among all lexical matching results: the higher the weight, the closer the value is to 1, and the lower the weight, the closer the value is to 0. The value is set manually. For example, a match found in the title has a relatively high lexical coefficient, whereas a match found in the description or content has a lower one.
Table 1 lists the coefficients used when matching the English category, taking the matching of the entry "Senior grade one English Band 4 training class" against the "English" category as an example:
Lexical match | Lexical matching result | Lexical coefficient
Title contains "English" | 1 | 0.8
Title contains "Band 4" | 1 | 0.7
Title contains "IELTS" | 0 | 0.7
Description contains "English" | 3 | 0.3
Description contains "Band 4" | 1 | 0.2
Table 1
The score for matching the English category can then be calculated as 1×0.8 + 1×0.7 + 0×0.7 + 3×0.3 + 1×0.2 = 2.6.
The scores for matching the other categories are then calculated, and the entry is assigned to the category with the highest score.
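The weighted sum can be reproduced with a few lines of code. The sketch below encodes the rows of Table 1 as (field, keyword, coefficient) triples; the sample entry text, the `MATH_RULES` coefficients, and the function names are hypothetical and chosen only so that the match counts agree with Table 1.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str, float]  # (field, keyword, lexical coefficient x)

ENGLISH_RULES: List[Rule] = [
    ("title", "English", 0.8),
    ("title", "Band 4", 0.7),
    ("title", "IELTS", 0.7),
    ("description", "English", 0.3),
    ("description", "Band 4", 0.2),
]
# Hypothetical rules for a second category, for comparison only.
MATH_RULES: List[Rule] = [
    ("title", "mathematics", 0.8),
    ("description", "mathematics", 0.3),
]


def category_score(entry: Dict[str, str], rules: List[Rule]) -> float:
    """Sum x_i * r_i, where r_i is the keyword's occurrence count in the field."""
    return sum(coef * entry.get(field, "").count(keyword)
               for field, keyword, coef in rules)


def best_category(entry: Dict[str, str],
                  rules_by_category: Dict[str, List[Rule]]) -> str:
    """Pick the category whose weighted score is highest."""
    return max(rules_by_category,
               key=lambda c: category_score(entry, rules_by_category[c]))


entry = {
    "title": "Senior One English Band 4 Training Class",
    "description": ("English listening, English reading and English writing "
                    "practice for the Band 4 exam."),
}
print(round(category_score(entry, ENGLISH_RULES), 2))
print(best_category(entry, {"English": ENGLISH_RULES, "Mathematics": MATH_RULES}))
```

Because the coefficients are set manually, keeping them in data rather than in code makes them easy to tune per category.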
For statistical classification:
The present application first collects the relevant keywords that influence classification, then compiles statistics on the classified data to examine whether these keywords are present or absent in each entry. Next, the entries that are still unclassified are analyzed with these keywords to examine how the keywords occur in them. Finally, the occurrence probability of each keyword in the classified entries is compared with the occurrence-count ratio of each keyword in the unclassified entry; if the two are close, the unclassified entry can be considered to belong to that category.
According to the foregoing analysis, the present application establishes the following formula:
where c is the category; G(c) is the category deviation value; 1 is a constant that ensures the log value is valid; i is a keyword; $T_{ci}$ is the occurrence probability of the keyword in the classified entries, obtained by statistics; and $t_{ci}$ is the occurrence-count ratio of the keyword in the entry to be classified. The smaller G(c) is, the higher the similarity, and the entry is then judged to belong to category c. The keyword occurrence probability of the classified entries equals the geometric mean of the keyword occurrence-count ratios of all those entries; the keyword occurrence-count ratio of the entry to be classified = the number of times the keyword occurs in the entry / the number of times all keywords occur in the entry.
That is, the category to which each entry belongs is determined according to the similarity between the occurrence-count ratio of each keyword in an unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
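The formula itself is not reproduced in this text, so the following sketch assumes one form that is consistent with the description (a log-weighted difference with the constant 1 keeping the logarithm valid): G(c) = Σ_i log(1 + |T_ci − t_ci|). Both this form and the probability values in `CLASS_PROBS` are assumptions for illustration, not the formula or data of the application.

```python
import math
from typing import Dict

# Hypothetical keyword occurrence probabilities T_ci per category c.
CLASS_PROBS: Dict[str, Dict[str, float]] = {
    "English":     {"English": 0.8, "mathematics": 0.1, "Chinese": 0.1},
    "Mathematics": {"English": 0.1, "mathematics": 0.8, "Chinese": 0.1},
    "Chinese":     {"English": 0.1, "mathematics": 0.1, "Chinese": 0.8},
}


def occurrence_ratios(text: str, keywords) -> Dict[str, float]:
    """t_i = occurrences of keyword i / occurrences of all keywords in the entry."""
    counts = {k: text.count(k) for k in keywords}
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keywords}


def deviation(T_c: Dict[str, float], t: Dict[str, float]) -> float:
    """Assumed form of G(c): sum over keywords of log(1 + |T_ci - t_ci|)."""
    return sum(math.log(1 + abs(T_c[k] - t.get(k, 0.0))) for k in T_c)


def classify(text: str, class_probs: Dict[str, Dict[str, float]]) -> str:
    """Return the category with the smallest deviation value."""
    keywords = sorted({k for probs in class_probs.values() for k in probs})
    t = occurrence_ratios(text, keywords)
    return min(class_probs, key=lambda c: deviation(class_probs[c], t))


print(classify("English grammar and English reading, with some mathematics",
               CLASS_PROBS))
```

Whatever the exact weighting, the decision rule stays the same: compute the deviation for every category and take the minimum.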
For example, taking English, mathematics, and Chinese as the feature words, feature-word occurrence probability statistics are computed over the previously classified entries, giving Table 2:
Table 2
Next, the present application computes feature-word statistics for four unclassified entries, giving Table 3:
Table 3
Finally, the classification results calculated according to the above formula are shown in Table 4:
Table 4
Referring to Fig. 2, which plots the classification quality of this statistical classification against the increasing amount of statistical data, it can be seen that as the statistical data grows, the accuracy of the sample classification method keeps improving; the larger the sample size, the closer the classification accuracy is to 1. The statistical classification method is therefore sufficiently effective.
In practical application, refer to Fig. 3, which is a schematic diagram of the preferred classification flow of the present application.
To simplify computation and reduce system load, the present application first applies the rule matching classification described above, that is, it performs weighted analysis on the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry to determine the category to which each entry belongs. When an entry cannot be classified by rule matching within a threshold time, the statistical classification is applied, that is, the category to which the entry belongs is determined according to the similarity between the occurrence-count ratio of each keyword in the unclassified entry and the occurrence probability of each keyword in classified entries obtained in advance by statistics.
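The two-stage flow can be sketched as a simple fallback. In this illustration the names `classify_entry`, `rule_classify`, `stat_classify`, and `threshold_seconds` are hypothetical, and the time limit is checked after the rule classifier returns rather than by interrupting it, which is a simplification of "cannot be classified within a threshold time".

```python
import time
from typing import Callable, Optional


def classify_entry(entry: str,
                   rule_classify: Callable[[str], Optional[str]],
                   stat_classify: Callable[[str], str],
                   threshold_seconds: float = 0.05) -> str:
    """Try rule matching first; fall back to statistical classification."""
    start = time.monotonic()
    category = rule_classify(entry)          # returns None when no rule decides
    elapsed = time.monotonic() - start
    if category is not None and elapsed <= threshold_seconds:
        return category
    # Rule matching failed or took too long: use the statistical method.
    return stat_classify(entry)
```

Running the cheaper rule matcher first means the statistical comparison is only paid for on the entries that rules cannot settle.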
After the classification of each industry on the network has been completed, when a user uses the system of the present application, the search engine matches the user's query word against the classified entries of the industry in the classification database, obtains the entries of each category that are relevant to the query word, and presents the entries of each category and their related information to the user by category.
The query word comprises:
A keyword entered by the user, used as the query word;
Or one of the suggested words returned in response to the user's input word and selected by the user, used as the query word; wherein the suggested words are extracted from click relationships, collected in advance by statistics, between the input words entered by users and the corresponding results.
Regarding suggested words: when searching, users largely choose very general keywords that have many meanings and can correspond to all kinds of webpages, whereas the user may in fact be looking only for specific content. For example, if a user searches for "English", countless webpages match the word, while what the user actually wants to find may be content such as "English training" or "English exams". To better match user needs, the present application analyzes the keyword entered by the user with an intelligent suggestion dictionary and offers suggested words from which the user can choose before searching again. This refines the user's need, so that the user's intent is understood more accurately and more accurate search results are provided.
Further, the intelligent suggestion dictionary can be built through the following steps, after which the keyword suggestion engine returns the suggested words according to the input word entered by the user.
Step n1: collecting statistics on the click relationships between the input words entered by users and the corresponding results. For example, a user who searched for "English" clicked an English training course, and a user who searched for "mobile phone" clicked a webpage for buying mobile phones.
Step n2: sorting according to the statistical results, and performing word segmentation on the popular clicked titles corresponding to each input word.
Step n3: according to the word segmentation results, extracting the click words related to the input word and producing the correspondence between input words and click words. For example, "English" corresponds to English training, English Band 4, English for the postgraduate entrance exam, and so on. When the user enters a keyword, the click words are offered to the user as prompts; if the user selects a click word to filter with, the user's search purpose is refined and more accurate search results are provided.
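Steps n1 to n3 can be illustrated with a toy click log. The sketch below makes several simplifying assumptions that are not in the text: whitespace splitting stands in for word segmentation, a "click word" is taken to be the input word plus the token that follows it in a clicked title, and the log data and the name `build_suggestions` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_suggestions(click_log: List[Tuple[str, str]],
                      top_n: int = 3) -> Dict[str, List[str]]:
    """Map each input word to its most-clicked related phrases."""
    # n1: count how often each (input word, clicked title) pair occurs.
    clicks = Counter(click_log)
    phrase_scores: Dict[str, Counter] = defaultdict(Counter)
    # n2: walk the pairs from most to least clicked and segment each title.
    for (word, title), n in clicks.most_common():
        tokens = title.split()  # stand-in for real word segmentation
        # n3: keep "input word + following token" as a click word.
        for i, tok in enumerate(tokens[:-1]):
            if tok.lower() == word.lower():
                phrase_scores[word][f"{tok} {tokens[i + 1]}"] += n
    return {w: [p for p, _ in c.most_common(top_n)]
            for w, c in phrase_scores.items()}


log = (
    [("English", "English training evening course")] * 3
    + [("English", "English exam preparation")] * 2
    + [("English", "Spoken English training camp")]
    + [("game", "Free game download")]
)
suggestions = build_suggestions(log)
print(suggestions["English"])
```

Weighting each phrase by click count means the suggestions shown first are the refinements users have actually followed most often.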
Referring to Fig. 4, which compares the click-through rates of search results with and without suggested words over six consecutive days, it can be seen that the click-through rate of results with suggested words is clearly higher than that of results without them, which shows that the suggested-word construction scheme of the present application is effective.
In addition, the present application can correct the query word entered by the user by means of the intelligent error correction engine; for example, if the user enters "test English", the intelligent error correction engine can correct it to "postgraduate entrance exam English".
In addition, when presenting the entries of each category and their related information to the user side by category, the present application provides the user-action entry points related to an entry directly to the user side. For example, a game has a download entry point and a course has a registration entry point; these are offered to the user directly at presentation time.
Preferably, the display process of the application is:
Step q1: obtain the query word of the user's search.
Step q2: submit the query word to the intelligent correction engine for lookup. The engine performs fuzzy word segmentation on the query word entered by the user and, by means of the previously generated natural segmentation dictionary, the standard segmentation dictionary, the phonetic error correction dictionary and the like, produces a segmented text that the search engine can recognize.
Step q3: the search engine retrieves the text in the compound full-text index generated in advance from the classification database, performs intelligent ranking on the search results, and returns the ranked and optimized results. The intelligent ranking is based on the popularity and the relevance generated from the segmented text. Because the industry data has been formatted and analyzed in advance, the addresses that the user may use are obtained directly through defined interfaces, and when results are displayed at the front end the user action entry points are offered directly to the user (for example, a game has an immediate download, and a course has registration, audio-visual preview and the like). When the user clicks download, the download starts directly without entering the other party's webpage; when the user clicks register, registration proceeds directly without entering the course introduction page. This finally achieves one-click direct access to the page the user wants.
Through the above process, the user's query word is intelligently corrected, the user's query target is displayed precisely, and one-click direct access to user action entry points is provided.
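The correction and ranking steps q2 and q3 might be sketched as follows. This is a simplified illustration under stated assumptions: the correction dictionary is a plain token-to-token mapping, relevance is the fraction of query tokens found in an entry title, popularity is a pre-normalized `heat` value between 0 and 1, and the mixing parameter `heat_weight` is hypothetical. The application does not specify how popularity and relevance are combined.

```python
def correct_query(query, correction_dict):
    """Step q2 (simplified): split the query and replace known misspellings.

    correction_dict is a hypothetical {wrong_token: right_token} mapping.
    """
    return [correction_dict.get(tok, tok) for tok in query.lower().split()]


def rank_entries(tokens, entries, heat_weight=0.3):
    """Step q3 (simplified): rank entries by relevance and popularity.

    entries: list of dicts with 'title', 'heat' (0..1) and 'action_url',
    where 'action_url' stands for the user action entry point.
    The linear mix of relevance and heat is an assumption.
    """
    if not tokens:
        return []
    ranked = []
    for entry in entries:
        title_tokens = set(entry["title"].lower().split())
        relevance = sum(tok in title_tokens for tok in tokens) / len(tokens)
        if relevance == 0:
            continue  # entries that match no query token are not returned
        score = (1 - heat_weight) * relevance + heat_weight * entry["heat"]
        ranked.append((score, entry))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in ranked]
```

Each returned entry carries its `action_url`, so the front end can offer the download or registration entry point directly next to the result.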
In addition, the application also includes the following security detection steps:
Step A: for the obtained link address of each entry to be classified, check whether the link address is safe by means of the security check engine and the Trojan scanning engine; if it is safe, classify the entry.
With reference to Fig. 5, the process of checking whether the link address is safe by means of the security check engine and the Trojan scanning engine is carried out through the following steps:
Step P1: for the obtained link address of each entry to be classified, submit the link address to the security check engine, which checks whether it exists in the security level library;
Step P2: if it exists and is safe, classify the entry;
Step P3: if it exists but is unsafe, send a warning message and filter out the data related to the link address;
Step P4: if it does not exist, check the link address with the Trojan scanning engine to judge whether the link is safe; if it is safe, store the link address in the security level library and return to step P1.
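The flow of steps P1 to P4 can be sketched as a single function. The names `check_link`, `safety_library`, `trojan_scan` and `alerts` are hypothetical, the security level library is modeled as a dictionary from link address to a safe/unsafe flag, and the Trojan scanning engine is modeled as a callable. The source does not say what happens when an unknown address fails the scan; rejecting it without recording it is an assumption of this sketch.

```python
def check_link(url, safety_library, trojan_scan, alerts):
    """Return True if the entry at `url` may be passed on for classification.

    safety_library: dict {url: True (safe) / False (unsafe)}  (hypothetical model)
    trojan_scan:    callable url -> bool, stands in for the Trojan scanning engine
    alerts:         list collecting warning messages (here, the unsafe URLs)
    """
    # P1: look the address up in the security level library.
    verdict = safety_library.get(url)

    if verdict is None:
        # P4: unknown address, check it with the Trojan scanning engine.
        if not trojan_scan(url):
            # Assumption: the source leaves this case open; reject the entry.
            return False
        # Safe: store it in the library and repeat the lookup of step P1.
        safety_library[url] = True
        verdict = safety_library[url]

    if verdict:
        return True  # P2: known and safe, classify the entry

    alerts.append(url)  # P3: known but unsafe, warn and filter out the data
    return False
```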
And/or, step B: for the link addresses of the classified entries in the classification database, the security check engine continuously traverses each link address; if an address is unsafe, the data related to that link address is deleted from the classification database. The security level library of the security check engine traverses every address without interruption; as soon as unsafe information is found at a link, the address is immediately marked unsafe and the data engine is notified to delete the data related to that link.
And/or, step C: for the link address of an entry clicked by the user, the security check engine checks whether the link address is safe; if it is unsafe, the user is warned and the data related to the link address is deleted from the classification database. All link addresses displayed to and clicked by users are recorded and submitted to the security check engine for a security check; if an address is found to be unsafe, the user is immediately prompted "This address has been found to be unsafe. Continue visiting?", and the server is notified to delete the data related to that link.
Preferably, the application uses step A, step B and step C together to establish a dynamic URL security check mechanism. For example, using the 360 Trojan cloud scanning engine, together with uninterrupted cyclic checking and click-tracking checks, all obtained links are examined and the security level library is built in real time. A triple security mechanism is used to check the data. The first security mechanism: when data enters the to-be-classified library, the webpage address of each entry to be classified is checked for safety for the first time and associated with the data in the security level library, and the record is deleted as soon as a problem is found. The second security mechanism: after the data has been classified, it is subjected to uninterrupted cyclic checking. The third security mechanism: when the user clicks the URL corresponding to a search result, the system submits the address to the security check mechanism to judge whether it is safe.
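The second and third security mechanisms (steps B and C) might look like the following sketch. The classification database is modeled as a dictionary keyed by link address and the security check engine as a callable `is_safe`; both, along with the function names, are assumptions made for illustration. A real deployment would run the sweep continuously rather than once.

```python
def sweep_database(classification_db, is_safe):
    """Step B: one pass of the cyclic check over all classified link addresses.

    classification_db: dict {url: entry_data}  (hypothetical model)
    is_safe:           callable url -> bool, stands in for the security check engine
    Returns the list of addresses whose data was deleted.
    """
    unsafe = [url for url in classification_db if not is_safe(url)]
    for url in unsafe:
        del classification_db[url]
    return unsafe


def on_click(url, classification_db, is_safe):
    """Step C: check an address when the user clicks it.

    Returns None if the address is safe, otherwise a warning to show the user;
    the related data is deleted from the classification database.
    """
    if is_safe(url):
        return None
    classification_db.pop(url, None)
    return "This address has been found to be unsafe. Continue visiting?"
```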
The application also includes the following security detection step:
Perform security detection on the entries by means of the ICP filing information of the link address of the entries and/or a website real-name authentication system.
In practice, there may also be webpages that are normal in form and contain no Trojan or other virus but are in fact dishonest, such as phishing pages. The application can use the ICP filing information of the website of the link address where an entry is located to query the basic information of the website, the website owner and the like, and compare it with the actual information to determine whether the entry is safe. It can also verify whether the link address where the entry is located is safe by means of a website real-name authentication system that performs strict real-name authentication. If the entry is safe, the entry and its related information are stored in the classification database.
Through the above security check process, the safety of the webpages the user visits can be ensured.
With reference to Fig. 6, a structural diagram of an intelligent vertical search system of the application is shown.
The search engine 310 is used for obtaining the query word entered within an industry selected at the user side; obtaining, according to the result of matching the query word against the classified entries of that industry in the classification database, the entries of each category that are relevant to the query word; and displaying the entries of each category and their related information to the user by category; wherein the entries are one or more items of business content in e-commerce webpages.
The business content does not include webpages of non-core business information such as news, advertisements and Q&A, nor information located in areas such as the top-left or bottom of a webpage, such as page headers, advertisements or ICP filing information.
The classification database 320 is used for storing the classified data of each industry.
The data acquisition module 410 is used for obtaining all entries and related information of all webpages of the industry.
The classifier 420 is used for determining the category of each entry by weighted analysis of the lexical matching results between the keywords in each entry and the keywords corresponding to each category of the industry; and/or determining the category of each entry according to the similarity between the occurrence ratio of each keyword of an unclassified entry and the occurrence probability, obtained by prior statistics, of each keyword of the classified entries.
Further, the system also comprises a security check module, which is used for: checking, for the obtained link address of each entry to be classified, whether the link address is safe by means of the security check engine and the Trojan scanning engine, and classifying the entry if it is safe;
And/or, for the link addresses of the classified entries in the classification database, continuously traversing each link address by means of the security check engine and, if an address is unsafe, deleting the data related to that link address from the classification database;
And/or, for the link address of an entry clicked by the user, checking whether the link address is safe by means of the security check engine and, if it is unsafe, warning the user and deleting the data related to that link address from the classification database.
Further, the system also comprises a keyword suggestion engine, which is used for returning suggested words according to the user's input word; the suggested words are extracted from the click relationship, obtained by prior statistics, between the input words entered by users and the corresponding results.
Further, the system also comprises an intelligent correction engine, which is used for correcting errors in query words mistakenly entered by the user. The intelligent correction engine may be included in the search engine.
Further, the data acquisition module comprises:
A data crawler, used for automatically crawling all entries and related information of all webpages of the industry;
A supplementary entry module, used for supplementing the entries and related information of a webpage by manual entry.
Further, the system also comprises an interface providing module, which is used for providing the user action entry points related to the entries directly to the user side when the entries of each category and their related information are displayed to the user side by category.
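The two classification rules of the classifier 420 described above can be illustrated with a short sketch. The per-category keyword weights, the pre-computed keyword probabilities and the function names are hypothetical, and cosine similarity is used only as one possible similarity measure; the application does not name a specific one.

```python
import math
from collections import Counter


def classify_by_weighted_match(entry_keywords, category_keywords):
    """Rule 1: weighted analysis of lexical matches.

    category_keywords: non-empty {category: {keyword: weight}}  (hypothetical weights)
    Returns the best-scoring category, or None if no keyword matches.
    """
    scores = {
        category: sum(weights.get(kw, 0.0) for kw in entry_keywords)
        for category, weights in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_by_similarity(entry_keywords, category_probs):
    """Rule 2: compare keyword occurrence ratios with per-category probabilities.

    category_probs: non-empty {category: {keyword: probability}} from prior statistics.
    Cosine similarity is an assumed choice of similarity measure.
    """
    counts = Counter(entry_keywords)
    total = sum(counts.values())
    ratios = {kw: n / total for kw, n in counts.items()}

    def cosine(a, b):
        dot = sum(value * b.get(key, 0.0) for key, value in a.items())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    return max(category_probs, key=lambda cat: cosine(ratios, category_probs[cat]))
```

The first rule suits entries whose keywords appear in a curated per-category keyword list, while the second can fall back on statistics gathered from already classified entries; the "and/or" in the description allows either or both to be applied.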
With reference to Fig. 7, a structural diagram of a preferred intelligent vertical search system of the application is shown.
The user enters an input word through the browser. The keyword suggestion engine returns suggested words related to the input word according to its intelligent suggestion dictionary, and the user can choose either the input word or a suggested word given by the system as the query word. After the user has confirmed the query word, the search engine combines the query word with the classified entries and related information in the classification database and returns the retrieval results relevant to the query word for display. When displaying, the entries of each category and their related information are shown to the user by category, and the user action interfaces related to the entries can also be presented directly to the user. The search engine can also intelligently correct the user's query word by means of its intelligent correction engine.
Webpage data is crawled from the network by the data crawler, and data that the crawler did not capture can be supplemented through the supplementary entry module. The data is passed to the classifier, which classifies the entries and their related information and then stores the classified data in the classification database.
Meanwhile, the security check module performs the first security check when data is obtained, that is, while the data crawler crawls the link addresses of the data and while the supplementary entry module supplements link addresses of data; only safe link addresses are passed to the classifier for classification. The second security check is performed in the classification database: the security check module continuously traverses the link addresses of the stored data and checks whether they are safe, and only safe data is retained. When the user clicks the link address of an entry through the browser, the security check engine performs the third security check, that is, a real-time security check of the webpage the user clicked; for an unsafe link address, the security check engine warns the user of the potential risk and notifies the system to delete the data related to that link address.
By establishing the dynamic URL security check mechanism, the application reduces the risk of the user being infected through search. By establishing a crawling and entry management platform, the problem of a single data source is solved and data is obtained through multiple channels, making the data more comprehensive and richer. By establishing an automatic classification system, automatic classification and screening of data is realized. By establishing a keyword intelligent suggestion model, the user's search need is refined and more accurate search results are provided. By establishing an industry data mining mechanism, one-click direct access to common functions is realized. The result is: safe search, in which the user need not worry about harmful URLs among the results; comprehensive coverage, in which a single search obtains information on the whole industry; precise results, in which truly valuable results are returned instead of leaving the user to pick through a massive result set; and quick direct access, in which immediate entry points are provided so that the user need not enter another page to make a selection.
Because the system embodiment is basically similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the corresponding description of the method embodiment.
Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.
The intelligent vertical search method and system provided by the application have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the application, and the description of the above embodiments is only intended to help understand the method of the application and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the application, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the application.