CN107544980A - Method and device for searching for web pages

Publication number
CN107544980A
Authority
CN
China
Prior art keywords
vector
webpage
mark
similarity
database
Prior art date
Legal status
Granted
Application number
CN201610474660.2A
Other languages
Chinese (zh)
Other versions
CN107544980B (en
Inventor
王天祎
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201610474660.2A
Publication of CN107544980A
Application granted
Publication of CN107544980B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the invention provide a method and device for searching for web pages. The method comprises: obtaining the identifier of a first web page; looking up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page; looking up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector; and outputting the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database. The preset page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages. Using the similarity between vectors in the preset page vectorization database, at least one second web page whose page similarity with the first web page meets a preset requirement can be found quickly, which improves the accuracy of finding similar web pages.

Description

Method and device for searching for web pages
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for searching for web pages.
Background
With the development of the Internet, large numbers of users visit websites at every moment. Understanding the page similarity of the web pages in a website is of great significance to the website; page similarity refers to the degree to which the content of web pages is similar.
The traditional way of finding a web page B similar to a web page A is to inspect the page content manually, determine the page similarity of the two web pages from the relationships observed between their content, and thereby find the web page B similar to web page A. This way of finding similar web pages involves a heavy workload.
Summary of the invention
The technical problem solved by the present invention is to provide a method and device for searching for web pages that improve the accuracy of finding similar web pages.
To this end, the technical solution of the present invention is as follows:
A first aspect of the present invention provides a method for searching for web pages, the method comprising:
obtaining the identifier of a first web page;
looking up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page, wherein the preset page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages;
looking up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector;
outputting the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database.
In a first possible implementation of the first aspect, the preset page vectorization database is set up as follows:
obtaining browsing paths of users, each browsing path comprising the identifiers of the web pages a user browsed and the order in which the web pages were browsed;
treating the identifiers of the web pages as words in a word vector model, and training the word vector model on the browsing paths to obtain the vectors of the web pages;
establishing the correspondence between the identifiers of the web pages and the vectors of the web pages as the preset page vectorization database.
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect the method further comprises:
outputting the at least one second vector;
obtaining the similarity between the at least one second web page and the first web page from the at least one second vector and the first vector.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, looking up in the preset page vectorization database at least one second vector that meets a preset similarity with the first vector comprises:
looking up, in the preset page vectorization database, at least one second vector having the highest and/or the lowest similarity to the first vector.
In a fourth possible implementation of the first aspect, the method further comprises:
obtaining the identifier of a third web page;
looking up, in the preset page vectorization database, a third vector corresponding to the identifier of the third web page;
obtaining the similarity between the first web page and the third web page from the similarity between the first vector and the third vector.
A second aspect of the present invention provides a device for searching for web pages, the device comprising:
a first acquisition unit, configured to obtain the identifier of a first web page;
a first lookup unit, configured to look up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page, wherein the page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages;
a second lookup unit, configured to look up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector;
a first output unit, configured to output the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database.
In a first possible implementation of the second aspect, the device further comprises:
a second acquisition unit, configured to obtain browsing paths of users, each browsing path comprising the identifiers of the web pages a user browsed and the order in which the web pages were browsed;
a training unit, configured to treat the identifiers of the web pages as words in a word vector model and to train the word vector model on the browsing paths to obtain the vectors of the web pages;
an establishing unit, configured to establish the correspondence between the identifiers of the web pages and the vectors of the web pages as the preset page vectorization database.
With reference to the second aspect or its first possible implementation, in a second possible implementation of the second aspect the device further comprises:
a second output unit, configured to output the at least one second vector;
a first obtaining unit, configured to obtain the similarity between the at least one second web page and the first web page from the at least one second vector and the first vector.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect,
the second lookup unit is configured to look up, in the preset page vectorization database, at least one second vector having the highest and/or the lowest similarity to the first vector.
In a fourth possible implementation of the second aspect, the device further comprises:
a third acquisition unit, configured to obtain the identifier of a third web page;
a third lookup unit, configured to look up, in the preset page vectorization database, a third vector corresponding to the identifier of the third web page;
a second obtaining unit, configured to obtain the similarity between the first web page and the third web page from the similarity between the first vector and the third vector.
As the above technical solution shows, the present invention has the following beneficial effects:
Embodiments of the invention provide a method for searching for web pages: obtaining the identifier of a first web page; looking up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page; looking up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector; and outputting the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database. Because the preset page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages, the similarity between vectors in the database can be used to quickly find at least one second web page whose page similarity with the first web page meets a preset requirement, which improves the accuracy of finding similar web pages.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below evidently show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an example of word vector similarity provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method for searching for web pages provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for searching for web pages provided by an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are evidently only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and so on in the description, the claims and the above drawings of the present application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprising" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to an embodiment of the present application, a method embodiment of a method for searching for web pages is provided. It should be noted that the steps illustrated in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order.
To provide an implementation that improves the accuracy of finding similar web pages, embodiments of the present invention provide a method and device for searching for web pages. Preferred embodiments of the invention are described below with reference to the drawings.
A word vector (Word Vector) model is a sequence-based learning model that is widely used in fields such as natural language processing (Natural Language Processing). A sentence takes the word as its smallest unit and is formed by arranging words in a certain order. By training a word vector model on the sentences in a corpus (a text made up of many sentences), each word in the corpus can be turned into a vector Ω made up of real values in several dimensions, and the similarity between vectors characterizes the similarity between words. If the similarity between two vectors is high, the distance between them is small; if the similarity is low, the distance is large. The vector Ω of each word characterizes the positional relationships in which the word occurs across the many sentences.
For example, suppose that in different sentences the word "Apple" and the word "iPhone" always have similar contexts, such as: word A → word B → Apple → word C → word D, and word A → word B → iPhone → word C → word D. After training with the word vector model, the vector Ω1 of the word "Apple" and the vector Ω2 of the word "iPhone" are close together; that is, the similarity between Ω1 and Ω2 is high, which indicates that the words "Apple" and "iPhone" are highly similar.
A word vector model can map words that differ literally but have the same or similar meaning to highly similar vectors Ω, because the positional relationships between words across the many sentences of a corpus determine how similar those words are. As a further example, as shown in Fig. 1, the vector of dog is close to the vector of puppy, so their vector similarity is high, which indicates that the words dog and puppy are highly similar and often appear in similar contexts. The vector of cat is close to the vector of kitten, so their vector similarity is high, which indicates that the words cat and kitten are highly similar. The vector of dog is far from the vector of cat, so their vector similarity is low, which indicates that the words dog and cat have low similarity. This is understandable: a puppy is a young dog, so puppy and dog often appear in sentences with similar contexts, whereas cat and dog are two different animals, and the words cat and dog often appear in sentences whose contexts differ greatly.
Based on the above theory of the word vector model, in embodiments of the present invention the identifiers of web pages are treated as words in a word vector model and the word vector model is trained, so that each web page in the users' browsing paths is turned into a real-valued vector Ω and the correspondence between the identifier of a web page and the vector of that web page is established.
Fig. 2 shows a flowchart of a method for searching for web pages provided by an embodiment of the present invention. The method comprises:
201: Obtain the identifier of a first web page.
202: Look up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page. The preset page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages.
When a user needs to find at least one second web page that meets a preset similarity with a first web page, the user inputs the identifier of the first web page. The identifier may be the URL (Uniform Resource Locator) of the first web page, a custom name of the first web page, or the like; the identifier uniquely identifies the first web page.
In one example, the preset page vectorization database is set up as follows:
obtaining browsing paths of users, each browsing path comprising the identifiers of the web pages a user browsed and the order in which the web pages were browsed;
treating the identifiers of the web pages as words in a word vector model, and training the word vector model on the browsing paths to obtain the vectors of the web pages;
establishing the correspondence between the identifiers of the web pages and the vectors of the web pages as the preset page vectorization database.
At every moment, large numbers of users browse all kinds of web pages, and each user generally browses several web pages. Taking one session as the unit, the web pages a user browses within a session are arranged in the order in which they were browsed, giving that user's browsing path for the session. Each browsing path is thus the sequence of identifiers of the web pages a user browsed in one session, in browsing order. A large number of browsing paths is obtained from large numbers of users browsing web pages at different times.
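For example, the browsing paths obtained are as follows:
identifier of web page A1 → identifier of web page A2 → identifier of web page A3 → identifier of web page A4 → identifier of web page A5;
identifier of web page A1 → identifier of web page A2 → identifier of web page A6 → identifier of web page A4 → identifier of web page A5;
identifier of web page B1 → identifier of web page B2 → identifier of web page B3 → identifier of web page B4 → identifier of web page B5 → identifier of web page B6 → identifier of web page B7;
identifier of web page C1 → identifier of web page C2 → identifier of web page C3;
……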
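As a minimal sketch of how per-session browsing paths like those above might be assembled, the following groups visit records by session and orders each group by time. The record layout `(session_id, timestamp, page_id)` and the function name `build_browsing_paths` are assumptions made for illustration; the source does not specify a log format.

```python
from collections import defaultdict

def build_browsing_paths(visits):
    """Group visit records by session and order each group by browsing time.

    `visits` is an iterable of (session_id, timestamp, page_id) tuples.
    This record layout is hypothetical, chosen only for the sketch.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, page_id in visits:
        by_session[session_id].append((timestamp, page_id))
    # Sort each session's visits by timestamp and keep only the page identifiers.
    return {
        session_id: [page_id for _, page_id in sorted(records)]
        for session_id, records in by_session.items()
    }

visits = [
    ("s1", 3, "A3"), ("s1", 1, "A1"), ("s1", 2, "A2"),
    ("s2", 1, "C1"), ("s2", 2, "C2"),
]
paths = build_browsing_paths(visits)
```

Each value in `paths` is one browsing path: the page identifiers of a single session in browsing order.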
The identifiers of the web pages in the browsing paths are treated as words in a word vector model, and the word vector model is trained on the large number of browsing paths, turning the identifier of each web page in the browsing paths into a real-valued vector, so that each web page identifier corresponds to one vector. The similarity between the vectors of web pages characterizes the similarity between the web pages. After a word vector model has been trained on sentences, the closer the vectors of two words, the higher their vector similarity and the more similar the words are semantically. Likewise, in the preset page vectorization database obtained by training the word vector model on browsing paths, web pages with high vector similarity are highly similar, and web pages with low vector similarity have low similarity.
Following the way a word vector model is trained on the words in sentences, suppose for example that there are many browsing paths such as:
identifier of web page A1 → identifier of web page A2 → identifier of web page A3 → identifier of web page A4 → identifier of web page A5;
identifier of web page A1 → identifier of web page A2 → identifier of web page A6 → identifier of web page A4 → identifier of web page A5.
Then the vector corresponding to the identifier of web page A3 is close to the vector corresponding to the identifier of web page A6, so the similarity between these two vectors is high, which indicates that web page A3 and web page A6 are highly similar.
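The intuition in this example can be checked with a small sketch. Note that this is not a trained word vector model: as a simplified stand-in, each page identifier is represented by counts of the identifiers that appear near it in the browsing paths, which is enough to show that pages sharing the same context end up with similar vectors. The function names and the `window` parameter are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def context_vectors(paths, window=2):
    """Map each page identifier to a sparse vector of neighbour counts.

    A simplified stand-in for word vector training: the vector of a page
    counts which identifiers occur within `window` steps of it.
    """
    vectors = defaultdict(Counter)
    for path in paths:
        for i, page in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[page][path[j]] += 1
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

paths = [
    ["A1", "A2", "A3", "A4", "A5"],
    ["A1", "A2", "A6", "A4", "A5"],
]
vectors = context_vectors(paths)
sim_a3_a6 = cosine(vectors["A3"], vectors["A6"])  # A3 and A6 share the same context
sim_a3_a1 = cosine(vectors["A3"], vectors["A1"])  # A3 and A1 do not
```

In this toy data A3 and A6 have identical neighbours, so their similarity is higher than that of A3 and A1. A real word vector model learns dense vectors rather than raw counts, but the effect on pages with shared context is analogous.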
After the word vector model has been trained on a large number of browsing paths, the vector corresponding to the identifier of each web page in the browsing paths is obtained. The page vectorization database is generated in advance and stores the correspondence between web page identifiers and vectors, for example: the correspondence between the identifier of web page A1 and vector A1, between the identifier of web page A2 and vector A2, between the identifier of web page A3 and vector A3, between the identifier of web page A4 and vector A4, between the identifier of web page A5 and vector A5, between the identifier of web page A6 and vector A6, and so on.
It should be noted that the preset page vectorization database can be edited as needed; editing includes any one or more of deleting, modifying and adding.
In one example, at every preset time period the browsing paths of a large number of users during that period are obtained; the identifiers of the web pages in the browsing paths are treated as words in the word vector model, the word vector model is retrained on those browsing paths to obtain the vector corresponding to each web page identifier, and the newly trained vectors are used to update the preset page vectorization database. For example, if the vector stored in the preset page vectorization database for a web page identifier differs from the newly trained vector, the newly trained vector replaces the stored one. If no vector is stored in the preset page vectorization database for a web page identifier, the vector corresponding to that identifier is added to the database.
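The replace-or-add update rule can be sketched as follows, assuming the database is held as an in-memory mapping from page identifier to vector. The mapping representation and the name `update_vector_database` are illustrative assumptions, not details given in the source.

```python
def update_vector_database(database, new_vectors):
    """Merge newly trained vectors into the identifier-to-vector mapping.

    Stored vectors that differ from the new ones are replaced, and
    identifiers not yet stored are added. Returns the identifiers that
    were replaced and those that were newly added.
    """
    replaced, added = [], []
    for page_id, vector in new_vectors.items():
        if page_id not in database:
            added.append(page_id)
        elif database[page_id] != vector:
            replaced.append(page_id)
        database[page_id] = vector
    return replaced, added

database = {"A1": [0.1, 0.2], "A2": [0.3, 0.4]}
new_vectors = {"A1": [0.1, 0.2], "A2": [0.5, 0.4], "A7": [0.9, 0.1]}
replaced, added = update_vector_database(database, new_vectors)
```

Here A1 is unchanged, the vector of A2 is replaced, and A7 is added as a new entry.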
The preset time period can be set according to actual needs, for example three months or half a year.
In one example, if the first vector corresponding to the identifier of the first web page cannot be found in the preset page vectorization database, the browsing paths containing the first web page are obtained and, together with the previously obtained browsing paths of a large number of users, a forced training of the word vector model on the browsing paths is carried out to obtain the first vector corresponding to the first web page. The correspondence between the identifier of the first web page and the first vector is then added to the preset page vectorization database.
Of the two examples above, one trains the word vector model on browsing paths automatically at a preset time period, and the other forces training on browsing paths when actually needed. Using the approaches described in these two examples, the page vectorization database is updated continuously.
203: Look up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector.
Looking up at least one second vector that meets a preset similarity with the first vector amounts to looking up at least one second web page that meets the preset similarity with the first web page.
In one example, at least one second vector with the highest similarity to the first vector is looked up in the preset page vectorization database.
This amounts to finding the at least one second web page most similar to the first web page. In one example, the similarity between the first vector and every other vector in the preset page vectorization database is calculated, and the at least one second vector with the highest similarity to the first vector is obtained. Second vectors are looked up according to the number of second web pages to be found: if the single second web page most similar to the first web page is sought, the one second vector most similar to the first vector is looked up; if several are sought, the several second vectors most similar to the first vector are looked up. The number of second vectors looked up equals the number of second web pages to be found.
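A minimal sketch of this highest-similarity lookup is given below. It assumes the database is an in-memory mapping from page identifier to a dense vector and uses cosine similarity as the measure; both are illustrative choices, and the function names and sample vectors are hypothetical.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_pages(database, first_page_id, count=1):
    """Return the identifiers of the `count` pages most similar to the first page."""
    first_vector = database[first_page_id]
    # Score every other vector in the database against the first vector.
    scored = (
        (cosine(first_vector, vector), page_id)
        for page_id, vector in database.items()
        if page_id != first_page_id
    )
    return [page_id for _, page_id in heapq.nlargest(count, scored)]

database = {
    "A3": [1.0, 0.0],
    "A6": [0.9, 0.1],
    "B1": [0.0, 1.0],
    "C1": [-1.0, 0.0],
}
top = most_similar_pages(database, "A3", count=2)
```

Swapping `heapq.nlargest` for `heapq.nsmallest` would give the lowest-similarity variant of the same lookup.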
In another example, at least one second vector with the lowest similarity to the first vector is looked up in the preset page vectorization database.
This amounts to finding the at least one second web page least similar to the first web page. In one example, the similarity between the first vector and every other vector in the preset page vectorization database is calculated, and the at least one second vector with the lowest similarity to the first vector is obtained. Second vectors are looked up according to the number of second web pages to be found: if the single second web page least similar to the first web page is sought, the one second vector least similar to the first vector is looked up; if several are sought, the several second vectors least similar to the first vector are looked up. The number of second vectors looked up equals the number of second web pages to be found.
In another example, at least one second vector whose similarity to the first vector falls within a preset interval is looked up in the preset page vectorization database.
This amounts to finding at least one second web page whose similarity to the first web page falls within the preset interval. In one example, the similarity between the first vector and every other vector in the preset page vectorization database is calculated, and at least one second vector whose similarity to the first vector falls within the preset interval is obtained. Second vectors whose similarity to the first vector falls within the preset interval are looked up according to the number of second web pages to be found; the number of second vectors looked up equals the number of second web pages to be found. For example, to find N second web pages whose similarity to the first web page is greater than 50%, N second vectors whose similarity to the first vector is greater than 50% are looked up.
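The interval-based lookup can be sketched in the same style. The database layout, the use of cosine similarity, the closed interval `[low, high]` and the optional `count` limit are all assumptions made for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pages_in_similarity_interval(database, first_page_id, low, high, count=None):
    """Return (page_id, similarity) pairs whose similarity lies in [low, high].

    Results are ordered from most to least similar and truncated to
    `count` entries when a count is given.
    """
    first_vector = database[first_page_id]
    matches = []
    for page_id, vector in database.items():
        if page_id == first_page_id:
            continue
        similarity = cosine(first_vector, vector)
        if low <= similarity <= high:
            matches.append((page_id, similarity))
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches if count is None else matches[:count]

database = {
    "A3": [1.0, 0.0],
    "A6": [0.9, 0.1],
    "B1": [0.0, 1.0],
    "C1": [-1.0, 0.0],
}
above_half = pages_in_similarity_interval(database, "A3", 0.5, 1.0)
```

With this toy data only A6 has a similarity to A3 of at least 0.5, so it is the only page returned.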
There are many possible ways to calculate the similarity between vectors. For example, the Euclidean distance between the vectors can be calculated and used to measure their similarity: the shorter the Euclidean distance, the higher the similarity between the vectors; the longer the Euclidean distance, the lower the similarity. As another example, the cosine similarity between the vectors can be calculated.
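The two measures mentioned here can be written out as a short sketch for dense vectors of equal length. The function names and sample vectors are illustrative only.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance: a smaller value means the vectors are more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine similarity: a larger value means the vectors are more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

u, v, w = [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]
distance_uv = euclidean_distance(u, v)
cosine_uv = cosine_similarity(u, v)
cosine_uw = cosine_similarity(u, w)
```

The two measures need not agree: `u` and `v` point in the same direction, so their cosine similarity is maximal even though their Euclidean distance is not zero. Which measure suits a given database is a design choice the source leaves open.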
204: Output the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database.
After at least one second vector that meets the preset similarity with the first vector has been found, the identifier of the second web page corresponding to each second vector is looked up in the preset page vectorization database. If only one second vector meets the preset similarity with the first vector, the identifier of the second web page corresponding to that second vector is output; if several second vectors meet it, the identifier of the second web page corresponding to each of them is output. The second web pages output are the web pages that meet the preset similarity with the first web page.
In one example, the method further comprises:
outputting the at least one second vector;
obtaining the similarity between the at least one second web page and the first web page from the at least one second vector and the first vector.
The at least one second vector that was found can also be output, and the similarity between each second vector and the first vector is calculated and taken as the similarity between the second web page corresponding to that second vector and the first web page. In this way the similarity between the first web page and the at least one second web page can be quantified. For the method of calculating the similarity between the first vector and the at least one second vector, refer to the description above; it is not repeated here.
In one example, the method further comprises:
obtaining the identifier of a third web page;
looking up, in the preset page vectorization database, a third vector corresponding to the identifier of the third web page;
obtaining the similarity between the first web page and the third web page from the similarity between the first vector and the third vector.
In embodiments of the present invention, it is possible not only to find at least one other web page that meets a preset similarity with a given web page. The identifiers of two web pages can also be input, the vectors corresponding to the two identifiers looked up, and the similarity between the two vectors calculated, thereby quantifying the similarity between the two web pages.
For example:
The identifier of web page X and the identifier of web page Y are input; vector X corresponding to the identifier of web page X and vector Y corresponding to the identifier of web page Y are looked up in the preset page vectorization database; the similarity between vector X and vector Y is calculated and taken as the similarity between web page X and web page Y.
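The pairwise comparison of web page X and web page Y could look like the following sketch. It assumes an in-memory identifier-to-vector mapping and cosine similarity as the measure; the function name `page_similarity` and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def page_similarity(database, page_x_id, page_y_id):
    """Look up the vectors of two pages and return their similarity."""
    missing = [p for p in (page_x_id, page_y_id) if p not in database]
    if missing:
        # The source handles a missing vector by retraining; here we only report it.
        raise KeyError(f"no vector stored for: {missing}")
    return cosine(database[page_x_id], database[page_y_id])

database = {"X": [3.0, 4.0], "Y": [4.0, 3.0]}
similarity_xy = page_similarity(database, "X", "Y")
```

The returned value quantifies how similar the two pages are under the chosen measure.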
Fig. 3 shows a schematic structural diagram of the device for searching for web pages provided by an embodiment of the present invention. The device comprises:
a first acquisition unit 301, configured to obtain the identifier of a first web page;
a first lookup unit 302, configured to look up, in a preset page vectorization database, a first vector corresponding to the identifier of the first web page, wherein the preset page vectorization database stores the correspondence between web page identifiers and vectors, and the similarity between vectors indicates the page similarity of the web pages;
a second lookup unit 303, configured to look up, in the preset page vectorization database, at least one second vector that meets a preset similarity with the first vector.
The second lookup unit may be configured to look up, in the preset page vectorization database, at least one second vector having the highest and/or the lowest similarity to the first vector.
a first output unit 304, configured to output the identifier of the second web page corresponding to the at least one second vector in the preset page vectorization database.
In one example, the device further comprises:
a second acquisition unit, configured to obtain browsing paths of users, each browsing path comprising the identifiers of the web pages a user browsed and the order in which the web pages were browsed;
a training unit, configured to treat the identifiers of the web pages as words in a word vector model and to train the word vector model on the browsing paths to obtain the vectors of the web pages;
an establishing unit, configured to establish the correspondence between the identifiers of the web pages and the vectors of the web pages as the preset page vectorization database.
In one example, the device further comprises:
a second output unit, configured to output the at least one second vector;
a first obtaining unit, configured to obtain the similarity between the at least one second web page and the first web page from the at least one second vector and the first vector.
In one example, the device further comprises:
a third acquisition unit, configured to obtain the identifier of a third web page;
a third lookup unit, configured to look up, in the preset page vectorization database, a third vector corresponding to the identifier of the third web page;
a second obtaining unit, configured to obtain the similarity between the first web page and the third web page from the similarity between the first vector and the third vector.
The device for searching for web pages shown in Fig. 3 corresponds to the method for searching for web pages shown in Fig. 2, and its specific implementation is similar to that method; refer to the description of the method shown in Fig. 2, which is not repeated here.
The device for searching webpage includes processor and memory, above-mentioned first acquisition unit, the first searching unit, the Two searching units and the first output unit etc. in memory, storage are stored in by computing device as program unit storage Said procedure unit in device realizes corresponding function.
Kernel is included in processor, is gone in memory to transfer corresponding program unit by kernel.Kernel can set one Or more, meet at least the one of preset requirement with the Page resemblance of the first webpage by adjusting kernel parameter come quick find Individual second webpage, so as to improve the method accuracy rate for searching similar web page.
Memory may include computer-readable medium in volatile memory, random access memory (RAM) and/ Or the form such as Nonvolatile memory, such as read-only storage (ROM) or flash memory (flash RAM), memory includes at least one deposit Store up chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps:
Obtain the mark of the first webpage;
Search a default page vector database for the primary vector corresponding to the mark of the first webpage, where the default page vector database stores the correspondence between webpage marks and vectors, and the similarity between vectors indicates the page similarity of the webpages;
Search the default page vector database for at least one secondary vector that meets a default similarity with the primary vector;
Output the mark of the second webpage corresponding to the at least one secondary vector in the default page vector database.
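The four method steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the database is modeled as an in-memory dictionary from page mark to vector, cosine similarity is used as the similarity measure, and "meets a default similarity" is interpreted as a score at or above a threshold. The names `cosine_similarity`, `find_similar_pages`, and `min_similarity`, as well as the sample data, are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_u * norm_v)


def find_similar_pages(page_vector_db, first_mark, min_similarity):
    """Return marks of pages whose vectors meet the similarity threshold."""
    # Look up the primary vector for the first webpage's mark.
    primary = page_vector_db[first_mark]
    # Scan the database for secondary vectors and output their page marks.
    return [
        mark
        for mark, vector in page_vector_db.items()
        if mark != first_mark
        and cosine_similarity(primary, vector) >= min_similarity
    ]


# Hypothetical database: page mark -> vector.
page_vector_db = {
    "page_a": [1.0, 0.0],
    "page_b": [0.9, 0.1],
    "page_c": [0.0, 1.0],
}
similar = find_similar_pages(page_vector_db, "page_a", min_similarity=0.8)
```

A linear scan is adequate for a small database. A large database would likely need an index for nearest-neighbor search, which is outside the scope of this sketch.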
In the above computer program product, the default page vector database is set up by program code for the following method steps:
Obtain the browse path of a user, where the browse path includes the marks of the webpages browsed by the user and the order in which the webpages were browsed;
Treat the marks of the webpages as words in a term vector model, and train on the browse path using the term vector model to obtain the vectors of the webpages;
Establish the correspondence between the marks of the webpages and the vectors of the webpages as the default page vector database.
The above computer program product further includes program code for the following method steps:
Output the at least one secondary vector;
Obtain the similarity between the at least one second webpage and the first webpage according to the at least one secondary vector and the primary vector.
In the above computer program product, searching the default page vector database for at least one secondary vector that meets a default similarity with the primary vector includes:
Search the default page vector database for at least one secondary vector having the highest similarity and/or the lowest similarity to the primary vector.
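Selecting the secondary vectors with the highest and/or lowest similarity can be sketched as a top-k and bottom-k ranking. This is an illustrative sketch only: cosine similarity, the dictionary-based database, the function name `rank_extremes`, and the parameter `k` are assumptions rather than details from the original.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_extremes(page_vector_db, first_mark, k=1):
    """Return (k most similar marks, k least similar marks) for a page."""
    primary = page_vector_db[first_mark]
    # Score every other page against the primary vector.
    scored = [
        (cosine_similarity(primary, vector), mark)
        for mark, vector in page_vector_db.items()
        if mark != first_mark
    ]
    most_similar = [mark for _, mark in heapq.nlargest(k, scored)]
    least_similar = [mark for _, mark in heapq.nsmallest(k, scored)]
    return most_similar, least_similar


# Hypothetical database: page mark -> vector.
page_vector_db = {
    "page_a": [1.0, 0.0],
    "page_b": [0.9, 0.1],
    "page_c": [0.0, 1.0],
    "page_d": [-1.0, 0.0],
}
most, least = rank_extremes(page_vector_db, "page_a", k=1)
```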
The above computer program product further includes program code for the following method steps:
Obtain the mark of a third webpage;
Search the default page vector database for the third vector corresponding to the mark of the third webpage;
Obtain the similarity between the first webpage and the third webpage according to the similarity between the primary vector and the third vector.
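Comparing two specific pages needs no search, only two lookups followed by one similarity computation. The sketch below assumes cosine similarity and a dictionary-based database; the function name `page_similarity` and the sample vectors are hypothetical.

```python
import math


def page_similarity(page_vector_db, first_mark, third_mark):
    """Cosine similarity between the vectors of two pages given by mark."""
    u = page_vector_db[first_mark]  # Primary vector.
    v = page_vector_db[third_mark]  # Third vector.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Zero vectors carry no direction to compare.
    return dot / (norm_u * norm_v)


# Hypothetical database: page mark -> vector.
page_vector_db = {
    "page_a": [3.0, 4.0],
    "page_c": [4.0, 3.0],
}
score = page_similarity(page_vector_db, "page_a", "page_c")
```

A missing mark raises `KeyError` in this sketch; how a real system should handle an unknown page is not stated in the original.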
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit the present application. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

  1. A method for searching webpages, characterized in that the method comprises:
    Obtaining the mark of a first webpage;
    Searching a default page vector database for the primary vector corresponding to the mark of the first webpage, where the default page vector database stores the correspondence between webpage marks and vectors, and the similarity between vectors indicates the page similarity of the webpages;
    Searching the default page vector database for at least one secondary vector that meets a default similarity with the primary vector;
    Outputting the mark of the second webpage corresponding to the at least one secondary vector in the default page vector database.
  2. The method according to claim 1, characterized in that the default page vector database is set up by the following method:
    Obtaining the browse path of a user, where the browse path includes the marks of the webpages browsed by the user and the order in which the webpages were browsed;
    Treating the marks of the webpages as words in a term vector model, and training on the browse path using the term vector model to obtain the vectors of the webpages;
    Establishing the correspondence between the marks of the webpages and the vectors of the webpages as the default page vector database.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    Outputting the at least one secondary vector;
    Obtaining the similarity between the at least one second webpage and the first webpage according to the at least one secondary vector and the primary vector.
  4. The method according to claim 3, characterized in that searching the default page vector database for at least one secondary vector that meets a default similarity with the primary vector comprises:
    Searching the default page vector database for at least one secondary vector having the highest similarity and/or the lowest similarity to the primary vector.
  5. The method according to claim 1, characterized in that the method further comprises:
    Obtaining the mark of a third webpage;
    Searching the default page vector database for the third vector corresponding to the mark of the third webpage;
    Obtaining the similarity between the first webpage and the third webpage according to the similarity between the primary vector and the third vector.
  6. A device for searching webpages, characterized in that the device comprises:
    A first acquisition unit, configured to obtain the mark of a first webpage;
    A first searching unit, configured to search a default page vector database for the primary vector corresponding to the mark of the first webpage, where the page vector database stores the correspondence between webpage marks and vectors, and the similarity between vectors indicates the page similarity of the webpages;
    A second searching unit, configured to search the default page vector database for at least one secondary vector that meets a default similarity with the primary vector;
    A first output unit, configured to output the mark of the second webpage corresponding to the at least one secondary vector in the default page vector database.
  7. The device according to claim 6, characterized in that the device further comprises:
    A second acquisition unit, configured to obtain the browse path of a user, where the browse path includes the marks of the webpages browsed by the user and the order in which the webpages were browsed;
    A training unit, configured to treat the marks of the webpages as words in a term vector model and to train on the browse path using the term vector model, obtaining the vectors of the webpages;
    An establishing unit, configured to establish the correspondence between the marks of the webpages and the vectors of the webpages as the default page vector database.
  8. The device according to claim 6 or 7, characterized in that the device further comprises:
    A second output unit, configured to output the at least one secondary vector;
    A first obtaining unit, configured to obtain the similarity between the at least one second webpage and the first webpage according to the at least one secondary vector and the primary vector.
  9. The device according to claim 7, characterized in that
    The second searching unit is configured to search the default page vector database for at least one secondary vector having the highest similarity and/or the lowest similarity to the primary vector.
  10. The device according to claim 6, characterized in that the device further comprises:
    A third acquisition unit, configured to obtain the mark of a third webpage;
    A third searching unit, configured to search the default page vector database for the third vector corresponding to the mark of the third webpage;
    A second obtaining unit, configured to obtain the similarity between the first webpage and the third webpage according to the similarity between the primary vector and the third vector.
CN201610474660.2A 2016-06-24 2016-06-24 Method and device for searching webpage Active CN107544980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610474660.2A CN107544980B (en) 2016-06-24 2016-06-24 Method and device for searching webpage

Publications (2)

Publication Number Publication Date
CN107544980A true CN107544980A (en) 2018-01-05
CN107544980B CN107544980B (en) 2020-07-24

Family

ID=60959879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610474660.2A Active CN107544980B (en) 2016-06-24 2016-06-24 Method and device for searching webpage

Country Status (1)

Country Link
CN (1) CN107544980B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656117A (en) * 2021-06-30 2021-11-16 中国银行股份有限公司 Operation page recommendation method and device of multimedia equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100023630A (en) * 2008-08-22 2010-03-04 고려대학교 산학협력단 Method and system of classifying web page using categogory tag information and recording medium using by the same
CN101727500A (en) * 2010-01-15 2010-06-09 清华大学 Text classification method of Chinese web page based on steam clustering
CN103324645A (en) * 2012-03-23 2013-09-25 腾讯科技(深圳)有限公司 Method and device for recommending webpage
CN103810264A (en) * 2014-01-27 2014-05-21 西安理工大学 Webpage text classification method based on feature selection
CN104050203A (en) * 2013-03-17 2014-09-17 祁勇 Method for acquiring personalized characteristics of webpages and users
CN104090890A (en) * 2013-12-12 2014-10-08 深圳市腾讯计算机系统有限公司 Method, device and server for obtaining similarity of key words

Also Published As

Publication number Publication date
CN107544980B (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN109492229B (en) Cross-domain emotion classification method and related device
CN104933164B (en) In internet mass data name entity between relationship extracting method and its system
CN105243087B (en) IT syndication Personality of readingization recommends method
CN107704503A (en) User's keyword extracting device, method and computer-readable recording medium
CN108255862B (en) A kind of search method and device of judgement document
CN107491534A (en) Information processing method and device
CN107220386A (en) Information-pushing method and device
CN108255857A (en) A kind of sentence detection method and device
CN106970912A (en) Chinese sentence similarity calculating method, computing device and computer-readable storage medium
CN107590219A (en) Webpage personage subject correlation message extracting method
CN104021185B (en) The method and apparatus is identified by the information attribute of data in webpage
CN104331438B (en) To novel web page contents selectivity abstracting method and device
CN104615768B (en) Same recognition methods of document and device
CN112948575B (en) Text data processing method, apparatus and computer readable storage medium
CN109344406A (en) Part-of-speech tagging method, apparatus and electronic equipment
CN108255999A (en) Content recommendation method and device
CN108062342A (en) The recommendation method and device of application program
CN107832338A (en) A kind of method and system for identifying core product word
CN109597983A (en) A kind of spelling error correction method and device
CN110851609A (en) Representation learning method and device
WO2020063524A1 (en) Method and system for determining legal instrument
JP2018511115A5 (en)
CN115391570A (en) Method and device for constructing emotion knowledge graph based on aspects
CN108090098A (en) A kind of text handling method and device
CN110363206A (en) Cluster, data processing and the data identification method of data object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant