CN110162356A - Page fusion method, device, storage medium, and electronic device
- Publication number
- CN110162356A (application CN201810456491.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
Abstract
The invention discloses a page fusion method, device, storage medium, and electronic device. The method includes: extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused; extracting, from the first keywords, first target keywords whose first weight meets a first target condition, and extracting, from the second keywords, second target keywords whose second weight meets the first target condition; determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and fusing the first page and the second page when the target page similarity meets a second target condition. The invention addresses the technical problem in the related art that page fusion is inefficient.
Description
Technical field
The present invention relates to the computer field, and in particular to a page fusion method, device, storage medium, and electronic device.
Background
Because Internet pages are edited by users, that is, they follow the user-generated content (UGC) model, a website may contain redundant pages for the same entity. For example, on an encyclopedia site, information about a certain celebrity may be edited by user A into one page and edited again by user B into another page. When a knowledge base is built, page information needs to be integrated to enrich the entity information in the knowledge base, so page fusion is required.
Existing page fusion schemes decide whether pages should be fused by exact matching of key fields. In a field-matching scheme, key fields are first extracted from all pages; the pages are then bucketed according to whether they share the same key fields; finally, several other pieces of auxiliary information are used to judge whether the pages should be fused. This approach relies on manual configuration, which makes page fusion inefficient.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a page fusion method, device, storage medium, and electronic device, to at least solve the technical problem in the related art that page fusion is inefficient.
According to one aspect of the embodiments of the present invention, a page fusion method is provided, including: extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused; extracting, from the first keywords, first target keywords whose first weight meets a first target condition, and extracting, from the second keywords, second target keywords whose second weight meets the first target condition, where the first weight indicates how representative each first keyword is of the first page, and the second weight indicates how representative each second keyword is of the second page; determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and fusing the first page and the second page when the target page similarity meets a second target condition.
According to another aspect of the embodiments of the present invention, a page fusion device is also provided, including: a first extraction module, configured to extract first keywords from a first page to be fused and to extract second keywords from a second page to be fused; a second extraction module, configured to extract, from the first keywords, first target keywords whose first weight meets a first target condition, and to extract, from the second keywords, second target keywords whose second weight meets the first target condition, where the first weight indicates how representative each first keyword is of the first page, and the second weight indicates how representative each second keyword is of the second page; a first determining module, configured to determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and a fusion module, configured to fuse the first page and the second page when the target page similarity meets a second target condition.
According to another aspect of the embodiments of the present invention, a storage medium is also provided, characterized in that a computer program is stored in the storage medium, where the computer program is configured to execute, when run, the method described in any one of the above.
According to another aspect of the embodiments of the present invention, an electronic device is also provided, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute, by means of the computer program, the method described in any one of the above.
In the embodiments of the present invention, first keywords are extracted from a first page to be fused and second keywords are extracted from a second page to be fused; first target keywords whose first weight meets a first target condition are extracted from the first keywords, and second target keywords whose second weight meets the first target condition are extracted from the second keywords; a target page similarity between the first page and the second page is determined according to the first target keywords and the second target keywords; and the first page and the second page are fused when the target page similarity meets a second target condition. Among the keywords extracted from a page, those whose weight meets the first target condition are taken as the page's target keywords, so that the keywords that represent the page are extracted from it. The target page similarity between the first page and the second page is then determined from their respective target keywords, and the two pages can be fused if that similarity meets the second target condition. In this way, how similar the pages are is judged from keywords that can represent them, and pages whose similarity meets the condition are fused. Pages are fused automatically, and because the similarity between pages is determined from representative keywords, the similarity judgment is more accurate, which improves the accuracy of page fusion. This achieves the technical effect of improving fusion efficiency when pages are fused, and thereby solves the technical problem in the related art that page fusion is inefficient.
Brief description of the drawings
The drawings described herein provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of an optional page fusion method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application environment of an optional page fusion method according to an embodiment of the present invention;
Fig. 3 is schematic diagram one of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 4 is schematic diagram two of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 5 is schematic diagram three of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 6 is schematic diagram four of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 7 is schematic diagram five of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional page fusion device according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of an application scenario of an optional page fusion method according to an embodiment of the present invention; and
Fig. 10 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to one aspect of the embodiments of the present invention, a page fusion method is provided. As shown in Fig. 1, the method includes:
S102, extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused;
S104, extracting, from the first keywords, first target keywords whose first weight meets a first target condition, and extracting, from the second keywords, second target keywords whose second weight meets the first target condition. Optionally, in this embodiment, the first weight indicates how representative each first keyword is of the first page, and the second weight indicates how representative each second keyword is of the second page;
S106, determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords;
S108, fusing the first page and the second page when the target page similarity meets a second target condition.
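Steps S102 to S108 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the whitespace tokenizer, the use of raw term frequency as the weight, the Jaccard overlap used in place of a trained similarity model, the threshold of 0.5, and all function names are hypothetical.

```python
from collections import Counter


def extract_keywords(text):
    """S102: split page text into candidate keywords (assumed: whitespace tokens)."""
    return [word.lower() for word in text.split()]


def select_target_keywords(keywords, top_n=3):
    """S104: keep the top_n keywords by weight (assumed: raw term frequency)."""
    counts = Counter(keywords)
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:top_n])


def page_similarity(targets_a, targets_b):
    """S106: stand-in similarity (Jaccard overlap), not the trained model."""
    union = targets_a | targets_b
    return len(targets_a & targets_b) / len(union) if union else 0.0


def should_fuse(page_a, page_b, threshold=0.5):
    """S108: fuse when the similarity meets the assumed second target condition."""
    targets_a = select_target_keywords(extract_keywords(page_a))
    targets_b = select_target_keywords(extract_keywords(page_b))
    return page_similarity(targets_a, targets_b) >= threshold
```

In the embodiments described later, the weight would come from tf-idf or TextRank and the similarity from a trained model; only the control flow is meant to carry over.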
Optionally, in this embodiment, the above page fusion method may be applied to the hardware environment formed by the server 202 shown in Fig. 2. As shown in Fig. 2, the server 202 extracts first keywords from a first page to be fused and second keywords from a second page to be fused; extracts, from the first keywords, first target keywords whose first weight meets a first target condition, and extracts, from the second keywords, second target keywords whose second weight meets the first target condition, where the first weight indicates how representative each first keyword is of the first page and the second weight indicates how representative each second keyword is of the second page; determines a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and fuses the first page and the second page when the target page similarity meets a second target condition.
Optionally, in this embodiment, the above page fusion method may be, but is not limited to being, applied to scenarios in which pages are fused. The method may be applied in various types of applications, for example online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications, and live streaming applications. Specifically, it may be applied to, but is not limited to, fusing web pages in a browser application, or fusing multimedia resource pages in a multimedia application, to improve the efficiency of page fusion. The above is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, keywords may be, but are not limited to being, extracted from the introduction section of the page or from text formed by concatenating other short texts of the page.
Optionally, in this embodiment, before keywords are extracted, the text in the page may also be segmented into words, and keywords are then extracted from the words obtained by segmentation.
Optionally, in this embodiment, a weight may indicate how representative a keyword is of a web page. The weight of a keyword may be, but is not limited to being, computed with the tf-idf algorithm or with the TextRank algorithm. Alternatively, to combine the advantages of the tf-idf and TextRank algorithms for keyword extraction, the word weights obtained from tf-idf and the word weights obtained from TextRank may each be normalized, and the average of the two is then taken as the keyword's weight.
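The combined weighting described above can be illustrated with a short sketch. It assumes the tf-idf and TextRank scores have already been computed elsewhere and are passed in as dictionaries; min-max scaling is one possible normalization (the text does not specify which is used), and the function names are hypothetical.

```python
def min_max_normalize(weights):
    """Rescale a {word: weight} dict to the range [0, 1]."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights are equal, so every word is equally representative.
        return {word: 1.0 for word in weights}
    return {word: (value - lo) / (hi - lo) for word, value in weights.items()}


def combine_weights(tfidf, textrank):
    """Average the normalized tf-idf and TextRank weights of each word."""
    a = min_max_normalize(tfidf)
    b = min_max_normalize(textrank)
    # Assumption: a word missing from one scorer contributes 0 from that scorer.
    return {word: (a.get(word, 0.0) + b.get(word, 0.0)) / 2 for word in set(a) | set(b)}
```

Normalizing each score set separately before averaging keeps one algorithm's scale from dominating the other's.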
Optionally, in this embodiment, the keywords of a page are filtered by the first target condition that the weight must meet, so that the one or more keywords that best represent the page are selected as target keywords. The first target condition may be, but is not limited to, a condition that the weights of the page's most representative keywords must meet. For example, the N keywords that are most representative of the page are taken as its most representative keywords, or the keywords whose representativeness falls within a certain threshold range are taken as its most representative keywords.
Optionally, in this embodiment, the similarity between pages may be, but is not limited to being, obtained by training a deep learning model. A deep learning model is trained on labeled samples to obtain a model whose input parameters are the target keywords of two pages and whose output parameter is the similarity of the two pages; this model is referred to as the similarity model. When page similarity is determined, the first target keywords of the first page and the second target keywords of the second page can be input directly into the similarity model, and the output value of the similarity model is the target page similarity between the first page and the second page.
Optionally, in this embodiment, the first keywords and the second keywords may each include, but are not limited to, multiple keywords, for example all meaningful keywords extracted from the page. The first target keywords and the second target keywords may likewise include, but are not limited to, multiple words. For example, 100 meaningful words are extracted from the page, and the 30 words with the largest weights among these 100 are taken as the page's target keywords.
Optionally, in this embodiment, fusing the first page and the second page may be, but is not limited to, fusing the entities contained in the pages. Entity fusion means integrating the information of multiple entities and merging them into one entity. Entity information can be represented by a number of triples {S, P, O}, where S denotes the subject, P denotes an attribute, and O denotes an attribute value. Multi-entity fusion is the process of merging all attributes P and attribute values O of all entities that share the same subject S under a single subject S.
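The triple-based entity fusion described above might look like the following sketch. The data layout (tuples of subject, attribute, value) and the choice to de-duplicate repeated attribute values are assumptions made for illustration; the function name is hypothetical.

```python
from collections import defaultdict


def fuse_entities(triples):
    """Group (S, P, O) triples by subject, merging attributes and their values."""
    fused = defaultdict(lambda: defaultdict(list))
    for subject, attribute, value in triples:
        values = fused[subject][attribute]
        if value not in values:  # keep each attribute value once, in first-seen order
            values.append(value)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {subject: dict(attrs) for subject, attrs in fused.items()}
```

Triples from two redundant pages about the same subject are simply concatenated and passed in together; conflicting values for the same attribute are kept side by side rather than resolved.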
In an optional embodiment, as shown in Fig. 3, first keywords (keyword 1, keyword 2, ..., keyword 30) are extracted from page A to be fused, and second keywords (keyword a, keyword b, ..., keyword m) are extracted from page B to be fused. The first weight of each first keyword and the second weight of each second keyword are determined. The first target keywords whose first weight meets the first target condition, (keyword 3, keyword 6, keyword 17, keyword 22), are extracted from (keyword 1, keyword 2, ..., keyword 30), and the second target keywords whose second weight meets the first target condition, (keyword a, keyword d, keyword g, keyword k), are extracted from (keyword a, keyword b, ..., keyword m). The target page similarity between the first page and the second page is determined from (keyword 3, keyword 6, keyword 17, keyword 22) and (keyword a, keyword d, keyword g, keyword k), and the first page and the second page are fused when the target page similarity meets the second target condition.
It can be seen that, through the above steps, the keywords whose weight meets the first target condition among those extracted from a page are taken as the page's target keywords, so that the keywords that represent the page are extracted from it. The target page similarity between the first page and the second page is determined from their respective target keywords, and the two pages can be fused if that similarity meets the second target condition. In this way, how similar the pages are is judged from keywords that can represent them, and pages whose similarity meets the condition are fused. Pages are fused automatically, and because the similarity between pages is determined from representative keywords, the similarity judgment is more accurate, which improves the accuracy of page fusion. This achieves the technical effect of improving fusion efficiency when pages are fused, and thereby solves the technical problem in the related art that page fusion is inefficient.
As an optional solution, determining the target page similarity between the first page and the second page according to the first target keywords and the second target keywords includes:
S1, obtaining the page similarity corresponding to the first target keywords and the second target keywords according to keyword pairs and page similarities that have a correspondence;
S2, determining the page similarity corresponding to the first target keywords and the second target keywords as the target page similarity.
Optionally, in this embodiment, a correspondence between keyword pairs and page similarities may be established in advance. The page similarity corresponding to the first target keywords and the second target keywords is then obtained from this correspondence and determined as the target page similarity.
Optionally, in this embodiment, the keyword pairs and page similarities that have a correspondence may be, but are not limited to being, stored in the form of a table: the keyword pair serves as the key and the page similarity as the value, so the correspondence is stored in the table as key-value pairs. To look up the page similarity for a keyword pair, the key-value pairs stored for the first target keywords are located in the table first; the value corresponding to the second target keywords is then looked up among those key-value pairs and determined as the target page similarity.
For example, in an optional embodiment, Table 1 stores keyword pairs and page similarities that have a correspondence. To look up the page similarity for the keyword pair [(A1, A2, A3, A4, A5, A6, A7), (A2, A3, A5, A7, A9, A10, A19)] in this table, the related target keywords corresponding to the first target keywords (A1, A2, A3, A4, A5, A6, A7) are found first, together with their page similarities, for example [(A1, A2, A3, A5, A6, A7, A8), 75%], [(A2, A3, A5, A7, A9, A10, A19), 40%], and [(A1, A2, A3, A4, A5, A8, A9), 62.5%]. The second target keywords (A2, A3, A5, A7, A9, A10, A19) are then found among these entries, with a corresponding page similarity of 40%, so 40% is determined as the target page similarity.
Table 1
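The two-stage lookup in the example above can be sketched with a nested dictionary. The nested layout and the names `SIMILARITY_TABLE` and `lookup_similarity` are hypothetical; the entries reuse the values quoted in the example, and the actual structure of Table 1 is not reproduced here.

```python
# Hypothetical table: {first keyword tuple: {second keyword tuple: similarity}}.
SIMILARITY_TABLE = {
    ("A1", "A2", "A3", "A4", "A5", "A6", "A7"): {
        ("A1", "A2", "A3", "A5", "A6", "A7", "A8"): 0.75,
        ("A2", "A3", "A5", "A7", "A9", "A10", "A19"): 0.40,
        ("A1", "A2", "A3", "A4", "A5", "A8", "A9"): 0.625,
    },
}


def lookup_similarity(first_targets, second_targets, table=SIMILARITY_TABLE):
    """Find the entries for the first keyword set, then the value for the second."""
    candidates = table.get(tuple(first_targets), {})
    return candidates.get(tuple(second_targets))  # None when no entry exists
```

An exact-match table like this only covers keyword pairs seen in advance, which is presumably why the following optional solution replaces the lookup with a trained similarity model.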
As an optional solution, obtaining the page similarity corresponding to the first target keywords and the second target keywords according to keyword pairs and page similarities that have a correspondence includes:
S1, inputting the first target keywords and the second target keywords into a similarity model, where the similarity model is obtained by training a deep learning model with the keyword pairs and page similarities that have a correspondence;
S2, obtaining the target probability value output by the similarity model and determining the target probability value as the target page similarity, where the probability value indicates the probability that the first page and the second page are similar pages.
Optionally, in this embodiment, the keyword pairs and page similarities that have a correspondence may be used as training samples to train a deep learning model, yielding a similarity model that takes target keywords as input parameters and a probability value as the output parameter. The first target keywords and the second target keywords are input into the similarity model, the target probability value output by the model is obtained, and that value is determined as the target page similarity.
Optionally, in this embodiment, the above deep learning model may include, but is not limited to, a convolutional neural network (CNN) model. A CNN is a feedforward neural network that includes convolutional layers and pooling layers; its artificial neurons respond to surrounding units within a local coverage area, and it performs well in large-scale image processing. Alternatively, the above deep learning model may include, but is not limited to, a VGG model.
As an optional solution, before the first target keywords and the second target keywords are input into the similarity model, the method further includes:
S1, obtaining a page sample set, and bucketing the pages in the page sample set according to the entity contained in the page title information of each page, to obtain entities and page sets that have a correspondence;
S2, in each bucket, extracting keywords from each page in the page set;
S3, in each bucket, determining the keywords of each page whose third weight meets the first target condition as the keyword set of that page;
S4, in each bucket, determining the page similarity of pages whose keyword sets match as a first similarity value, and determining the page similarity of pages whose keyword sets do not match or do not fully match as a second similarity value;
S5, in each bucket, establishing the correspondence between matched keyword sets and the first similarity value, and the correspondence between unmatched or incompletely matched keyword sets and the second similarity value;
S6, obtaining from each bucket a first quantity of matched keyword sets with the corresponding first similarity value and a second quantity of unmatched keyword sets with the corresponding second similarity value, to obtain the keyword pairs and page similarities that have a correspondence;
S7, training the deep learning model with the keyword pairs and page similarities that have a correspondence, to obtain the similarity model.
Optionally, in this embodiment, from a data processing point of view, objective things in the real world are called entities: an entity is anything in the real world that can be distinguished and identified. An entity may be a person, such as a teacher or a student, or an object, such as a book or a warehouse. It may refer not only to tangible objects but also to abstract events, such as a performance or a football match.
Optionally, in this embodiment, the pages in the page sample set are bucketed according to the entity contained in the page title information, so that pages of the same entity fall into the same bucket, yielding entities and page sets that have a correspondence. Training samples are then drawn from each bucket separately, so that the resulting training samples are evenly distributed, which improves the accuracy of the similarity model.
Optionally, in this embodiment, within each bucket a first quantity of matched page pairs is obtained as positive samples for training the similarity model, and a second quantity of unmatched page pairs is obtained as negative samples for training the similarity model.
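The bucketing step can be sketched as below. The sketch assumes the entity name has already been extracted from each page title and stored under a hypothetical `"entity"` key; how the entity is recognized in the title is not covered.

```python
from collections import defaultdict


def bucket_by_entity(pages):
    """Group page ids by the entity named in each page title."""
    buckets = defaultdict(list)
    for page in pages:
        # Pages that name the same entity land in the same bucket.
        buckets[page["entity"]].append(page["id"])
    return dict(buckets)
```

Positive and negative page pairs would then be drawn from within one bucket at a time, so every training pair compares pages that share an entity name.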
As an optional solution, determining the page similarity of two pages whose keyword sets match as the first similarity value, and determining the page similarity of two pages whose keyword sets do not match as the second similarity value, includes:
S1, extracting feature information from each keyword set;
S2, determining keyword sets with identical feature information as matched keyword sets, and determining keyword sets with different feature information as unmatched keyword sets;
S3, determining that matched keyword sets correspond to the first similarity value and that unmatched keyword sets correspond to the second similarity value.
Optionally, in this embodiment, whether two pages in a bucket match may be, but is not limited to being, determined according to strict conditions. For example, when matched seed pages are obtained, a portion of entity pages that are clearly identical are extracted from the entity pages of the same bucket as positive samples of the training data.
As shown in Fig. 4, profile page 1 and profile page 2 of Liu both contain Liu's basic information. To ensure that the pages really describe the same entity, that is, that they should be fused, the key information of the pages is used as feature information to determine whether they match exactly. For example, if the "date of birth" and "blood type" of the two pages are both identical, the two pages are confirmed to match and are used as a positive sample for similarity calculation.
Optionally, in this embodiment, the training set is generated by random negative sampling within each bucket. Because similarity is computed only among entity pages of the same name, negative samples are also selected among entity pages of the same name, to stay consistent with similarity prediction. Corresponding to the selection of positive page pairs, negative page pairs are selected on the condition that the pages' key fields (that is, the feature information) do not match. For example, as shown in Fig. 5, there are two pages, page 3 and page 4. The "date of birth" and "blood type" extracted from the two pages are not exactly the same, so the two pages are confirmed not to match and are used as a negative sample for similarity calculation.
Optionally, in this embodiment, the first similarity value may be, but is not limited to, 1, and the second similarity value may be, but is not limited to, 0. The output parameter of the similarity model obtained after training is then a probability value between 0 and 1. Alternatively, the first similarity value may be set to 100 and the second similarity value to 0, in which case the output parameter of the similarity model obtained after training is a value between 0 and 100, which can also be regarded as a probability value.
It should be noted that the above first similarity value and second similarity value are only examples, and this embodiment places no limitation on them.
Optionally, in this embodiment, the keyword set pairs and similarity values obtained above can be represented as triples {P1, P2, Label}, where P1 and P2 are the ids of the entity pages and Label indicates whether the pages are similar, taking the value 0 (dissimilar) or 1 (similar), for example {121, 122, 0} and {121, 123, 1}.
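Generating {P1, P2, Label} triples for one bucket can be illustrated as follows. The page layout (an `"id"` plus a `"features"` dictionary), the default feature keys, and the decision to skip pairs with missing fields are assumptions of this sketch; sampling a first and second quantity of pairs from the result is left out.

```python
from itertools import combinations


def label_pairs(bucket_pages, feature_keys=("date of birth", "blood type")):
    """Build (P1, P2, Label) triples for the pages of one bucket."""
    samples = []
    for p1, p2 in combinations(bucket_pages, 2):
        f1 = tuple(p1["features"].get(key) for key in feature_keys)
        f2 = tuple(p2["features"].get(key) for key in feature_keys)
        # Assumption: pairs with missing fields are skipped because they are
        # neither clear matches nor clear mismatches.
        if None in f1 or None in f2:
            continue
        # Label 1 when every key field matches exactly, otherwise 0.
        samples.append((p1["id"], p2["id"], 1 if f1 == f2 else 0))
    return samples
```

Requiring every key field to match before labeling a pair as positive follows the strict matching condition described for seed pages.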
As an optional solution, training the deep learning model with the keyword pairs and page similarities that have a correspondence to obtain the similarity model includes:
S1, training the deep learning model with the matched keyword sets and unmatched keyword sets as its input values and the first similarity value and second similarity value as its output values;
S2, determining the deep learning model obtained after training as the similarity model.
Optionally, in this embodiment, as shown in Fig. 6, taking a CNN as the deep learning model, the pages corresponding to P1 and P2 in the above triple are used as the two channels of the CNN input layer. After processing by the CNN's convolution, pooling, and fully connected layers, the model outputs a predicted value; the error against Label is then computed and back-propagated. The overall model architecture is based on a CNN, with the entity pages converted into the CNN's two input channels.
Optionally, in this embodiment, as shown in Fig. 7, keywords are extracted from the two candidate pages to obtain N keywords per page, where N equals the word vector dimension. If a page has fewer than N keywords, the remainder is padded with a special character, so that each page can be represented by its keywords. Each keyword is then replaced with its word vector, which converts the keywords of a page into a two-dimensional N*N word vector matrix. Finally, the two matrices are merged into a three-dimensional N*N*2 matrix, which serves as the input of the CNN. Converting pages in this way makes effective use of the way people naturally express themselves in language.
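The conversion described above can be sketched with nested lists. This is an illustration under stated assumptions, not the embodiment's implementation: the padding token `<pad>`, the choice of a zero vector for padding and unknown words, the function names and the toy 3-dimensional embeddings are all invented for the example.

```python
PAD = "<pad>"  # assumed padding token; the text only says "special character"


def page_matrix(keywords, embeddings, n):
    """Return an n x n matrix: one n-dimensional word vector per keyword.

    Pages with fewer than n keywords are padded; padding and unknown
    words map to a zero vector (an assumption of this sketch).
    """
    padded = (list(keywords) + [PAD] * n)[:n]
    zero = [0.0] * n
    return [list(embeddings.get(word, zero)) for word in padded]


def cnn_input(keywords_a, keywords_b, embeddings, n):
    """Stack two page matrices into an n x n x 2 nested list (two channels)."""
    a = page_matrix(keywords_a, embeddings, n)
    b = page_matrix(keywords_b, embeddings, n)
    return [[[a[i][j], b[i][j]] for j in range(n)] for i in range(n)]


# Toy 3-dimensional embeddings, invented for illustration
embeddings = {
    "actor": [0.9, 0.1, 0.0],
    "singer": [0.8, 0.2, 0.1],
    "1970": [0.0, 0.7, 0.3],
}
x = cnn_input(["actor", "1970"], ["singer", "1970", "actor"], embeddings, n=3)
```

Here `x[i][j]` holds the two channel values for row `i` (the i-th keyword slot) and column `j` (the j-th vector component), so the first page's padded third row contributes zeros to channel 0.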
As an optional scheme, extracting from the first keywords the first target keyword whose first weight meets the first target condition, and extracting from the second keywords the second target keyword whose second weight meets the first target condition, includes:
S1, extracting from the first keywords the first target keywords whose first weight is higher than a target weight, and extracting from the second keywords the second target keywords whose second weight is higher than the target weight; or
S2, sorting the first keywords by first weight in descending order and sorting the second keywords by second weight in descending order; extracting, from the sorted first keywords, a third quantity of top-ranked keywords as the first target keywords, and extracting, from the sorted second keywords, a third quantity of top-ranked keywords as the second target keywords.
Optionally, in this embodiment, the first target condition may be a threshold range or a ranking range. Taking extraction of the first target keywords as an example, suppose the first keywords (keyword 1, keyword 2, ..., keyword 30) have first weights 0.7, 0.4, ..., 0.88 respectively. One way is to extract the first keywords whose first weight is higher than 0.65 as the first target keywords, in which case the first target keywords obtained are keyword 1, keyword 5, keyword 10, ..., keyword 30. Another way is to sort the first keywords by first weight in descending order (keyword 30, keyword 7, keyword 28, ..., keyword 2) and take the top 10 keywords of the sorted list as the first target keywords, in which case the first target keywords obtained may be: keyword 30, keyword 7, keyword 28, keyword 3, keyword 17, keyword 9, keyword 22, keyword 14, keyword 3, keyword 6.
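The two selection modes described above (a weight threshold and a ranking cut-off) can be sketched as follows. The function names are hypothetical, and the weight assigned to keyword 7 is invented for the example; the other three weights are the ones quoted in the text.

```python
def select_by_threshold(weights, target_weight):
    """Keep keywords whose weight is higher than target_weight."""
    return [kw for kw, w in weights.items() if w > target_weight]


def select_top_k(weights, k):
    """Sort keywords by weight, largest first, and keep the first k."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]


weights = {
    "keyword 1": 0.7,
    "keyword 2": 0.4,
    "keyword 7": 0.85,   # invented value for illustration
    "keyword 30": 0.88,
}

above_threshold = select_by_threshold(weights, 0.65)
top_two = select_top_k(weights, 2)
```

With these weights the threshold mode keeps keyword 1, keyword 7 and keyword 30, while the ranking mode with k = 2 keeps keyword 30 and keyword 7.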
As an optional scheme, the target page similarity meeting the second target condition includes:
S1, in the case where a higher target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity meets the second target condition when it is higher than a first target similarity; or
S2, in the case where a lower target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity meets the second target condition when it is lower than a second target similarity.
Optionally, in this embodiment, pages that are sufficiently similar are fused, so the second target condition that pages to be fused must meet depends on what the target page similarity represents. If a high target page similarity indicates that two pages are similar, the similarity of pages that can be fused must be higher than the first target similarity; if a low target page similarity indicates that two pages are similar, the similarity of pages that can be fused must be lower than the second target similarity.
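The direction-dependent check described above fits in one small function. This is a sketch only: the function name and the `higher_is_more_similar` flag are hypothetical, and the thresholds in the usage lines are arbitrary.

```python
def meets_second_condition(similarity, threshold, higher_is_more_similar=True):
    """Decide whether two pages are similar enough to be fused.

    If a higher score means "more similar", the score must exceed the
    threshold (the first target similarity); otherwise it must fall
    below the threshold (the second target similarity).
    """
    if higher_is_more_similar:
        return similarity > threshold
    return similarity < threshold


fuse_when_high = meets_second_condition(0.9, 0.8)
fuse_when_low = meets_second_condition(0.3, 0.5, higher_is_more_similar=False)
```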
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, a page fusion device for implementing the above page fusion method is further provided. As shown in Fig. 8, the device includes:
a first extraction module 82, configured to extract first keywords from a first page to be fused and to extract second keywords from a second page to be fused;
a second extraction module 84, configured to extract from the first keywords a first target keyword whose first weight meets a first target condition, and to extract from the second keywords a second target keyword whose second weight meets the first target condition. Optionally, in this embodiment, the first weight indicates how representative each first keyword is of the first page, and the second weight indicates how representative each second keyword is of the second page;
a first determining module 86, configured to determine the target page similarity of the first page and the second page according to the first target keyword and the second target keyword;
a fusion module 88, configured to fuse the first page and the second page in the case where the target page similarity meets a second target condition.
Optionally, in this embodiment, the above page fusion device may be applied in a hardware environment constituted by the server 202 shown in Fig. 2. As shown in Fig. 2, the server 202 extracts first keywords from a first page to be fused and extracts second keywords from a second page to be fused. It extracts from the first keywords a first target keyword whose first weight meets a first target condition, and extracts from the second keywords a second target keyword whose second weight meets the first target condition, where the first weight indicates how representative each first keyword is of the first page and the second weight indicates how representative each second keyword is of the second page. It determines the target page similarity of the first page and the second page according to the first target keyword and the second target keyword, and fuses the first page and the second page in the case where the target page similarity meets a second target condition.
Optionally, in this embodiment, the above page fusion device may be, but is not limited to being, applied in scenarios where pages are fused. The device may be applied in various types of applications, for example online education applications, instant messaging applications, community space applications, game applications, shopping applications, browser applications, financial applications, multimedia applications and live streaming applications. Specifically, it may be, but is not limited to being, applied to fusing web pages in a browser application, or to fusing multimedia resource pages in a multimedia application, so as to improve the efficiency of page fusion. The above is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, keywords may be, but are not limited to being, extracted from the text formed by splicing the introduction section or other short texts of the page.
Optionally, in this embodiment, before keywords are extracted, the text in the page may also be segmented into words, and keywords are then extracted from the words obtained by segmentation.
Optionally, in this embodiment, a weight may indicate how representative a keyword is of a web page. The weight of a keyword may be, but is not limited to being, calculated with the tf-idf algorithm or with the TextRank algorithm. Alternatively, to combine the advantages of both algorithms when extracting keywords, the word weights obtained by tf-idf and the word weights obtained by TextRank may each be normalized, and the average of the two may be taken as the weight of the keyword.
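The normalize-then-average step can be sketched as below. The tf-idf and TextRank scores are taken as given inputs rather than computed, and the text does not say which normalization is used, so dividing by the sum is an assumption of this sketch; the function names and toy scores are likewise invented.

```python
def normalize(scores):
    """Scale scores so that they sum to 1 (one possible normalization)."""
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in scores}
    return {word: value / total for word, value in scores.items()}


def combined_weights(tfidf_scores, textrank_scores):
    """Average the normalized tf-idf and TextRank score of each word."""
    tfidf = normalize(tfidf_scores)
    textrank = normalize(textrank_scores)
    words = set(tfidf) | set(textrank)
    # A word missing from one scorer contributes 0.0 for that scorer
    return {w: (tfidf.get(w, 0.0) + textrank.get(w, 0.0)) / 2 for w in words}


# Toy scores, invented for illustration
tfidf_scores = {"actor": 3.0, "film": 1.0}
textrank_scores = {"actor": 0.5, "film": 0.5}
weights = combined_weights(tfidf_scores, textrank_scores)
```

In this toy case "actor" receives (0.75 + 0.5) / 2 and "film" receives (0.25 + 0.5) / 2, so tf-idf breaks the tie that TextRank alone would leave.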
Optionally, in this embodiment, the keywords of a page are filtered by the first target condition that their weights must meet, so as to select the one or more keywords that best represent the page as target keywords. The first target condition may be, but is not limited to, a condition that the weights of the keywords most representative of the page must meet. For example, the N keywords with the highest representativeness are taken as the most representative keywords of the page, or the keywords whose representativeness falls within a certain threshold range are taken as the most representative keywords of the page.
Optionally, in this embodiment, the similarity between pages may be, but is not limited to being, obtained by training a deep learning model. A deep learning model is trained on labeled samples to obtain a model whose input parameters are the target keywords of two pages and whose output parameter is the similarity of the two pages; this model is referred to as the similarity model. When determining page similarity, the first target keyword obtained from the first page and the second target keyword obtained from the second page can be input directly into the similarity model, and the output value of the similarity model is the target page similarity of the first page and the second page.
Optionally, in this embodiment, the first keywords and the second keywords may each include, but are not limited to including, multiple keywords, for example all keywords in the page that carry practical meaning. The first target keyword and the second target keyword may likewise include multiple words. For example, 100 words carrying practical meaning are extracted from a page, and the 30 words with the largest weights among them are taken as the target keywords of the page.
Optionally, in this embodiment, fusing the first page and the second page may be, but is not limited to, fusing the entities included in the pages. Entity fusion means integrating the information of multiple entities and merging them into one entity. Entity information can be represented by several triples {S, P, O}, where S denotes the subject, P denotes an attribute and O denotes an attribute value. Multi-entity fusion refers to the process of merging all attributes P and attribute values O of all entities with the same subject S under a single subject S.
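Merging {S, P, O} triples under a shared subject can be sketched as follows. The function name and the sample data are invented, and keeping every distinct value of an attribute (rather than resolving conflicts) is an assumption of this sketch; the text does not say how conflicting values are handled.

```python
from collections import defaultdict


def fuse_entities(triples):
    """Merge (S, P, O) triples so each subject S keeps all its attributes."""
    fused = defaultdict(lambda: defaultdict(set))
    for subject, attribute, value in triples:
        fused[subject][attribute].add(value)
    # Convert to plain dicts with sorted value lists for stable output
    return {s: {p: sorted(o) for p, o in attrs.items()}
            for s, attrs in fused.items()}


# Triples from two pages describing the same subject (invented data)
page_a = [("Liu", "date of birth", "1970-01-01"), ("Liu", "occupation", "actor")]
page_b = [("Liu", "date of birth", "1970-01-01"), ("Liu", "blood group", "A")]
merged = fuse_entities(page_a + page_b)
```

The duplicate "date of birth" triple collapses into a single value, while "occupation" and "blood group" from the two pages end up side by side under the one subject.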
In an optional embodiment, as shown in Fig. 3, first keywords (keyword 1, keyword 2, ..., keyword 30) are extracted from page A to be fused, and second keywords (keyword a, keyword b, ..., keyword m) are extracted from page B to be fused. The first weight of each first keyword and the second weight of each second keyword are determined. From (keyword 1, keyword 2, ..., keyword 30), the first target keywords whose first weight meets the first target condition are extracted: (keyword 3, keyword 6, keyword 17, keyword 22). From (keyword a, keyword b, ..., keyword m), the second target keywords whose second weight meets the first target condition are extracted: (keyword a, keyword d, keyword g, keyword k). The target page similarity of the first page and the second page is determined according to (keyword 3, keyword 6, keyword 17, keyword 22) and (keyword a, keyword d, keyword g, keyword k), and in the case where the target page similarity meets the second target condition, the first page and the second page are fused.
It can be seen that, with the above device, the keywords whose weights meet the first target condition among the keywords extracted from a page are determined as the target keywords of the page, so that the keywords that represent the page are extracted from it. The target page similarity between the first page and the second page is determined according to their respective target keywords, and if the target page similarity meets the second target condition, the first page and the second page can be fused. The similarity of pages is thus judged from keywords that represent the pages, and pages whose similarity meets the condition are fused. Pages are fused automatically, and because similarity is determined from representative keywords, the judgment of page similarity is more accurate and the accuracy of fusion improves. This achieves the technical effect of improving fusion efficiency when pages are fused, and solves the technical problem in the related art that fusion efficiency is low when pages are fused.
As an optional scheme, the first determining module includes:
an acquiring unit, configured to obtain the page similarity corresponding to the first target keyword and the second target keyword according to keyword pairs and page similarities having a correspondence;
a first determination unit, configured to determine the page similarity corresponding to the first target keyword and the second target keyword as the target page similarity.
Optionally, in this embodiment, the correspondence between keyword pairs and page similarities may be established in advance; the page similarity corresponding to the first target keyword and the second target keyword is then obtained from the correspondence and determined as the target page similarity.
Optionally, in this embodiment, the keyword pairs and page similarities having a correspondence may be, but are not limited to being, stored as a table, with the keyword pair as the key and the page similarity as the value, so that the correspondence is stored in the table as key-value pairs. When looking up the page similarity corresponding to a keyword pair, the key-value pairs stored for the first target keyword are found in the table first; the value corresponding to the second target keyword is then found among these key-value pairs and determined as the target page similarity.
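The two-stage lookup described above maps naturally onto a nested dictionary. The sketch below is one possible layout, not the embodiment's storage format: the nesting, the use of tuples as keys and the function name are assumptions, while the keyword groups and similarity values are those of the example in this description (75%, 40% and 62.5% written as fractions).

```python
# Outer key: first target keywords; inner key: second target keywords
table = {
    ("A1", "A2", "A3", "A4", "A5", "A6", "A7"): {
        ("A1", "A2", "A3", "A5", "A6", "A7", "A8"): 0.75,
        ("A2", "A3", "A5", "A7", "A9", "A10", "A19"): 0.40,
        ("A1", "A2", "A3", "A4", "A5", "A8", "A9"): 0.625,
    }
}


def lookup_similarity(table, first, second):
    """Find the entries for `first`, then the value stored for `second`.

    Returns None when the keyword pair is not in the table.
    """
    candidates = table.get(tuple(first), {})
    return candidates.get(tuple(second))


similarity = lookup_similarity(
    table,
    ["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    ["A2", "A3", "A5", "A7", "A9", "A10", "A19"],
)
```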
For example, in an optional embodiment, Table 2 stores keyword pairs and page similarities having a correspondence. To look up in the table the page similarity corresponding to the keyword pair [(A1, A2, A3, A4, A5, A6, A7), (A2, A3, A5, A7, A9, A10, A19)], the related target keywords corresponding to the first target keyword (A1, A2, A3, A4, A5, A6, A7) are found first, together with their page similarities, for example [(A1, A2, A3, A5, A6, A7, A8), 75%], [(A2, A3, A5, A7, A9, A10, A19), 40%] and [(A1, A2, A3, A4, A5, A8, A9), 62.5%]. The second target keyword (A2, A3, A5, A7, A9, A10, A19) is then found among these correspondences and its page similarity, 40%, is obtained, so 40% can be determined as the target page similarity.
Table 2
As an optional scheme, the acquiring unit includes:
an input subunit, configured to input the first target keyword and the second target keyword into a similarity model, where the similarity model is obtained by training a deep learning model with keyword pairs and page similarities having a correspondence;
an obtaining subunit, configured to obtain the target probability value output by the similarity model and determine the target probability value as the target page similarity, where the probability value indicates the probability that the first page and the second page are similar pages.
Optionally, in this embodiment, the keyword pairs and page similarities having a correspondence may be used as training samples to train a deep learning model, so as to obtain a similarity model that takes target keywords as input parameters and a probability value as the output parameter. The first target keyword and the second target keyword are input into the similarity model, the target probability value output by the similarity model is obtained, and the target probability value is determined as the target page similarity.
Optionally, in this embodiment, the above deep learning model may include, but is not limited to, a convolutional neural network (Convolutional Neural Network, CNN) model. A CNN is a feedforward neural network that includes convolutional layers and pooling layers; its artificial neurons respond to surrounding units within a limited coverage area, and it performs very well on large-scale image processing. Alternatively, the above deep learning model may include, but is not limited to, a VGG model or the like.
As an optional scheme, the above device further includes:
a processing module, configured to obtain a page sample set and to bucket the pages in the page sample set according to the entity included in the page title information of each page, obtaining entities and page sets having a correspondence;
a third extraction module, configured to extract keywords from each page in the page set of each bucket;
a second determining module, configured to determine, in each bucket, the keywords of each page whose third weight meets the first target condition as the keyword set of that page;
a third determining module, configured to determine, in each bucket, the page similarity of pages whose keyword sets match as the first similarity value, and the page similarity of pages whose keyword sets do not match or do not completely match as the second similarity value;
an establishing module, configured to establish, in each bucket, the correspondence between matched keyword sets and the first similarity value, and the correspondence between unmatched or incompletely matched keyword sets and the second similarity value;
an obtaining module, configured to obtain from each bucket a first quantity of matched keyword sets with their corresponding first similarity value and a second quantity of unmatched keyword sets with their corresponding second similarity value, yielding the keyword pairs and page similarities having a correspondence;
a training module, configured to train the deep learning model with the keyword pairs and page similarities having a correspondence to obtain the similarity model.
Optionally, in this embodiment, from the data processing point of view, objective things in the real world are called entities: an entity is anything in the real world that can be distinguished and identified. An entity may be a person, such as a teacher or a student, or an object, such as a book or a warehouse. It may be a tangible object or an abstract event, such as a performance or a football match.
Optionally, in this embodiment, the pages in the page sample set are bucketed according to the entity included in the page title information, so that pages with the same entity fall into the same bucket, yielding entities and page sets having a correspondence. Training samples are then obtained from each bucket separately, so that the resulting training samples are evenly distributed, which improves the accuracy of the similarity model.
Optionally, in this embodiment, in each bucket, a first quantity of matched page pairs is obtained as positive samples for training the similarity model, and a second quantity of unmatched page pairs is obtained as negative samples for training the similarity model.
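Bucketing pages by the entity in their title can be sketched as a simple grouping step. The text does not say how the entity is recognized in the title, so this sketch assumes each page record already carries a hypothetical `entity` field; the function name, page ids and entity names are invented.

```python
from collections import defaultdict


def bucket_by_entity(pages):
    """Group page ids by the entity name taken from each page title."""
    buckets = defaultdict(list)
    for page in pages:
        buckets[page["entity"]].append(page["id"])
    return dict(buckets)


# Invented page records; "entity" stands for the name found in the title
pages = [
    {"id": 121, "entity": "Liu"},
    {"id": 122, "entity": "Liu"},
    {"id": 123, "entity": "Liu"},
    {"id": 201, "entity": "Zhang"},
]
buckets = bucket_by_entity(pages)
```

Positive and negative page pairs would then be drawn inside each bucket, so a bucket with a single page (such as "Zhang" here) contributes no pairs.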
As an optional scheme, the third determining module includes:
an extraction unit, configured to extract characteristic information from each keyword set;
a second determination unit, configured to determine keyword sets with identical characteristic information as matched keyword sets, and keyword sets with different characteristic information as unmatched keyword sets;
a third determination unit, configured to determine that the matched keyword sets correspond to the first similarity value and the unmatched keyword sets correspond to the second similarity value.
Optionally, in this embodiment, whether two pages in a bucket match may be, but is not limited to being, determined according to a strict condition. For example, when obtaining seed pages of the same kind, a portion of entity pages that are clearly identical is extracted from the entity pages of the same bucket as positive samples of the training data. As shown in Fig. 4, profile page 1 and profile page 2 both contain the basic information of Liu. To ensure that the pages really describe the same entity, that is, that they should be fused, the key information of the pages is used as characteristic information and checked for an exact match. For example, if the "date of birth" and "blood group" of the two pages are both identical, the two pages are confirmed to be matched and are used as a positive sample for similarity calculation.
Optionally, in this embodiment, the training set is generated by random negative sampling within each bucket. Since similarity calculation is carried out only among entity pages with the same name, negative samples are also selected among entity pages with the same name, to be consistent with similarity prediction. Corresponding to the selection of positive page pairs, negative page pairs are selected on the condition that the key fields of the pages (that is, the characteristic information) do not match. For example, as shown in Fig. 5, there are two pages, page 3 and page 4. The "date of birth" and "blood group" extracted from the two pages are not exactly the same, so the two pages are confirmed to be unmatched and are used as a negative sample for similarity calculation.
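Labeling page pairs inside one bucket and sampling a training set from them can be sketched as follows. The key fields "date of birth" and "blood group" come from the text; the function names, the page records, the fixed random seed and the sample quantities are assumptions made for illustration.

```python
import random
from itertools import combinations

FEATURES = ("date of birth", "blood group")  # key fields named in the text


def label_pairs(bucket):
    """Label every page pair in one bucket: 1 if all key fields match, else 0."""
    triples = []
    for a, b in combinations(bucket, 2):
        same = all(a[f] == b[f] for f in FEATURES)
        triples.append((a["id"], b["id"], 1 if same else 0))
    return triples


def sample_training_set(triples, n_pos, n_neg, seed=0):
    """Randomly draw up to n_pos positive and n_neg negative triples."""
    rng = random.Random(seed)
    pos = [t for t in triples if t[2] == 1]
    neg = [t for t in triples if t[2] == 0]
    return (rng.sample(pos, min(n_pos, len(pos)))
            + rng.sample(neg, min(n_neg, len(neg))))


# One bucket of same-name entity pages (invented field values)
bucket = [
    {"id": 121, "date of birth": "1970-01-01", "blood group": "A"},
    {"id": 122, "date of birth": "1980-05-05", "blood group": "B"},
    {"id": 123, "date of birth": "1970-01-01", "blood group": "A"},
]
triples = label_pairs(bucket)
training_set = sample_training_set(triples, n_pos=1, n_neg=1)
```

With this bucket, pages 121 and 123 form the only positive pair, and the negative sample is drawn at random from the two mismatched pairs.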
Optionally, in this embodiment, the first similarity value may be, but is not limited to, 1, and the second similarity value may be, but is not limited to, 0. The output parameter of the similarity model obtained after training is then a probability value between 0 and 1. Alternatively, the first similarity value may be set to 100 and the second similarity value to 0, in which case the output parameter of the trained similarity model is a value between 0 and 100, which can likewise be regarded as a probability value.
It should be noted that the above first similarity value and second similarity value are only examples, and this embodiment places no limitation on them.
Optionally, in this embodiment, the keyword set pairs having a correspondence and their similarity values obtained above may be represented as triples {P1, P2, Label}, where P1 and P2 are the ids of the corresponding entity pages and Label indicates whether the pages are similar, taking the value 0 (dissimilar) or 1 (similar), for example {121, 122, 0} and {121, 123, 1}.
As an optional scheme, the training module includes:
a training unit, configured to train the deep learning model with the matched keyword sets and the unmatched keyword sets as the input values of the deep learning model, and the first similarity value and the second similarity value as the output values of the deep learning model;
a fourth determination unit, configured to determine the deep learning model obtained after training as the similarity model.
Optionally, in this embodiment, as shown in Fig. 6, taking a CNN model as the deep learning model as an example, the pages corresponding to P1 and P2 in the above triple are used as the two channels of the CNN input layer. After CNN processing such as convolution, pooling and full connection, the model outputs a predicted value; the error between the predicted value and Label is then calculated and back-propagated. The overall architecture of the model is based on CNN, with each pair of entity pages converted into the two input channels of the CNN.
Optionally, in this embodiment, as shown in Fig. 7, keywords are extracted from the two candidate pages to obtain N keywords per page, where N equals the word vector dimension. If a page has fewer than N keywords, the remainder is padded with a special character, so that each page can be represented by its keywords. Each keyword is then replaced with its word vector, which converts the keywords of a page into a two-dimensional N*N word vector matrix. Finally, the two matrices are merged into a three-dimensional N*N*2 matrix, which serves as the input of the CNN. Converting pages in this way makes effective use of the way people naturally express themselves in language.
As an optional scheme, the second extraction module includes:
a first extraction unit, configured to extract from the first keywords the first target keywords whose first weight is higher than a target weight, and to extract from the second keywords the second target keywords whose second weight is higher than the target weight; or
a second extraction unit, configured to sort the first keywords by first weight in descending order and the second keywords by second weight in descending order, to extract, from the sorted first keywords, a third quantity of top-ranked keywords as the first target keywords, and to extract, from the sorted second keywords, a third quantity of top-ranked keywords as the second target keywords.
Optionally, in this embodiment, the first target condition may be a threshold range or a ranking range. Taking extraction of the first target keywords as an example, suppose the first keywords (keyword 1, keyword 2, ..., keyword 30) have first weights 0.7, 0.4, ..., 0.88 respectively. One way is to extract the first keywords whose first weight is higher than 0.65 as the first target keywords, in which case the first target keywords obtained are keyword 1, keyword 5, keyword 10, ..., keyword 30. Another way is to sort the first keywords by first weight in descending order (keyword 30, keyword 7, keyword 28, ..., keyword 2) and take the top 10 keywords of the sorted list as the first target keywords, in which case the first target keywords obtained may be: keyword 30, keyword 7, keyword 28, keyword 3, keyword 17, keyword 9, keyword 22, keyword 14, keyword 3, keyword 6.
As an optional scheme, the fusion module is configured to:
in the case where a higher target page similarity indicates that the first page and the second page are more similar, determine that the target page similarity meets the second target condition when it is higher than a first target similarity; or
in the case where a lower target page similarity indicates that the first page and the second page are more similar, determine that the target page similarity meets the second target condition when it is lower than a second target similarity.
Optionally, in this embodiment, pages that are sufficiently similar are fused, so the second target condition that pages to be fused must meet depends on what the target page similarity represents. If a high target page similarity indicates that two pages are similar, the similarity of pages that can be fused must be higher than the first target similarity; if a low target page similarity indicates that two pages are similar, the similarity of pages that can be fused must be lower than the second target similarity.
The application environment of this embodiment of the present invention may be, but is not limited to, the application environment of the above embodiments, which is not described again here. An embodiment of the present invention provides an optional concrete application example for implementing the above page fusion method.
As an optional embodiment, the above page fusion method may be, but is not limited to being, applied in the page fusion scenario shown in Fig. 9. In this scenario, as shown in Fig. 9, the process of fusing pages includes the following steps:
Step 1, bucket the pages according to the entity name in the title.
Step 2, segment the page text and extract keywords. The text formed by splicing the introduction section or other short texts of the entity page is segmented, keywords are extracted from the segmented words, the words are sorted by weight, and the top-ranked words are taken to represent the page.
Step 3, obtain seed pages of the same kind according to a strict condition. To extract training data for the model, a portion of entity pages that are clearly identical is extracted from the entity pages of the same bucket as positive samples of the training data.
Step 4, generate the training set by random negative sampling within each bucket. Negative samples are also selected among entity pages with the same name. Corresponding to the selection of positive page pairs, negative page pairs are selected on the condition that the key fields of the pages do not match.
Step 5, train the CNN model. With the positive and negative samples from step 3 and step 4, model training can begin, and the trained model can then be used to fuse pages.
According to yet another aspect of the embodiments of the present invention, an electronic device for implementing the above page fusion is further provided. As shown in Fig. 10, the electronic device includes one or more processors 1002 (only one is shown in the figure), a memory 1004, a sensor 1006, an encoder 1008 and a transmission device 1010. A computer program is stored in the memory, and the processor is configured to execute, by means of the computer program, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the above electronic device may be located in at least one of multiple network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1: extract a first keyword from a first page to be fused, and extract a second keyword from a second page to be fused;
S2: extract, from the first keyword, a first target keyword whose first weight meets a first target condition, and extract, from the second keyword, a second target keyword whose second weight meets the first target condition;
S3: determine a target page similarity of the first page and the second page according to the first target keyword and the second target keyword;
S4: fuse the first page and the second page when the target page similarity meets a second target condition.
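Steps S1 to S4 can be read as a small pipeline. The sketch below is illustrative only: keyword extraction is reduced to relative word frequency over whitespace tokens, the similarity is a Jaccard overlap standing in for the trained similarity model, and fusion is a plain concatenation. The function names, `top_n`, and the 0.5 threshold are assumptions and do not come from the specification.

```python
from collections import Counter


def extract_keywords(text):
    """S1: map each word to a weight (here: relative frequency, an assumption)."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()} if words else {}


def select_target_keywords(weights, top_n=3):
    """S2: keep the top_n highest-weighted keywords (one possible first target condition)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:top_n]}


def page_similarity(kw_a, kw_b):
    """S3: Jaccard overlap as a stand-in for the trained similarity model."""
    union = kw_a | kw_b
    return len(kw_a & kw_b) / len(union) if union else 0.0


def fuse_if_similar(page_a, page_b, threshold=0.5):
    """S4: fuse the two pages only when the similarity exceeds the threshold."""
    kw_a = select_target_keywords(extract_keywords(page_a))
    kw_b = select_target_keywords(extract_keywords(page_b))
    score = page_similarity(kw_a, kw_b)
    fused = page_a + "\n" + page_b if score > threshold else None
    return score, fused
```

Because only the highest-weighted keywords of each page are compared, the cost of S3 does not grow with page length, which is the efficiency argument made in the abstract.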
Optionally, a person of ordinary skill in the art will understand that the structure shown in Figure 10 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Device, MID), or a PAD. Figure 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in Figure 10 (such as a network interface or a display device), or have a configuration different from that shown in Figure 10.
The memory 1004 may be used to store software programs and modules, such as the program instructions/modules corresponding to the page fusion method and device in the embodiments of the present invention. The processor 1002 runs the software programs and modules stored in the memory 1004 to execute various functional applications and data processing, that is, to implement the page fusion method described above. The memory 1004 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memories located remotely from the processor 1002, and these remote memories may be connected to the terminal through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1010 is used to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device 1010 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 1010 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
Specifically, the memory 1004 is used to store an application program.
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: extract a first keyword from a first page to be fused, and extract a second keyword from a second page to be fused;
S2: extract, from the first keyword, a first target keyword whose first weight meets a first target condition, and extract, from the second keyword, a second target keyword whose second weight meets the first target condition;
S3: determine a target page similarity of the first page and the second page according to the first target keyword and the second target keyword;
S4: fuse the first page and the second page when the target page similarity meets a second target condition.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the methods of the above embodiments, which are not described again in this embodiment.
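Among those steps, the two target conditions admit alternative forms: claims 7 and 8 recite a weight threshold or a top-ranked cut for the first target condition, and a comparison whose direction depends on whether a higher or a lower similarity value means more similar pages for the second. A minimal sketch of these alternatives follows; the function and parameter names are assumptions chosen for illustration.

```python
def select_by_threshold(weights, target_weight):
    """Keep keywords whose weight is higher than the target weight."""
    return [w for w, score in weights.items() if score > target_weight]


def select_top_n(weights, n):
    """Sort keywords by weight in descending order and keep the first n."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:n]]


def meets_second_condition(similarity, target, higher_is_more_similar=True):
    """Compare the similarity with the target in the direction the metric implies."""
    if higher_is_more_similar:
        return similarity > target   # e.g. a probability that the pages are similar
    return similarity < target       # e.g. a distance between the pages
```

A threshold yields a variable number of keywords per page, while a top-ranked cut yields a fixed number, which may be more convenient when the similarity model expects fixed-size input.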
Optionally, in this embodiment, a person of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection of units or modules through some interfaces, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (15)
1. A page fusion method, characterized by comprising:
extracting a first keyword from a first page to be fused, and extracting a second keyword from a second page to be fused;
extracting, from the first keyword, a first target keyword whose first weight meets a first target condition, and extracting, from the second keyword, a second target keyword whose second weight meets the first target condition;
determining a target page similarity of the first page and the second page according to the first target keyword and the second target keyword; and
fusing the first page and the second page when the target page similarity meets a second target condition.
2. The method according to claim 1, characterized in that determining the target page similarity of the first page and the second page according to the first target keyword and the second target keyword comprises:
obtaining, according to keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keyword and the second target keyword; and
determining the page similarity corresponding to the first target keyword and the second target keyword as the target page similarity.
3. The method according to claim 2, characterized in that obtaining, according to the keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keyword and the second target keyword comprises:
inputting the first target keyword and the second target keyword into a similarity model, wherein the similarity model is a model obtained by training a deep learning model with the keyword pairs and page similarities having a correspondence; and
obtaining a target probability value output by the similarity model, and determining the target probability value as the target page similarity, wherein the target probability value indicates the probability that the first page and the second page are similar pages.
4. The method according to claim 3, characterized in that before the first target keyword and the second target keyword are input into the similarity model, the method further comprises:
obtaining a page sample set, and bucketing the pages in the page sample set according to the entity included in the page title information of each page, to obtain entities and page sets having a correspondence;
extracting keywords from each page in the page set in each bucket;
in each bucket, determining the keywords whose third weight meets the first target condition, among the keywords of each page, as the keyword set corresponding to that page;
in each bucket, determining the page similarity of pages whose keyword sets match as a first similarity value, and determining the page similarity of pages whose keyword sets do not match or do not completely match as a second similarity value;
in each bucket, establishing a correspondence between matched keyword sets and the first similarity value, and a correspondence between unmatched or incompletely matched keyword sets and the second similarity value;
obtaining, from each bucket, a first quantity of matched keyword sets and first similarity values having a correspondence, and a second quantity of unmatched keyword sets and second similarity values having a correspondence, to obtain the keyword pairs and page similarities having a correspondence; and
training the deep learning model with the keyword pairs and page similarities having a correspondence to obtain the similarity model.
5. The method according to claim 4, characterized in that determining the page similarity corresponding to two pages whose keyword sets match as the first similarity value, and determining the page similarity corresponding to two pages whose keyword sets do not match as the second similarity value comprises:
extracting feature information from each keyword set;
determining keyword sets with identical feature information as matched keyword sets, and determining keyword sets with different feature information as unmatched keyword sets; and
determining that matched keyword sets correspond to the first similarity value and unmatched keyword sets correspond to the second similarity value.
6. The method according to claim 4, characterized in that training the deep learning model with the keyword pairs and page similarities having a correspondence to obtain the similarity model comprises:
training the deep learning model with the matched keyword sets and the unmatched keyword sets as input values of the deep learning model, and with the first similarity value and the second similarity value as output values of the deep learning model; and
determining the deep learning model obtained after training as the similarity model.
7. The method according to claim 1, characterized in that extracting, from the first keyword, the first target keyword whose first weight meets the first target condition, and extracting, from the second keyword, the second target keyword whose second weight meets the first target condition comprises:
extracting, from the first keyword, the first target keyword whose first weight is higher than a target weight, and extracting, from the second keyword, the second target keyword whose second weight is higher than the target weight; or
sorting the first keyword in descending order of the first weight, and sorting the second keyword in descending order of the second weight; and extracting, from the sorted first keyword, a third quantity of top-ranked keywords as the first target keyword, and extracting, from the sorted second keyword, the third quantity of top-ranked keywords as the second target keyword.
8. The method according to claim 1, characterized in that the target page similarity meeting the second target condition comprises:
in a case where a higher target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity being higher than a first target similarity means that the target page similarity meets the second target condition; or
in a case where a lower target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity being lower than a second target similarity means that the target page similarity meets the second target condition.
9. A page fusion device, characterized by comprising:
a first extraction module, configured to extract a first keyword from a first page to be fused, and extract a second keyword from a second page to be fused;
a second extraction module, configured to extract, from the first keyword, a first target keyword whose first weight meets a first target condition, and extract, from the second keyword, a second target keyword whose second weight meets the first target condition;
a first determining module, configured to determine a target page similarity of the first page and the second page according to the first target keyword and the second target keyword; and
a fusion module, configured to fuse the first page and the second page when the target page similarity meets a second target condition.
10. The device according to claim 9, characterized in that the first determining module comprises:
an acquiring unit, configured to obtain, according to keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keyword and the second target keyword; and
a first determination unit, configured to determine the page similarity corresponding to the first target keyword and the second target keyword as the target page similarity.
11. The device according to claim 10, characterized in that the acquiring unit comprises:
an input subunit, configured to input the first target keyword and the second target keyword into a similarity model, wherein the similarity model is a model obtained by training a deep learning model with the keyword pairs and page similarities having a correspondence; and
an obtaining subunit, configured to obtain a target probability value output by the similarity model and determine the target probability value as the target page similarity, wherein the target probability value indicates the probability that the first page and the second page are similar pages.
12. The device according to claim 11, characterized in that the device further comprises:
a processing module, configured to obtain a page sample set, and bucket the pages in the page sample set according to the entity included in the page title information of each page, to obtain entities and page sets having a correspondence;
a third extraction module, configured to extract keywords from each page in the page set in each bucket;
a second determining module, configured to determine, in each bucket, the keywords whose third weight meets the first target condition, among the keywords of each page, as the keyword set corresponding to that page;
a third determining module, configured to determine, in each bucket, the page similarity corresponding to two pages whose keyword sets match as a first similarity value, and the page similarity corresponding to two pages whose keyword sets do not match as a second similarity value;
an establishing module, configured to establish, in each bucket, a correspondence between matched keyword sets and the first similarity value, and a correspondence between unmatched keyword sets and the second similarity value;
an obtaining module, configured to obtain, from each bucket, a first quantity of matched keyword sets and first similarity values having a correspondence, and a second quantity of unmatched keyword sets and second similarity values having a correspondence, to obtain the keyword pairs and page similarities having a correspondence; and
a training module, configured to train the deep learning model with the keyword pairs and page similarities having a correspondence to obtain the similarity model.
13. The device according to claim 12, characterized in that the third determining module comprises:
an extraction unit, configured to extract feature information from each keyword set;
a second determination unit, configured to determine keyword sets with identical feature information as matched keyword sets, and determine keyword sets with different feature information as unmatched keyword sets; and
a third determination unit, configured to determine that matched keyword sets correspond to the first similarity value and unmatched keyword sets correspond to the second similarity value.
14. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 8.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute, by means of the computer program, the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456491.9A CN110162356B (en) | 2018-05-14 | 2018-05-14 | Page fusion method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110162356A (en) | 2019-08-23 |
CN110162356B (en) | 2021-09-28 |
Family
ID=67644902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810456491.9A Active CN110162356B (en) | 2018-05-14 | 2018-05-14 | Page fusion method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110162356B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4431744B2 (en) * | 2004-06-07 | 2010-03-17 | 独立行政法人情報通信研究機構 | Web page information fusion display device, web page information fusion display method, web page information fusion display program, and computer-readable recording medium recording the program |
CN101706790A (en) * | 2009-09-18 | 2010-05-12 | 浙江大学 | Clustering method of WEB objects in search engine |
CN102323954A (en) * | 2011-09-14 | 2012-01-18 | 杨继能 | Search engine technology for integrating webpage resource through internally installing auxiliary browser windows |
CN102693304A (en) * | 2012-05-22 | 2012-09-26 | 北京邮电大学 | Search engine feedback information processing method and search engine |
CN103246719A (en) * | 2013-04-27 | 2013-08-14 | 北京交通大学 | Web-based network information resource integration method |
CN103345476A (en) * | 2013-06-09 | 2013-10-09 | 北京百度网讯科技有限公司 | Method and device for determining present information corresponding to destination page |
CN103744683A (en) * | 2014-01-24 | 2014-04-23 | 中科创达软件股份有限公司 | Information fusion method and device |
CN103902596A (en) * | 2012-12-28 | 2014-07-02 | 中国电信股份有限公司 | High-frequency page content clustering method and system |
CN103955529A (en) * | 2014-05-12 | 2014-07-30 | 中国科学院计算机网络信息中心 | Internet information searching and aggregating presentation method |
CN105159881A (en) * | 2015-08-28 | 2015-12-16 | 北京奇艺世纪科技有限公司 | Method and device for polymerizing data module in page |
CN106303613A (en) * | 2015-06-29 | 2017-01-04 | 中兴通讯股份有限公司 | page fusion method and device |
CN106407195A (en) * | 2015-07-28 | 2017-02-15 | 北京京东尚科信息技术有限公司 | Method and system for eliminating duplication of webpage |
CN107577671A (en) * | 2017-09-19 | 2018-01-12 | 中央民族大学 | A kind of key phrases extraction method based on multi-feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN110162356B (en) | 2021-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||