CN110162356A - Page fusion method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
CN110162356A
Authority
CN
China
Prior art keywords
keyword
page
similarity
target
value
Legal status
Granted
Application number
CN201810456491.9A
Other languages
Chinese (zh)
Other versions
CN110162356B (en)
Inventor
高航
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810456491.9A
Publication of CN110162356A
Application granted
Publication of CN110162356B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a page fusion method and apparatus, a storage medium, and an electronic device. The method includes: extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused; extracting, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extracting, from the second keywords, second target keywords whose second weights satisfy the first target condition; determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and, when the target page similarity satisfies a second target condition, fusing the first page and the second page. The invention solves the technical problem in the related art of low efficiency when fusing pages.

Description

Page fusion method and apparatus, storage medium, and electronic device
Technical field
The present invention relates to the computer field, and in particular to a page fusion method and apparatus, a storage medium, and an electronic device.
Background
Internet pages are edited by users, that is, they follow the user-generated content (UGC) model. As a result, a website may contain redundant pages for the same entity. For example, on an encyclopedia site, information about a certain celebrity may be edited by user A to form one page and edited again by user B to form another page. When a knowledge base is constructed, page information must be integrated to enrich the entity information in the knowledge base, so the pages need to be fused. Existing page fusion schemes decide whether pages should be fused by exact matching of key fields.
A field-matching scheme first extracts key fields from all pages, then buckets the pages according to whether they share the same key fields, and finally decides whether pages should be fused according to several other pieces of auxiliary information. Because this approach relies on manual configuration, page fusion efficiency is low.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a page fusion method and apparatus, a storage medium, and an electronic device, to at least solve the technical problem in the related art of low efficiency when fusing pages.
According to one aspect of the embodiments of the present invention, a page fusion method is provided, including: extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused; extracting, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extracting, from the second keywords, second target keywords whose second weights satisfy the first target condition, where a first weight indicates how representative each first keyword is of the first page, and a second weight indicates how representative each second keyword is of the second page; determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and, when the target page similarity satisfies a second target condition, fusing the first page and the second page.
According to another aspect of the embodiments of the present invention, a page fusion apparatus is further provided, including: a first extraction module, configured to extract first keywords from a first page to be fused and extract second keywords from a second page to be fused; a second extraction module, configured to extract, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extract, from the second keywords, second target keywords whose second weights satisfy the first target condition, where a first weight indicates how representative each first keyword is of the first page, and a second weight indicates how representative each second keyword is of the second page; a first determining module, configured to determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and a fusion module, configured to fuse the first page and the second page when the target page similarity satisfies a second target condition.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium stores a computer program, and the computer program is configured to perform any one of the above methods when run.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory and a processor. The memory stores a computer program, and the processor is configured to perform any one of the above methods by means of the computer program.
In the embodiments of the present invention, first keywords are extracted from a first page to be fused and second keywords from a second page to be fused; first target keywords whose first weights satisfy a first target condition are extracted from the first keywords, and second target keywords whose second weights satisfy the first target condition are extracted from the second keywords; a target page similarity between the two pages is determined from the first and second target keywords; and the two pages are fused when the target page similarity satisfies a second target condition. Keywords whose weights satisfy the first target condition are taken as a page's target keywords, so the keywords that are representative of the page are extracted from it. The similarity between the two pages is then determined from their target keywords, and if it satisfies the second target condition, the pages are fused. Pages are thus judged similar based on keywords that represent them, and sufficiently similar pages are fused automatically. Because the similarity is determined from representative keywords, similarity judgments are more accurate, which improves the accuracy of page fusion, achieves the technical effect of improving fusion efficiency, and solves the technical problem in the related art of low efficiency when fusing pages.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation on it. In the drawings:
Fig. 1 is a schematic diagram of an optional page fusion method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application environment of an optional page fusion method according to an embodiment of the present invention;
Fig. 3 is a first schematic diagram of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 4 is a second schematic diagram of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 5 is a third schematic diagram of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 6 is a fourth schematic diagram of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 7 is a fifth schematic diagram of an optional page fusion method according to an optional embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional page fusion apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of an application scenario of an optional page fusion method according to an embodiment of the present invention; and
Fig. 10 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the accompanying drawings of this specification are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
According to one aspect of the embodiments of the present invention, a page fusion method is provided. As shown in Fig. 1, the method includes:
S102: extract first keywords from a first page to be fused, and extract second keywords from a second page to be fused;
S104: extract, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extract, from the second keywords, second target keywords whose second weights satisfy the first target condition. Optionally, in this embodiment, a first weight indicates how representative each first keyword is of the first page, and a second weight indicates how representative each second keyword is of the second page;
S106: determine the target page similarity between the first page and the second page according to the first target keywords and the second target keywords;
S108: when the target page similarity satisfies a second target condition, fuse the first page and the second page.
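Steps S102 to S108 can be sketched end to end in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: keyword extraction (S102) is assumed to have already produced a weight per keyword, the first target condition is assumed to be "top N by weight", and a plain Jaccard overlap of the two target keyword sets stands in for the lookup table or trained similarity model described in the embodiments. The names `select_targets`, `jaccard`, `should_fuse`, `top_n` and `threshold` are hypothetical.

```python
from typing import Callable, Dict, List


def select_targets(weights: Dict[str, float], top_n: int) -> List[str]:
    """S104 (assumed form of the first target condition): keep the top_n keywords by weight."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def jaccard(a: List[str], b: List[str]) -> float:
    """Stand-in similarity; the embodiments use a lookup table or a trained model instead."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def should_fuse(w1: Dict[str, float], w2: Dict[str, float], top_n: int = 4,
                threshold: float = 0.5,
                similarity: Callable[[List[str], List[str]], float] = jaccard) -> bool:
    """S102 is assumed done: w1 and w2 map each extracted keyword to its weight."""
    t1 = select_targets(w1, top_n)   # first target keywords
    t2 = select_targets(w2, top_n)   # second target keywords
    score = similarity(t1, t2)       # S106: target page similarity
    return score >= threshold        # S108: assumed second target condition
```

Passing a different `similarity` callable (for example the scoring function of a trained model) leaves the rest of the flow unchanged, which mirrors how the embodiments substitute different ways of computing the target page similarity.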
Optionally, in this embodiment, the page fusion method may be applied in a hardware environment formed by the server 202 shown in Fig. 2. As shown in Fig. 2, the server 202 extracts first keywords from a first page to be fused and second keywords from a second page to be fused; extracts, from the first keywords, first target keywords whose first weights satisfy a first target condition, and, from the second keywords, second target keywords whose second weights satisfy the first target condition, where the first weight indicates how representative each first keyword is of the first page and the second weight indicates how representative each second keyword is of the second page; determines the target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and fuses the first page and the second page when the target page similarity satisfies a second target condition.
Optionally, in this embodiment, the page fusion method may be applied, without limitation, to scenarios in which pages are fused. It may be used in various types of applications, for example online education, instant messaging, community space, game, shopping, browser, finance, multimedia, and live-streaming applications. Specifically, it may be applied to fusing web pages in a browser application, or to fusing multimedia resource pages in a multimedia application, to improve the efficiency of page fusion. The above is only an example, and this embodiment imposes no limitation on it.
Optionally, in this embodiment, keywords may be extracted from the summary section of a page, or from text formed by concatenating other short texts of the page.
Optionally, in this embodiment, before keywords are extracted, the text of the page may first be segmented into words, and keywords are then extracted from the resulting words.
Optionally, in this embodiment, a weight indicates how representative a keyword is of a page. The weight of a keyword may be computed with the TF-IDF algorithm or with the TextRank algorithm. Alternatively, to combine the advantages of both algorithms for keyword extraction, the word weights produced by TF-IDF and by TextRank can each be normalized, and the average of the two normalized values used as the keyword's weight.
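The weighting option that averages normalized TF-IDF and TextRank scores can be sketched as follows. The sketch assumes the page text has already been segmented into words, uses a smoothed IDF and a simple co-occurrence-window TextRank, and applies min-max normalization. The patent does not specify the normalization method, window size, damping factor, or IDF variant, so all of these, and the function names, are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tfidf(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Term frequency in `doc` times a smoothed inverse document frequency over `corpus`."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = count / len(doc) * idf
    return scores


def textrank(doc: List[str], window: int = 2, d: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """PageRank over an undirected graph linking words that co-occur within `window` tokens."""
    neighbors: Dict[str, set] = defaultdict(set)
    for i, w in enumerate(doc):
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[j] != w:
                neighbors[w].add(doc[j])
                neighbors[doc[j]].add(w)
    score = {w: 1.0 for w in set(doc)}
    for _ in range(iters):
        # Each word receives a share of its neighbors' scores, split by their degree.
        score = {w: (1 - d) + d * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
                 for w in score}
    return score


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal scores map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 1.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}


def keyword_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Average of the normalized TF-IDF and TextRank weights for every word in `doc`."""
    a, b = min_max(tfidf(doc, corpus)), min_max(textrank(doc))
    return {w: (a[w] + b[w]) / 2 for w in a}
```

Normalizing before averaging matters because raw TF-IDF and TextRank scores live on different scales; without it, one algorithm would dominate the combined weight.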
Optionally, in this embodiment, the keywords of a page are filtered by the first target condition that their weights must satisfy, so as to select the one or more keywords that best represent the page as target keywords. The first target condition may be a condition on the weights of the keywords most representative of the page, for example: the N keywords with the highest representativeness of the page are taken as its most representative keywords, or the keywords whose representativeness falls within a certain threshold range are taken as its most representative keywords.
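Both example forms of the first target condition (the top N keywords, or keywords whose weight falls within a threshold range) can be expressed with one hypothetical helper. The function name and the parameters `top_n`, `lo` and `hi` are illustrative assumptions; ties are broken alphabetically only to make the output deterministic.

```python
from typing import Dict, List, Optional


def select_target_keywords(weights: Dict[str, float], top_n: Optional[int] = None,
                           lo: Optional[float] = None, hi: Optional[float] = None) -> List[str]:
    """Return keywords ordered by descending weight, filtered by range and/or truncated to top_n."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    if lo is not None or hi is not None:
        # Threshold-range form: keep weights within [lo, hi] (either bound may be omitted).
        ranked = [(w, s) for w, s in ranked
                  if (lo is None or s >= lo) and (hi is None or s <= hi)]
    if top_n is not None:
        # Top-N form: keep the N most representative keywords.
        ranked = ranked[:top_n]
    return [w for w, _ in ranked]
```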
Optionally, in this embodiment, the similarity between pages may be obtained by training a deep learning model. A deep learning model is trained on labeled samples to obtain a model whose input is the target keywords of two pages and whose output is the similarity of the two pages; this model is called the similarity model. When page similarity is determined, the first target keywords of the first page and the second target keywords of the second page can be fed directly into the similarity model, and the model's output value is taken as the target page similarity between the first page and the second page.
Optionally, in this embodiment, the first keywords and the second keywords may each include multiple keywords, for example all keywords with practical meaning extracted from the page. The first and second target keywords may also include multiple words; for example, if 100 meaningful words are extracted from a page, the 30 words with the highest weights among them are taken as the page's target keywords.
Optionally, in this embodiment, fusing the first page and the second page may mean fusing the entities contained in the pages. Entity fusion means integrating the information of multiple entities and merging them into one entity. Entity information can be represented by triples {S, P, O}, where S denotes the subject, P an attribute, and O the attribute value. Fusing multiple entities means that, for the same subject S, all attributes P and attribute values O of all the entities are merged under that single subject S.
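Entity fusion over {S, P, O} triples can be sketched as a grouping operation: every attribute and value from every page is collected under its subject. This is a minimal sketch; keeping attribute values as sets, so that conflicting values from different pages are retained rather than overwritten, is an assumption, since the text does not say how conflicting values are resolved. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (S: subject, P: attribute, O: attribute value)


def fuse_entities(*pages: Iterable[Triple]) -> Dict[str, Dict[str, Set[str]]]:
    """Merge every (P, O) of every page under its subject S; values are kept as sets."""
    fused: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for triples in pages:
        for s, p, o in triples:
            fused[s][p].add(o)
    return {s: dict(attrs) for s, attrs in fused.items()}
```

For two pages about the same subject, attributes present on only one page are carried over, and shared attributes with equal values collapse into a single value.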
In an optional embodiment, as shown in Fig. 3, first keywords (keyword 1, keyword 2, ..., keyword 30) are extracted from page A to be fused, and second keywords (keyword a, keyword b, ..., keyword m) are extracted from page B to be fused. The first weight of each first keyword and the second weight of each second keyword are determined. From (keyword 1, keyword 2, ..., keyword 30), the first target keywords whose first weights satisfy the first target condition are extracted: (keyword 3, keyword 6, keyword 17, keyword 22). From (keyword a, keyword b, ..., keyword m), the second target keywords whose second weights satisfy the first target condition are extracted: (keyword a, keyword d, keyword g, keyword k). The target page similarity between the first page and the second page is determined from (keyword 3, keyword 6, keyword 17, keyword 22) and (keyword a, keyword d, keyword g, keyword k), and when the target page similarity satisfies the second target condition, the first page and the second page are fused.
It can be seen that, through the above steps, the keywords extracted from a page whose weights satisfy the first target condition are determined as the page's target keywords, so that the keywords representative of the page are extracted. The target page similarity between the first page and the second page is determined from their respective target keywords, and if it satisfies the second target condition, the two pages are fused. The degree of similarity between pages is thus judged from keywords that represent them, and pages whose similarity satisfies the condition are fused automatically. Because the similarity is determined from representative keywords, the similarity judgment is more accurate, which improves the accuracy of page fusion, achieves the technical effect of improving fusion efficiency, and solves the technical problem in the related art of low efficiency when fusing pages.
As an optional solution, determining the target page similarity between the first page and the second page according to the first target keywords and the second target keywords includes:
S1: obtain the page similarity corresponding to the first target keywords and the second target keywords from the correspondence between keyword pairs and page similarities;
S2: determine the page similarity corresponding to the first target keywords and the second target keywords as the target page similarity.
Optionally, in this embodiment, the correspondence between keyword pairs and page similarities may be established in advance; the page similarity corresponding to the first and second target keywords is then obtained from this correspondence and determined as the target page similarity.
Optionally, in this embodiment, the correspondence between keyword pairs and page similarities may be stored as a table of key-value pairs, with the keyword pair as the key and the page similarity as the value. To look up the page similarity for a keyword pair, the key-value pairs corresponding to the first target keywords are looked up in the table first; among these, the value corresponding to the second target keywords is found and determined as the target page similarity.
For example, in an optional embodiment, keyword pairs and their corresponding page similarities are stored as shown in Table 1. To look up the page similarity for the keyword pair [(A1, A2, A3, A4, A5, A6, A7), (A2, A3, A5, A7, A9, A10, A19)], the related target keywords corresponding to the first target keywords (A1, A2, A3, A4, A5, A6, A7) and their page similarities are found first, for example [(A1, A2, A3, A5, A6, A7, A8), 75%], [(A2, A3, A5, A7, A9, A10, A19), 40%], and [(A1, A2, A3, A4, A5, A8, A9), 62.5%]. The second target keywords (A2, A3, A5, A7, A9, A10, A19) are then found among these entries, with a corresponding page similarity of 40%, so 40% is determined as the target page similarity.
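The two-stage lookup in the Table 1 example (first find the entries for the first target keywords, then the entry for the second target keywords) maps naturally onto a nested dictionary. Using `frozenset` keys, so that keyword order does not matter, is an assumption, and the helper names `build_table` and `lookup` are hypothetical; the test data reproduces the example values from the text.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Sequence, Tuple

Table = Dict[FrozenSet[str], Dict[FrozenSet[str], float]]


def build_table(rows: Iterable[Tuple[Sequence[str], Sequence[str], float]]) -> Table:
    """Key: first keyword set; value: {second keyword set: page similarity}."""
    table: Table = {}
    for first, second, similarity in rows:
        table.setdefault(frozenset(first), {})[frozenset(second)] = similarity
    return table


def lookup(table: Table, first: Sequence[str], second: Sequence[str]) -> Optional[float]:
    """Return the stored page similarity, or None if the keyword pair is not in the table."""
    related = table.get(frozenset(first), {})   # entries for the first target keywords
    return related.get(frozenset(second))        # the one matching the second target keywords
```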
Table 1
As an optional solution, obtaining the page similarity corresponding to the first target keywords and the second target keywords from the correspondence between keyword pairs and page similarities includes:
S1: input the first target keywords and the second target keywords into a similarity model, where the similarity model is obtained by training a deep learning model with keyword pairs and their corresponding page similarities;
S2: obtain the target probability value output by the similarity model and determine it as the target page similarity, where the probability value indicates the probability that the first page and the second page are similar pages.
Optionally, in this embodiment, keyword pairs and their corresponding page similarities can be used as training samples for a deep learning model, yielding a similarity model that takes target keywords as input and outputs a probability value. The first and second target keywords are input into the similarity model, and the target probability value it outputs is determined as the target page similarity.
Optionally, in this embodiment, the deep learning model may include a convolutional neural network (CNN) model. A CNN is a feedforward neural network that includes convolutional layers and pooling layers; its artificial neurons respond to surrounding units within a local receptive field, and it performs outstandingly in large-scale image processing. Alternatively, the deep learning model may include a VGG model, among others.
As an optional solution, before the first target keywords and the second target keywords are input into the similarity model, the method further includes:
S1: obtain a page sample set, and bucket the pages in the set according to the entity contained in each page's title information, obtaining entities and their corresponding page sets;
S2: in each bucket, extract keywords from each page in the page set;
S3: in each bucket, determine the keywords of each page whose third weights satisfy the first target condition as the keyword set of that page;
S4: in each bucket, determine the page similarity of pages whose keyword sets match as a first similarity value, and the page similarity of pages whose keyword sets do not match or only partially match as a second similarity value;
S5: in each bucket, establish the correspondence between matched keyword sets and the first similarity value, and between unmatched or partially matched keyword sets and the second similarity value;
S6: from each bucket, obtain a first quantity of matched keyword sets with their corresponding first similarity value, and a second quantity of unmatched keyword sets with their corresponding second similarity value, yielding keyword pairs with corresponding page similarities;
S7: train a deep learning model with the keyword pairs and their corresponding page similarities to obtain the similarity model.
Optionally, in this embodiment, from a data processing perspective, objective things in the real world are called entities: anything in the real world that is distinguishable and identifiable. An entity can be a person, such as a teacher or a student, or an object, such as a book or a warehouse. It can refer not only to tangible objects but also to abstract events, such as a performance or a football match.
Optionally, in this embodiment, the pages in the page sample set are bucketed according to the entity contained in the page title information, so that pages for the same entity fall into the same bucket, yielding entities and their corresponding page sets. Training samples are then drawn from each bucket separately, so that the training samples are evenly distributed, which improves the accuracy of the similarity model.
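Bucketing by the entity named in the page title can be sketched as below. The title convention assumed here, an entity name optionally followed by a parenthesized qualifier such as "Liu (singer)", with ASCII or full-width parentheses, is hypothetical; the patent only says that the entity is taken from the page title information. The function names are likewise illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical title convention: "Name (qualifier)", with ASCII or full-width parentheses.
_QUALIFIER = re.compile(r"\s*[(（].*[)）]\s*$")


def entity_from_title(title: str) -> str:
    """Strip a trailing parenthesized qualifier to recover the entity name."""
    return _QUALIFIER.sub("", title).strip()


def bucket_pages(pages: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group page IDs by the entity name found in their titles."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for page_id, title in pages:
        buckets[entity_from_title(title)].append(page_id)
    return dict(buckets)
```

Pages in different buckets are never compared, which keeps both training-pair generation and later similarity prediction restricted to same-name entities.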
Optionally, in the present embodiment, the page of the first quantity Matching is obtained under each point of bucket respectively to as training The positive sample of similarity model obtains the unmatched page of the second quantity to the negative sample as training similarity model.
It is as a kind of optional scheme, the corresponding page of matched two pages of keyword set in each page is similar Degree is determined as the first similarity value, and by the corresponding Page resemblance of unmatched two pages of keyword set in each page Being determined as the second similarity value includes:
S1 extracts characteristic information from each keyword set;
The identical keyword set of characteristic information is determined as matched keyword set by S2, and characteristic information is different Keyword set is determined as unmatched keyword set;
S3 determines corresponding first similarity value of matched keyword set, corresponding second phase of unmatched keyword set Like angle value.
Optionally, in the present embodiment, determining whether two pages match in each point of bucket respectively can be, but not limited to It is to be matched according to stringent condition.Such as: when matching obtains sibling species subpage, from the physical page of identical point of bucket, extract Positive sample of the clearly identical physical page of a part as training data.
As shown in figure 4, the brief introduction page one and the brief introduction page two of Liu are the essential informations of Liu.To guarantee page Face describes same entity really, i.e., should merge, and needs to use the key message of the page as characteristic information, determines that it is It is no to exactly match.For example, " date of birth " and " blood group " of two pages is all identical, then confirm that the two pages are Match, using the two pages as the positive sample of similarity calculation.
Optionally, in the present embodiment, under bucket, random negative sampling generates training set.Since similarity calculation is only same It carries out, is consistent when in order to be predicted with similarity, when selecting negative sample also in physical page of the same name in name physical page It carries out.With the positive example page for choosing the similarity page to corresponding, when choosing the negative example of the page, with page critical field (i.e. feature Information) mismatch alternatively condition.Such as: as shown in figure 5, having 4 two pages of the page three and the page, the two pages In " date of birth " and " blood group " that extracts it is not exactly the same, then confirm the two pages be it is unmatched, by the two pages Negative sample of the face as similarity calculation.
Optionally, in the present embodiment, the first similarity value can be, but not limited to be 1, the second similarity value can with but not It is limited to be 0.The output parameter of the similarity model obtained after then training is the probability value between one 0 to 1.Alternatively, can also be with First similarity value is set as 100, the second similarity value is set as 0, then the output parameter of the similarity model obtained after training is Numerical value between one 0 to 100, it is also assumed that being probability value.
It should be noted that above-mentioned first similarity value and the second similarity value are an example, it is right in the present embodiment This is not construed as limiting.
Optionally, in this embodiment, the corresponding keyword-set pairs and similarity values obtained above can be represented as triples {P1, P2, Label}, where P1 and P2 are the IDs of the entity pages and Label indicates whether the pages are similar, taking the value 0 (dissimilar) or 1 (similar), for example {121, 122, 0} and {121, 123, 1}.
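The positive/negative labeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names `date_of_birth` and `blood_type`, the page dictionaries, and the helpers `fields_match` and `build_triples` are hypothetical. Pages in one same-name bucket are compared pairwise; pairs whose key fields are all present and identical become positive triples (Label 1), and a random subset of the remaining pairs becomes negative triples (Label 0).

```python
import itertools
import random
from typing import Dict, List, Tuple

# Hypothetical key fields used as the pages' feature information.
KEY_FIELDS = ("date_of_birth", "blood_type")

Triple = Tuple[int, int, int]  # (P1, P2, Label)


def fields_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Strict condition: every key field is present on both pages and identical."""
    return all(a.get(f) is not None and a.get(f) == b.get(f) for f in KEY_FIELDS)


def build_triples(bucket: Dict[int, Dict[str, str]], num_neg: int,
                  seed: int = 0) -> List[Triple]:
    """Turn one same-name bucket (page id -> key fields) into {P1, P2, Label} triples.

    Matching pairs become positives (Label 1). At most num_neg of the other
    pairs are sampled at random as negatives (Label 0). Pages missing a key
    field are treated as non-matching here, which is an assumption.
    """
    positives: List[Triple] = []
    candidates: List[Triple] = []
    for p1, p2 in itertools.combinations(sorted(bucket), 2):
        if fields_match(bucket[p1], bucket[p2]):
            positives.append((p1, p2, 1))
        else:
            candidates.append((p1, p2, 0))
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    negatives = rng.sample(candidates, min(num_neg, len(candidates)))
    return positives + negatives


pages = {
    121: {"date_of_birth": "1970-01-01", "blood_type": "A"},
    122: {"date_of_birth": "1971-02-02", "blood_type": "A"},
    123: {"date_of_birth": "1970-01-01", "blood_type": "A"},
}
print(sorted(build_triples(pages, num_neg=2)))
# [(121, 122, 0), (121, 123, 1), (122, 123, 0)]
```

Because negatives are drawn only inside one bucket, the pairs seen in training are the same kind of same-name pairs that the model will later score, which is the consistency argument made in the text.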
As an optional solution, training the deep learning model using the corresponding keyword pairs and page similarities to obtain the similarity model includes:
S1: training the deep learning model, using the matching keyword sets and the non-matching keyword sets as its input values and the first similarity value and the second similarity value as its output values;
S2: determining the deep learning model obtained after training as the similarity model.
Optionally, in this embodiment, as shown in Figure 6, taking a CNN as an example of the deep learning model, the pages corresponding to P1 and P2 in the above triple are used as the two channels of the CNN input layer. After processing by the CNN's convolution, pooling, and fully connected layers, the model outputs a predicted value; the error against Label is then computed and backpropagated. The overall model architecture is based on a CNN, with the entity pages converted into the CNN's two input channels.
Optionally, in this embodiment, as shown in Figure 7, keyword extraction is performed on the two candidate pages to obtain N keywords per page (N equals the word-vector dimension); if a page has fewer than N keywords, the remainder is padded with a special character. Each page can thus be represented by its keywords. Each keyword is then replaced with its word vector, converting the keywords into a two-dimensional word-vector matrix, and the two matrices are finally combined into an N×N×2 three-dimensional matrix, which serves as the CNN input. Converting the pages this way makes effective use of how people naturally express themselves in language.
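The conversion into an N×N×2 input can be sketched in plain Python as below. The text fixes only N keywords per page, padding with a special character, and the N×N×2 shape; the `<PAD>` token, mapping padding and unknown words to a zero vector, the channel-last layout, and the helper names `page_matrix` and `pair_tensor` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

PAD = "<PAD>"  # hypothetical special padding token


def page_matrix(keywords: Sequence[str], vectors: Dict[str, List[float]],
                n: int) -> List[List[float]]:
    """Represent a page as an n x n matrix: n keywords, each an n-dim word vector."""
    kws = list(keywords[:n]) + [PAD] * max(0, n - len(keywords))
    zero = [0.0] * n  # padding and unknown words map to a zero vector (assumption)
    rows = [list(vectors.get(k, zero)) for k in kws]
    if any(len(r) != n for r in rows):
        raise ValueError("word vectors must have dimension n")
    return rows


def pair_tensor(kw1: Sequence[str], kw2: Sequence[str],
                vectors: Dict[str, List[float]], n: int) -> List[List[List[float]]]:
    """Stack the two page matrices as channels: result has shape [n][n][2]."""
    m1 = page_matrix(kw1, vectors, n)
    m2 = page_matrix(kw2, vectors, n)
    return [[[m1[i][j], m2[i][j]] for j in range(n)] for i in range(n)]


vecs = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
print(pair_tensor(["a"], ["b", "a", "c"], vecs, n=2))
# [[[1.0, 3.0], [2.0, 4.0]], [[0.0, 1.0], [0.0, 2.0]]]
```

In the usage line the first page has one keyword and is padded; the second has three and is truncated to N = 2.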
As an optional solution, extracting, from the first keywords, first target keywords whose first weights meet the first target condition, and extracting, from the second keywords, second target keywords whose second weights meet the first target condition includes:
S1: extracting, from the first keywords, first target keywords whose first weights are higher than a target weight, and extracting, from the second keywords, second target keywords whose second weights are higher than the target weight; or
S2: sorting the first keywords in descending order of first weight and the second keywords in descending order of second weight; extracting the top-ranked keywords, up to a third quantity, from the sorted first keywords as the first target keywords, and extracting the top-ranked keywords, up to the third quantity, from the sorted second keywords as the second target keywords.
Optionally, in this embodiment, the first target condition may be a threshold range or a ranking range. Taking the extraction of first target keywords as an example, suppose the first keywords (keyword 1, keyword 2, ..., keyword 30) have first weights of 0.7, 0.4, ..., 0.88 respectively. One approach is to extract the first keywords whose first weight is higher than 0.65 as the first target keywords, giving keyword 1, keyword 5, keyword 10, ..., keyword 30. Another approach is to sort the first keywords in descending order of first weight (keyword 30, keyword 7, keyword 28, ..., keyword 2) and extract the top 10 from the sorted list as the first target keywords, which might be: keyword 30, keyword 7, keyword 28, keyword 3, keyword 17, keyword 9, keyword 22, keyword 14, keyword 3, keyword 6.
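The two forms of the first target condition can be sketched as below. Only three of the example weights are used (the rest are elided in the text), and the function names `by_threshold` and `top_k` are hypothetical.

```python
from typing import Dict, List


def by_threshold(weights: Dict[str, float], target_weight: float) -> List[str]:
    """Condition as a threshold: keep keywords whose weight exceeds target_weight."""
    return [kw for kw, w in weights.items() if w > target_weight]


def top_k(weights: Dict[str, float], k: int) -> List[str]:
    """Condition as a ranking: sort by weight, descending, keep the first k."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [kw for kw, _ in ranked[:k]]


weights = {"keyword 1": 0.7, "keyword 2": 0.4, "keyword 30": 0.88}
print(by_threshold(weights, 0.65))  # ['keyword 1', 'keyword 30']
print(top_k(weights, 2))            # ['keyword 30', 'keyword 1']
```

Python's `sorted` is stable, so keywords with equal weights keep their original order in the top-k variant.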
As an optional solution, the target page similarity meeting the second target condition includes:
S1: where a higher target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity meets the second target condition when it is higher than a first target similarity; or
S2: where a lower target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity meets the second target condition when it is lower than a second target similarity.
Optionally, in this embodiment, pages that are sufficiently similar are merged, so the second target condition that pages to be merged must meet depends on what the target page similarity means. If a high target page similarity indicates that two pages are similar, the similarity of pages eligible for merging must be higher than the first target similarity; if a low target page similarity indicates that two pages are similar, it must be lower than the second target similarity.
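The second target condition reduces to a direction-aware threshold check, sketched below. The function name and the threshold values in the usage lines are hypothetical.

```python
def meets_second_condition(similarity: float, higher_is_similar: bool,
                           first_target: float, second_target: float) -> bool:
    """Decide whether two pages may be fused.

    If a higher score means "more similar", the score must exceed first_target;
    if a lower score (e.g. a distance) means "more similar", it must fall below
    second_target.
    """
    if higher_is_similar:
        return similarity > first_target
    return similarity < second_target


print(meets_second_condition(0.82, True, first_target=0.7, second_target=0.3))   # True
print(meets_second_condition(0.82, False, first_target=0.7, second_target=0.3))  # False
```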
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, a page fusion apparatus for implementing the above page fusion method is further provided. As shown in Figure 8, the apparatus includes:
a first extraction module 82, configured to extract first keywords from a first page to be fused and second keywords from a second page to be fused;
a second extraction module 84, configured to extract, from the first keywords, first target keywords whose first weights meet a first target condition, and to extract, from the second keywords, second target keywords whose second weights meet the first target condition. Optionally, in this embodiment, a first weight indicates how representative each first keyword is of the first page, and a second weight indicates how representative each second keyword is of the second page;
a first determination module 86, configured to determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and
a fusion module 88, configured to fuse the first page and the second page when the target page similarity meets a second target condition.
Optionally, in this embodiment, the above page fusion apparatus may be applied in a hardware environment formed by the server 202 shown in Figure 2. As shown in Figure 2, the server 202 extracts first keywords from a first page to be fused and second keywords from a second page to be fused; extracts, from the first keywords, first target keywords whose first weights meet the first target condition, and, from the second keywords, second target keywords whose second weights meet the first target condition, where a first weight indicates how representative each first keyword is of the first page and a second weight indicates how representative each second keyword is of the second page; determines the target page similarity between the first page and the second page according to the first and second target keywords; and, when the target page similarity meets the second target condition, fuses the first page and the second page.
Optionally, in this embodiment, the above page fusion apparatus may be applied to, but is not limited to, scenarios in which pages are fused. It may be applied in various types of applications, for example online education, instant messaging, community space, gaming, shopping, browser, finance, multimedia, and live-streaming applications. Specifically, it may be applied, without limitation, to fusing web pages in a browser application or to fusing multimedia resource pages in a multimedia application, so as to improve the efficiency of page fusion. These are merely examples, and this embodiment imposes no limitation in this respect.
Optionally, in this embodiment, keywords may be extracted from, but are not limited to, the introduction section of a page or a text formed by concatenating other short texts of the page.
Optionally, in this embodiment, before keywords are extracted, the text in the page may also be segmented into words, and keywords are then extracted from the resulting words.
Optionally, in this embodiment, a weight indicates how representative a keyword is of a web page. The weight of a keyword may be calculated using, but not limited to, the TF-IDF algorithm or the TextRank algorithm. Alternatively, to combine the strengths of both algorithms for keyword extraction, the word weights produced by TF-IDF and by TextRank may each be normalized, and the average of the two taken as the keyword's weight.
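The combined weighting can be sketched in pure Python as below, on already-segmented token lists (segmentation itself is out of scope here). The text does not specify which TF-IDF or TextRank variant is used or how the scores are normalized, so the smoothed IDF, the co-occurrence window of 2, the damping factor 0.85, and min-max normalization are all assumptions; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tfidf(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Term frequency times a smoothed inverse document frequency (one common variant)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, c in counts.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = c / len(doc) * idf
    return scores


def textrank(doc: List[str], window: int = 2, d: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """PageRank over an undirected word co-occurrence graph (sliding window)."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(doc):
        for u in doc[i + 1:i + window]:
            if u != w:
                nbrs[w].add(u)
                nbrs[u].add(w)
    score = {w: 1.0 for w in set(doc)}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[u] / len(nbrs[u]) for u in nbrs.get(w, set()))
                 for w in score}
    return score


def min_max(s: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; the text only says 'normalized', so this is an assumption."""
    lo, hi = min(s.values()), max(s.values())
    if hi == lo:
        return {k: 1.0 for k in s}
    return {k: (v - lo) / (hi - lo) for k, v in s.items()}


def keyword_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Average of the normalized TF-IDF and TextRank weights for each word."""
    a = min_max(tfidf(doc, corpus))
    b = min_max(textrank(doc))
    return {w: (a[w] + b[w]) / 2 for w in a}


doc = ["page", "fusion", "entity", "page", "keyword"]
corpus = [doc, ["page", "title"], ["entity", "title"]]
w = keyword_weights(doc, corpus)
print(max(w, key=w.get))  # page
```

Averaging normalized scores keeps either algorithm from dominating just because its raw scores live on a larger scale.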
Optionally, in this embodiment, the keywords of a page are filtered by the first target condition that their weights must meet, so as to select the keyword or keywords that best represent the page as target keywords. The first target condition may be, but is not limited to, the condition that the weights of the keywords most representative of the page must meet, for example: the N keywords with the highest representativeness are taken as the page's most representative keywords, or the keywords whose representativeness falls within a certain threshold range are taken as the page's most representative keywords.
Optionally, in this embodiment, the similarity between pages may be obtained by, but is not limited to, training a deep learning model. A deep learning model is trained with labeled samples to obtain a model whose input is the target keywords of two pages and whose output is the similarity of the two pages; this is called the similarity model. When page similarity is determined, the first target keywords of the first page and the second target keywords of the second page can be fed directly into the similarity model, and the model's output value is the target page similarity of the first page and the second page.
Optionally, in this embodiment, the first keywords and the second keywords may each include, but are not limited to, multiple keywords, for example all meaningful keywords extracted from the page. The first target keywords and the second target keywords may also include, but are not limited to, multiple words; for example, if 100 meaningful words are extracted from a page, the 30 words with the highest weights among them are taken as the page's target keywords.
Optionally, in this embodiment, fusing the first page and the second page may be, but is not limited to, fusing the entities contained in the pages. Entity fusion means integrating the information of multiple entities and merging them into a single entity. Entity information can be represented as a number of triples {S, P, O}, where S denotes the subject, P an attribute, and O the attribute value. Fusing multiple entities means that, for the same subject S, all attributes P and attribute values O of all the entities are merged under that one subject S.
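The {S, P, O} merge can be sketched with nested dictionaries as below. The attribute names and values are hypothetical, and keeping conflicting values side by side in a set is an assumption, since the text does not say how conflicts are resolved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

SPO = Tuple[str, str, str]  # (subject S, attribute P, attribute value O)


def fuse_entities(triples: Iterable[SPO]) -> Dict[str, Dict[str, Set[str]]]:
    """Merge every (P, O) of every entity under its subject S.

    Conflicting values for the same attribute are kept together in a set;
    how conflicts should be resolved is not specified, so this is an assumption.
    """
    fused: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        fused[s][p].add(o)
    return {s: dict(attrs) for s, attrs in fused.items()}


page_one = [("Liu", "date_of_birth", "1970-01-01"), ("Liu", "blood_type", "A")]
page_two = [("Liu", "blood_type", "A"), ("Liu", "occupation", "actor")]
print(fuse_entities(page_one + page_two))
# {'Liu': {'date_of_birth': {'1970-01-01'}, 'blood_type': {'A'}, 'occupation': {'actor'}}}
```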
In an optional embodiment, as shown in Figure 3, first keywords (keyword 1, keyword 2, ..., keyword 30) are extracted from page A to be fused, and second keywords (keyword a, keyword b, ..., keyword m) are extracted from page B to be fused. A first weight is determined for each first keyword and a second weight for each second keyword. From (keyword 1, keyword 2, ..., keyword 30), the first target keywords whose first weights meet the first target condition are extracted: (keyword 3, keyword 6, keyword 17, keyword 22). From (keyword a, keyword b, ..., keyword m), the second target keywords whose second weights meet the first target condition are extracted: (keyword a, keyword d, keyword g, keyword k). The target page similarity of the first page and the second page is determined from (keyword 3, keyword 6, keyword 17, keyword 22) and (keyword a, keyword d, keyword g, keyword k), and when the target page similarity meets the second target condition, the first page and the second page are fused.
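The decision flow of this example can be sketched end to end as below, starting from per-page keyword weights. This is a sketch under stated assumptions, not the patent's method: Jaccard overlap is swapped in as a stand-in scorer for the trained similarity model, the first target condition is taken as "top 4 by weight", the first target similarity of 0.5 is hypothetical, and all function names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Similarity = Callable[[List[str], List[str]], float]


def jaccard(a: List[str], b: List[str]) -> float:
    """Stand-in scorer: set overlap. The text uses a trained similarity model instead."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_targets(weights: Dict[str, float], k: int) -> List[str]:
    """First target condition, here taken as 'top k by weight'."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def decide_fusion(weights_a: Dict[str, float], weights_b: Dict[str, float],
                  k: int = 4, first_target: float = 0.5,
                  score: Similarity = jaccard) -> Tuple[float, bool]:
    """Select target keywords, score them, and check the second target condition."""
    sim = score(select_targets(weights_a, k), select_targets(weights_b, k))
    return sim, sim > first_target


page_a = {"k3": 0.9, "k6": 0.8, "k17": 0.7, "k22": 0.6, "k1": 0.1}
page_b = {"k3": 0.95, "k6": 0.85, "k17": 0.75, "kx": 0.65, "ky": 0.05}
print(decide_fusion(page_a, page_b))  # (0.6, True)
```

Passing the scorer as a parameter keeps the selection and threshold logic independent of whichever similarity model is plugged in.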
It can be seen that, with the above apparatus, the keywords extracted from a page whose weights meet the first target condition are determined as the page's target keywords, so that keywords representative of the page are extracted from it. The target page similarity between the first page and the second page is determined from their respective target keywords, and if that similarity meets the second target condition, the two pages can be fused. Page similarity is thus judged from keywords that represent the pages, and pages whose similarity meets the condition are fused. Besides fusing pages automatically, determining page similarity from representative keywords makes the similarity judgment more accurate, which improves the accuracy of page fusion. This achieves the technical effect of improving fusion efficiency when pages are fused, and solves the technical problem of low fusion efficiency when pages are fused in the related art.
As an optional solution, the first determination module includes:
an acquisition unit, configured to acquire, according to the corresponding keyword pairs and page similarities, the page similarity corresponding to the first target keywords and the second target keywords; and
a first determination unit, configured to determine the page similarity corresponding to the first target keywords and the second target keywords as the target page similarity.
Optionally, in this embodiment, a correspondence between keyword pairs and page similarities may be established in advance; the page similarity corresponding to the first target keywords and the second target keywords is then acquired from this correspondence and determined as the target page similarity.
Optionally, in this embodiment, the corresponding keyword pairs and page similarities may be, but are not limited to, a correspondence stored as a table, with the keyword pair as the key and the page similarity as the value, so that the correspondence is stored in the table as key-value pairs. When the page similarity for a keyword pair is looked up in the table, the key-value pairs associated with the first target keywords can be looked up first, then the value corresponding to the second target keywords is found among these key-value pairs, and that value is determined as the target page similarity.
For example, in an optional embodiment, as shown in Table 2, corresponding keyword pairs and page similarities are stored in a table. To look up the page similarity for the keyword pair [(A1, A2, A3, A4, A5, A6, A7), (A2, A3, A5, A7, A9, A10, A19)], the target keywords associated with the first target keywords (A1, A2, A3, A4, A5, A6, A7) are located first, together with their page similarities, for example [(A1, A2, A3, A5, A6, A7, A8), 75%], [(A2, A3, A5, A7, A9, A10, A19), 40%], and [(A1, A2, A3, A4, A5, A8, A9), 62.5%]. The second target keywords (A2, A3, A5, A7, A9, A10, A19) are then found in this correspondence with a page similarity of 40%, so 40% is determined as the target page similarity.
Table 2
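The two-step lookup can be sketched as a two-level dictionary, filled here with the values from the example above. Using `frozenset` keys (so keyword order does not matter), the two-level layout, and the name `lookup` are assumptions; the text only says the table stores keyword pairs as keys and page similarities as values.

```python
from typing import Dict, FrozenSet, Iterable, Optional

KeywordSet = FrozenSet[str]
# Two-level table: first target keywords -> {second target keywords -> similarity}.
Table = Dict[KeywordSet, Dict[KeywordSet, float]]


def lookup(table: Table, first: Iterable[str], second: Iterable[str]) -> Optional[float]:
    """Find the entries for the first keyword set, then the value for the second."""
    related = table.get(frozenset(first), {})
    return related.get(frozenset(second))


first = ("A1", "A2", "A3", "A4", "A5", "A6", "A7")
table: Table = {
    frozenset(first): {
        frozenset(("A1", "A2", "A3", "A5", "A6", "A7", "A8")): 0.75,
        frozenset(("A2", "A3", "A5", "A7", "A9", "A10", "A19")): 0.40,
        frozenset(("A1", "A2", "A3", "A4", "A5", "A8", "A9")): 0.625,
    }
}
print(lookup(table, first, ("A2", "A3", "A5", "A7", "A9", "A10", "A19")))  # 0.4
```

An unseen pair returns `None`, which a caller could treat as "fall back to the similarity model".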
As an optional solution, the acquisition unit includes:
an input subunit, configured to input the first target keywords and the second target keywords into a similarity model, where the similarity model is a model obtained by training a deep learning model with the corresponding keyword pairs and page similarities; and
an output subunit, configured to acquire the target probability value output by the similarity model and determine it as the target page similarity, where the probability value indicates the probability that the first page and the second page are similar pages.
Optionally, in this embodiment, the corresponding keyword pairs and page similarities may be used as training samples to train a deep learning model, yielding a similarity model that takes target keywords as input and outputs a probability value. The first target keywords and the second target keywords are input into the similarity model, the target probability value output by the model is acquired, and that value is determined as the target page similarity.
Optionally, in this embodiment, the above deep learning model may include, but is not limited to, a convolutional neural network (CNN) model. A CNN is a feedforward neural network that includes convolutional layers and pooling layers; its artificial neurons respond to surrounding units within a limited part of their coverage area, and it performs outstandingly in large-scale image processing. Alternatively, the deep learning model may include, but is not limited to, a VGG model, etc.
As an optional solution, the above apparatus further includes:
a processing module, configured to acquire a page sample set and bucket the pages in the page sample set according to the entity contained in the title information of each page, obtaining corresponding entities and page sets;
a third extraction module, configured to extract keywords from each page in the page set of each bucket;
a second determination module, configured to determine, in each bucket, the keywords of each page whose third weights meet the first target condition as the keyword set corresponding to that page;
a third determination module, configured to determine, in each bucket, the page similarity of pages whose keyword sets match as a first similarity value, and the page similarity of pages whose keyword sets do not match or match only partially as a second similarity value;
an establishing module, configured to establish, in each bucket, the correspondence between matching keyword sets and the first similarity value, and the correspondence between non-matching or partially matching keyword sets and the second similarity value;
an acquisition module, configured to acquire from each bucket a first quantity of matching keyword sets with their corresponding first similarity value and a second quantity of non-matching keyword sets with their corresponding second similarity value, obtaining the corresponding keyword pairs and page similarities; and
a training module, configured to train a deep learning model with the corresponding keyword pairs and page similarities to obtain the similarity model.
Optionally, in this embodiment, from a data-processing perspective, objective things in the real world are called entities: any distinguishable, identifiable thing in the real world. An entity can be a person, such as a teacher or a student, or an object, such as a book or a warehouse. It can refer not only to tangible objects but also to abstract events, such as a performance or a football match.
Optionally, in this embodiment, the pages in the page sample set are bucketed according to the entity contained in the page title information, with pages for the same entity placed in the same bucket, yielding corresponding entities and page sets. Training samples are then acquired from each bucket separately, so that the resulting training samples are evenly distributed, which improves the accuracy of the similarity model.
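Bucketing by the entity name in the title can be sketched as below. The text does not describe how the entity name is parsed from a title, so the title convention (name followed by a parenthesized qualifier or by " - " and a site name) and the helpers `entity_name` and `bucket_pages` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical title convention: "<entity name>(<qualifier>)" or "<entity name> - <site>".
_QUALIFIER = re.compile(r"\s*(?:[(（]| - )")


def entity_name(title: str) -> str:
    """Keep the part of the title before the first qualifier marker."""
    return _QUALIFIER.split(title, maxsplit=1)[0].strip()


def bucket_pages(titles: Dict[int, str]) -> Dict[str, List[int]]:
    """Group page ids by the entity name found in each title."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for page_id, title in titles.items():
        buckets[entity_name(title)].append(page_id)
    return dict(buckets)


titles = {1: "Liu (actor)", 2: "Liu - Encyclopedia", 3: "Zhang（singer）", 4: "Liu"}
print(bucket_pages(titles))  # {'Liu': [1, 2, 4], 'Zhang': [3]}
```

Sampling a fixed number of pairs from each resulting bucket, rather than from the pool as a whole, is what keeps large buckets from dominating the training set.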
Optionally, in this embodiment, a first quantity of matching page pairs is acquired under each bucket as positive samples for training the similarity model, and a second quantity of non-matching page pairs as negative samples.
As an optional solution, the third determination module includes:
an extraction unit, configured to extract feature information from each keyword set;
a second determination unit, configured to determine keyword sets with identical feature information as matching keyword sets, and keyword sets with different feature information as non-matching keyword sets; and
a third determination unit, configured to determine that the matching keyword sets correspond to the first similarity value and the non-matching keyword sets correspond to the second similarity value.
Optionally, in this embodiment, whether two pages in each bucket match may be determined according to, but is not limited to, strict conditions. For example, when matching is performed to obtain same-entity seed pages, a portion of clearly identical entity pages is extracted from the entity pages in the same bucket as positive samples of the training data. As shown in Figure 4, introduction page one and introduction page two for Liu both contain Liu's basic information. To ensure that the two pages really describe the same entity, i.e., that they should be merged, key information from the pages is used as feature information to determine whether the pages match exactly. For example, if the "date of birth" and the "blood type" on the two pages are both identical, the two pages are confirmed as matching and are used as a positive sample for similarity calculation.
Optionally, in this embodiment, the training set is generated by random negative sampling within each bucket. Because similarity is only calculated between entity pages with the same name, negative samples are also selected from same-name entity pages, so that training stays consistent with similarity prediction. Mirroring the selection of similar page pairs as positive examples, negative page examples are selected on the condition that the key fields of the pages (i.e., the feature information) do not match. For example, as shown in Figure 5, the "date of birth" and "blood type" extracted from page three and page four are not identical, so the two pages are confirmed as non-matching and are used as a negative sample for similarity calculation.
Optionally, in this embodiment, the first similarity value may be, but is not limited to, 1, and the second similarity value may be, but is not limited to, 0. The output of the similarity model obtained after training is then a probability value between 0 and 1. Alternatively, the first similarity value may be set to 100 and the second similarity value to 0, in which case the output of the trained similarity model is a value between 0 and 100, which can also be regarded as a probability value.
It should be noted that the above first similarity value and second similarity value are only examples, and this embodiment places no limitation on them.
Optionally, in this embodiment, the corresponding keyword-set pairs and similarity values obtained above can be represented as triples {P1, P2, Label}, where P1 and P2 are the IDs of the entity pages and Label indicates whether the pages are similar, taking the value 0 (dissimilar) or 1 (similar), for example {121, 122, 0} and {121, 123, 1}.
As an optional solution, the training module includes:
a training unit, configured to train the deep learning model, using the matching keyword sets and the non-matching keyword sets as its input values and the first similarity value and the second similarity value as its output values; and
a fourth determination unit, configured to determine the deep learning model obtained after training as the similarity model.
Optionally, in this embodiment, as shown in Figure 6, taking a CNN as an example of the deep learning model, the pages corresponding to P1 and P2 in the above triple are used as the two channels of the CNN input layer. After processing by the CNN's convolution, pooling, and fully connected layers, the model outputs a predicted value; the error against Label is then computed and backpropagated. The overall model architecture is based on a CNN, with the entity pages converted into the CNN's two input channels.
Optionally, in this embodiment, as shown in Figure 7, keyword extraction is performed on the two candidate pages to obtain N keywords per page (N equals the word-vector dimension); if a page has fewer than N keywords, the remainder is padded with a special character. Each page can thus be represented by its keywords. Each keyword is then replaced with its word vector, converting the keywords into a two-dimensional word-vector matrix, and the two matrices are finally combined into an N×N×2 three-dimensional matrix, which serves as the CNN input. Converting the pages this way makes effective use of how people naturally express themselves in language.
As an optional solution, the second extraction module includes:
a first extraction unit, configured to extract, from the first keywords, first target keywords whose first weights are higher than a target weight, and to extract, from the second keywords, second target keywords whose second weights are higher than the target weight; or
a second extraction unit, configured to sort the first keywords in descending order of first weight and the second keywords in descending order of second weight, to extract the top-ranked keywords, up to a third quantity, from the sorted first keywords as the first target keywords, and to extract the top-ranked keywords, up to the third quantity, from the sorted second keywords as the second target keywords.
Optionally, in this embodiment, the first target condition may be a threshold range or a ranking range. Taking the extraction of first target keywords as an example, suppose the first keywords (keyword 1, keyword 2, ..., keyword 30) have first weights of 0.7, 0.4, ..., 0.88 respectively. One approach is to extract the first keywords whose first weight is higher than 0.65 as the first target keywords, giving keyword 1, keyword 5, keyword 10, ..., keyword 30. Another approach is to sort the first keywords in descending order of first weight (keyword 30, keyword 7, keyword 28, ..., keyword 2) and extract the top 10 from the sorted list as the first target keywords, which might be: keyword 30, keyword 7, keyword 28, keyword 3, keyword 17, keyword 9, keyword 22, keyword 14, keyword 3, keyword 6.
As an optional solution, the fusion module is configured to:
where a higher target page similarity indicates that the first page and the second page are more similar, determine that the target page similarity meets the second target condition when it is higher than a first target similarity; or
where a lower target page similarity indicates that the first page and the second page are more similar, determine that the target page similarity meets the second target condition when it is lower than a second target similarity.
Optionally, in this embodiment, pages that are sufficiently similar are merged, so the second target condition that pages to be merged must meet depends on what the target page similarity means. If a high target page similarity indicates that two pages are similar, the similarity of pages eligible for merging must be higher than the first target similarity; if a low target page similarity indicates that two pages are similar, it must be lower than the second target similarity.
The application environment of this embodiment of the present invention may refer to, but is not limited to, the application environment in the above embodiments, and is not described again here. This embodiment of the present invention provides an optional specific application example for implementing the above page fusion method.
As an optional embodiment, the above page fusion method may be applied, without limitation, to the page-merging scenario shown in Figure 9. In this scenario, as shown in Figure 9, merging pages includes the following steps:
Step 1: Bucket the pages by the entity name in the title.
Step 2: Segment the page text into words and extract keywords. The introduction of the entity page, or text formed by splicing together other short texts on the page, is segmented and its keywords are extracted; the words are sorted by weight, and the top-ranked words are taken as the representation of the page.
Step 3: Obtain same-entity page pairs under strict conditions. To extract training data for the model, some entity pages within the same bucket that clearly describe the same entity are extracted as positive samples of the training data.
Step 4: Within each bucket, generate the training set by random negative sampling. Negative samples are also selected among entity pages with the same name. Complementing the selection of similar pages as positive pairs, negative page pairs are chosen using a mismatch of key page fields as the selection condition.
Step 5: Train the CNN model. With the positive and negative samples from Steps 3 and 4, model training can begin, and the trained model can then be used to merge pages.
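Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: whitespace tokenization and term-frequency weights stand in for the unspecified Chinese word segmenter and keyword weighting, the rule that the entity name is the title text before a "(" qualifier is hypothetical, and the page fields `summary`, `type`, and `birth_year` are invented for the example. "Strict" positives require every field in `strict_fields` to agree; negatives require `key_field` to differ; pairs in between are skipped.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def entity_name(title: str) -> str:
    # Hypothetical rule: the entity name is the title text before any "(" qualifier.
    return title.split("(")[0].strip().lower()


def bucket_pages(pages):
    """Step 1: group pages by the entity name in their titles."""
    buckets = defaultdict(list)
    for page in pages:
        buckets[entity_name(page["title"])].append(page)
    return buckets


def top_keywords(text: str, k: int = 5):
    """Step 2: whitespace tokens weighted by term frequency; keep the k heaviest.
    Ties are broken alphabetically so the result is deterministic."""
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def build_training_pairs(pages, strict_fields=("type", "birth_year"),
                         key_field="type", neg_per_bucket=1, seed=0):
    """Steps 3-4: strict positives plus randomly sampled negatives, both drawn
    only from pages in the same bucket (pages that share an entity name)."""
    rng = random.Random(seed)
    samples = []  # (keywords_a, keywords_b, label); 1 = same entity, 0 = different
    for bucket in bucket_pages(pages).values():
        negatives = []
        for a, b in combinations(bucket, 2):
            pair = (top_keywords(a["summary"]), top_keywords(b["summary"]))
            if all(a[f] == b[f] for f in strict_fields):
                samples.append((*pair, 1))    # clearly the same entity
            elif a[key_field] != b[key_field]:
                negatives.append((*pair, 0))  # key field mismatch
            # pairs that are neither clearly identical nor clearly different are skipped
        rng.shuffle(negatives)
        samples.extend(negatives[:neg_per_bucket])
    return samples
```

The resulting `(keywords_a, keywords_b, label)` triples are the kind of input a similarity model would be trained on in Step 5; restricting both positives and negatives to one bucket keeps the negatives hard, since they share a name.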
According to another aspect of the embodiments of the present invention, an electronic device for implementing the above page fusion is further provided. As shown in Figure 10, the electronic device includes one or more processors 1002 (only one is shown in the figure), a memory 1004, a sensor 1006, an encoder 1008, and a transmission device 1010. A computer program is stored in the memory, and the processor is configured to execute the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located on at least one of multiple network devices in a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1: extract first keywords from a first page to be fused, and extract second keywords from a second page to be fused;
S2: extract, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extract, from the second keywords, second target keywords whose second weights satisfy the first target condition;
S3: determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords;
S4: fuse the first page and the second page when the target page similarity satisfies a second target condition.
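A minimal end-to-end sketch of S1 to S4, assuming pages are dicts with hypothetical `title` and `text` fields. Relative term frequency stands in for the keyword weights, Jaccard overlap of the target keywords stands in for the trained similarity model of claims 2 and 3, and the merge policy (keep the first title, concatenate the texts) is purely illustrative. `select_target_keywords` covers both variants of the first target condition recited in claim 7: a weight threshold or a top-k cut.

```python
from collections import Counter
from typing import Dict, List, Optional


def extract_keywords(text: str) -> Dict[str, float]:
    """S1 stand-in: whitespace tokens weighted by relative term frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def select_target_keywords(weights: Dict[str, float], k: int = 3,
                           target_weight: Optional[float] = None) -> List[str]:
    """S2: apply the first target condition (both variants of claim 7)."""
    ranked = [w for w, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))]
    if target_weight is not None:
        return [w for w in ranked if weights[w] > target_weight]  # weight threshold
    return ranked[:k]  # top-k by weight


def page_similarity(kw_a: List[str], kw_b: List[str]) -> float:
    """S3 stand-in: Jaccard overlap of the target keywords (higher = more similar)."""
    a, b = set(kw_a), set(kw_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_pages(page_a: dict, page_b: dict, threshold: float = 0.5) -> Optional[dict]:
    """S1-S4 end to end; returns a merged page, or None when the pages stay separate."""
    kw_a = select_target_keywords(extract_keywords(page_a["text"]))
    kw_b = select_target_keywords(extract_keywords(page_b["text"]))
    sim = page_similarity(kw_a, kw_b)
    if sim <= threshold:  # S4: second target condition not met
        return None
    # Hypothetical merge policy: keep the first title and concatenate the texts.
    return {"title": page_a["title"],
            "text": page_a["text"] + "\n" + page_b["text"],
            "similarity": sim}
```

Replacing `page_similarity` with a trained model's probability output would leave the rest of the pipeline unchanged.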
Optionally, a person of ordinary skill in the art will understand that the structure shown in Figure 10 is merely illustrative. The electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. Figure 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in Figure 10 (such as a network interface or a display device), or have a configuration different from that shown in Figure 10.
The memory 1002 may be used to store software programs and modules, such as the program instructions/modules corresponding to the page fusion method and apparatus in the embodiments of the present invention. By running the software programs and modules stored in the memory 1002, the processor 1004 executes various functional applications and data processing, that is, implements the above page fusion method. The memory 1002 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 1010 is used to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device 1010 includes a network interface controller (NIC), which can be connected by cable to other network devices and a router to communicate with the Internet or a local area network. In another example, the transmission device 1010 is a radio frequency (RF) module used to communicate with the Internet wirelessly.
Specifically, the memory 1002 is used to store application programs.
An embodiment of the present invention further provides a storage medium storing a computer program, where the computer program is configured to execute the steps of any of the above method embodiments when run.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: extract first keywords from a first page to be fused, and extract second keywords from a second page to be fused;
S2: extract, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extract, from the second keywords, second target keywords whose second weights satisfy the first target condition;
S3: determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords;
S4: fuse the first page and the second page when the target page similarity satisfies a second target condition.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the methods of the above embodiments; details are not repeated in this embodiment.
Optionally, in this embodiment, a person of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing hardware of a terminal device. The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions in other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely exemplary. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A page fusion method, characterized by comprising:
Extracting first keywords from a first page to be fused, and extracting second keywords from a second page to be fused;
Extracting, from the first keywords, first target keywords whose first weights satisfy a first target condition, and extracting, from the second keywords, second target keywords whose second weights satisfy the first target condition;
Determining a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and
Fusing the first page and the second page when the target page similarity satisfies a second target condition.
2. The method according to claim 1, characterized in that determining the target page similarity between the first page and the second page according to the first target keywords and the second target keywords comprises:
Obtaining, according to keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keywords and the second target keywords; and
Determining the page similarity corresponding to the first target keywords and the second target keywords as the target page similarity.
3. The method according to claim 2, characterized in that obtaining, according to keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keywords and the second target keywords comprises:
Inputting the first target keywords and the second target keywords into a similarity model, wherein the similarity model is obtained by training a deep learning model using the keyword pairs and page similarities having a correspondence; and
Obtaining a target probability value output by the similarity model and determining the target probability value as the target page similarity, wherein the probability value indicates the probability that the first page and the second page are similar pages.
4. The method according to claim 3, characterized in that, before inputting the first target keywords and the second target keywords into the similarity model, the method further comprises:
Obtaining a page sample set, and bucketing the pages in the page sample set according to the entity contained in the page title information of each page, to obtain entities and page sets having a correspondence;
Extracting keywords from each page in the page set of each bucket;
In each bucket, determining the keywords whose third weights satisfy the first target condition, among the keywords of each page, as the keyword set corresponding to that page;
In each bucket, determining the page similarity of pages whose keyword sets match as a first similarity value, and determining the page similarity of pages whose keyword sets do not match or match only partially as a second similarity value;
In each bucket, establishing a correspondence between matching keyword sets and the first similarity value, and a correspondence between non-matching or partially matching keyword sets and the second similarity value;
Obtaining, from each bucket, a first quantity of matching keyword sets with their corresponding first similarity value and a second quantity of non-matching keyword sets with their corresponding second similarity value, to obtain the keyword pairs and page similarities having a correspondence; and
Training a deep learning model using the keyword pairs and page similarities having a correspondence to obtain the similarity model.
5. The method according to claim 4, characterized in that determining the page similarity corresponding to two pages whose keyword sets match as the first similarity value, and determining the page similarity corresponding to two pages whose keyword sets do not match as the second similarity value, comprises:
Extracting feature information from each keyword set;
Determining keyword sets with identical feature information as matching keyword sets, and keyword sets with different feature information as non-matching keyword sets; and
Determining that matching keyword sets correspond to the first similarity value and non-matching keyword sets correspond to the second similarity value.
6. The method according to claim 4, characterized in that training the deep learning model using the keyword pairs and page similarities having a correspondence to obtain the similarity model comprises:
Training the deep learning model with the matching keyword sets and the non-matching keyword sets as input values of the deep learning model, and the first similarity value and the second similarity value as output values of the deep learning model; and
Determining the deep learning model obtained after training as the similarity model.
7. The method according to claim 1, characterized in that extracting, from the first keywords, the first target keywords whose first weights satisfy the first target condition, and extracting, from the second keywords, the second target keywords whose second weights satisfy the first target condition, comprises:
Extracting, from the first keywords, first target keywords whose first weights are higher than a target weight, and extracting, from the second keywords, second target keywords whose second weights are higher than the target weight; or
Sorting the first keywords in descending order of the first weights and the second keywords in descending order of the second weights; extracting the top third-quantity keywords from the sorted first keywords as the first target keywords, and extracting the top third-quantity keywords from the sorted second keywords as the second target keywords.
8. The method according to claim 1, characterized in that the target page similarity satisfying the second target condition comprises:
When a higher target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity satisfies the second target condition when it is higher than a first target similarity; or
When a lower target page similarity indicates that the first page and the second page are more similar, determining that the target page similarity satisfies the second target condition when it is lower than a second target similarity.
9. A page fusion apparatus, characterized by comprising:
A first extraction module, configured to extract first keywords from a first page to be fused and to extract second keywords from a second page to be fused;
A second extraction module, configured to extract, from the first keywords, first target keywords whose first weights satisfy a first target condition, and to extract, from the second keywords, second target keywords whose second weights satisfy the first target condition;
A first determining module, configured to determine a target page similarity between the first page and the second page according to the first target keywords and the second target keywords; and
A fusion module, configured to fuse the first page and the second page when the target page similarity satisfies a second target condition.
10. The apparatus according to claim 9, characterized in that the first determining module comprises:
An acquiring unit, configured to obtain, according to keyword pairs and page similarities having a correspondence, the page similarity corresponding to the first target keywords and the second target keywords; and
A first determining unit, configured to determine the page similarity corresponding to the first target keywords and the second target keywords as the target page similarity.
11. The apparatus according to claim 10, characterized in that the acquiring unit comprises:
An input subunit, configured to input the first target keywords and the second target keywords into a similarity model, wherein the similarity model is obtained by training a deep learning model using the keyword pairs and page similarities having a correspondence; and
An obtaining subunit, configured to obtain a target probability value output by the similarity model and to determine the target probability value as the target page similarity, wherein the probability value indicates the probability that the first page and the second page are similar pages.
12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
A processing module, configured to obtain a page sample set and to bucket the pages in the page sample set according to the entity contained in the page title information of each page, to obtain entities and page sets having a correspondence;
A third extraction module, configured to extract keywords from each page in the page set of each bucket;
A second determining module, configured to determine, in each bucket, the keywords whose third weights satisfy the first target condition, among the keywords of each page, as the keyword set corresponding to that page;
A third determining module, configured to determine, in each bucket, the page similarity corresponding to two pages whose keyword sets match as a first similarity value, and the page similarity corresponding to two pages whose keyword sets do not match as a second similarity value;
An establishing module, configured to establish, in each bucket, a correspondence between matching keyword sets and the first similarity value and a correspondence between non-matching keyword sets and the second similarity value;
An obtaining module, configured to obtain, from each bucket, a first quantity of matching keyword sets with their corresponding first similarity value and a second quantity of non-matching keyword sets with their corresponding second similarity value, to obtain the keyword pairs and page similarities having a correspondence; and
A training module, configured to train a deep learning model using the keyword pairs and page similarities having a correspondence to obtain the similarity model.
13. The apparatus according to claim 12, characterized in that the third determining module comprises:
An extraction unit, configured to extract feature information from each keyword set;
A second determining unit, configured to determine keyword sets with identical feature information as matching keyword sets, and keyword sets with different feature information as non-matching keyword sets; and
A third determining unit, configured to determine that matching keyword sets correspond to the first similarity value and non-matching keyword sets correspond to the second similarity value.
14. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute the method according to any one of claims 1 to 8 when run.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute the method according to any one of claims 1 to 8 by means of the computer program.
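Claims 3 and 6 describe a similarity model that takes two keyword sets as input, is trained toward the first and second similarity values, and outputs a probability that serves as the target page similarity. The patent uses a deep learning (CNN) model; the Python sketch below deliberately swaps in plain logistic regression over a single hand-crafted feature (Jaccard overlap of the keyword sets) only to illustrate that input/output contract. Encoding the first and second similarity values as labels 1 and 0, and all names and hyperparameters, are assumptions.

```python
import math
from typing import List, Sequence, Tuple

KeywordPair = Tuple[Sequence[str], Sequence[str]]


def pair_features(kw_a: Sequence[str], kw_b: Sequence[str]) -> List[float]:
    """Bias term plus Jaccard overlap of the two keyword sets."""
    a, b = set(kw_a), set(kw_b)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    return [1.0, jaccard]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_similarity_model(pairs: List[KeywordPair], labels: List[int],
                           lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic-regression weights by stochastic gradient descent on log-loss.
    labels: 1 = first similarity value (match), 0 = second similarity value (no match)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for (kw_a, kw_b), y in zip(pairs, labels):
            x = pair_features(kw_a, kw_b)
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def target_page_similarity(w: List[float], kw_a: Sequence[str], kw_b: Sequence[str]) -> float:
    """Probability that the two pages are similar pages (the target probability value)."""
    x = pair_features(kw_a, kw_b)
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
```

Because the output is a probability where higher means more similar, it would be paired with the "higher than a first target similarity" branch of claim 8.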
CN201810456491.9A 2018-05-14 2018-05-14 Page fusion method and device, storage medium and electronic device Active CN110162356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810456491.9A CN110162356B (en) 2018-05-14 2018-05-14 Page fusion method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110162356A true CN110162356A (en) 2019-08-23
CN110162356B CN110162356B (en) 2021-09-28

Family

ID=67644902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810456491.9A Active CN110162356B (en) 2018-05-14 2018-05-14 Page fusion method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110162356B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant