CN110020303A - Method, apparatus and storage medium for determining candidate display content - Google Patents

Publication number
CN110020303A
Authority
CN
China
Prior art keywords
vocabulary
page
content
offline
lexicon
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711188237.7A
Other languages
Chinese (zh)
Inventor
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201711188237.7A
Publication of CN110020303A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Abstract

The invention discloses a method, an apparatus, and a storage medium for determining candidate display content, and belongs to the field of Internet technologies. The method includes: obtaining the page keywords contained in the currently displayed page; obtaining an offline content database, which includes the designated keywords and synonymous keywords of each piece of display content, where a designated keyword is a keyword set for the corresponding display content in the offline phase, and a synonymous keyword is a keyword obtained by performing synonym expansion on a designated keyword in the offline phase; and querying the offline content database according to the page keywords to obtain the display content corresponding to the page keywords as the candidate display content of the page. Because no synonym expansion is performed online, the invention both widens the range of keywords, which increases diversity, and saves time, which improves the efficiency of determining candidate display content and, in turn, the efficiency of delivering display content.

Description

Method, apparatus and storage medium for determining candidate display content
Technical field
The present invention relates to the field of Internet technologies, and in particular to a method, an apparatus, and a storage medium for determining candidate display content.
Background
The Internet spreads information widely, cheaply, and efficiently, and has become the most common channel for advertising; many advertisers choose to deliver display content on Internet pages. For each page, faced with the display content that numerous advertisers want to deliver, how to determine suitable candidate display content has become an urgent problem.
Determining candidate display content by keyword is a common approach. When an advertiser wants to deliver display content, the advertiser can specify at least one keyword for it, and the keyword is stored in a display content database; the page keywords contained in each page can also be preset and stored in a page database. In use, the page keywords contained in any currently displayed page are obtained, and a fuzzy matching algorithm performs synonym expansion on them to obtain more keywords. The display content database is then queried with these keywords, and the display content corresponding to them is taken as the candidate display content. This candidate display content can be regarded as display content related to the page keywords, that is, display content that matches the page.
In implementing the embodiments of the present invention, the inventor found that the related art has at least the following problem: determining candidate display content as described above requires synonym expansion of the page keywords online, and this expansion takes a long time, which reduces the efficiency of determining candidate display content and, in turn, the efficiency of delivering display content.
Summary of the invention
Embodiments of the present invention provide a method, an apparatus, and a storage medium for determining candidate display content, which can solve the problem in the related art. The technical solutions are as follows:
In a first aspect, a method for determining candidate display content is provided, the method comprising:
obtaining the page keywords contained in the currently displayed page;
obtaining an offline content database, where the offline content database includes the designated keywords and synonymous keywords of each piece of display content, a designated keyword is a keyword set for the corresponding display content in the offline phase, and a synonymous keyword is a keyword obtained by performing synonym expansion on the designated keyword in the offline phase; and
querying the offline content database according to the page keywords to obtain the display content corresponding to the page keywords as the candidate display content of the page.
In a second aspect, an apparatus for determining candidate display content is provided, the apparatus comprising:
a page keyword obtaining module, configured to obtain the page keywords contained in the currently displayed page;
a database obtaining module, configured to obtain an offline content database, where the offline content database includes the designated keywords and synonymous keywords of each piece of display content, a designated keyword is a keyword set for the corresponding display content in the offline phase, and a synonymous keyword is a keyword obtained by performing synonym expansion on the designated keyword in the offline phase; and
a query module, configured to query the offline content database according to the page keywords to obtain the display content corresponding to the page keywords as the candidate display content of the page.
In a third aspect, an apparatus for determining candidate display content is provided. The apparatus includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the operations performed in the method for determining candidate display content according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the operations performed in the method for determining candidate display content according to the first aspect.
The technical solutions provided in the embodiments of the present invention have the following beneficial effects:
The method, apparatus, and storage medium provided in the embodiments of the present invention perform synonym expansion on the designated keywords of display content in advance and establish an offline content database. This ensures that, in the online phase, the corresponding display content can be queried directly from the page keywords contained in the currently displayed page, with no synonym expansion performed online. The range of keywords is widened, which increases diversity, and the time consumed in determining candidate display content is reduced, which improves the efficiency of determining candidate display content and, in turn, the efficiency of delivering display content.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an implementation environment according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of another implementation environment according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for establishing an offline content database according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for determining candidate display content according to an embodiment of the present invention;
Fig. 5 is an exemplary operational flowchart according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 7 is an operational flowchart of an offline index module according to an embodiment of the present invention;
Fig. 8 is an operational flowchart of a preset training model according to an embodiment of the present invention;
Fig. 9 is an operational flowchart of an online query module according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an apparatus for determining candidate display content according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Before the detailed description, the concepts involved in the embodiments of the present invention are explained as follows:
ADX (Advertisement Exchange Platform, advertisement trading platform): an open Internet advertisement trading platform that connects advertisers with the owners of ad slots and helps the two sides complete transactions for delivering advertisements. When an ad slot is exposed to a user, an advertisement can be delivered to that ad slot and shown to the user.
DSP (Demand-Side Platform): an online advertising platform that serves advertisers. It provides advertisers with a cross-media, cross-platform advertisement delivery platform and helps them deliver advertisements on the Internet. Based on the advertisements and ad slots it has collected, it can perform precise calculations, formulate a delivery strategy, deliver the advertisements, and produce delivery reports.
Display content: content that can be delivered as an advertisement at a display position of a page, such as video content, audio content, or text content.
Fig. 1 is a schematic structural diagram of an implementation environment according to an embodiment of the present invention. Referring to Fig. 1, the implementation environment includes a server 101 and at least one terminal 102, which are connected through a network.
Each terminal 102 is used to display pages of various types, such as web pages and video playback pages. A displayed page may include display positions at which display content can be delivered. For example, a blank area in the lower right corner of the page can serve as a display position, and delivered video content can be shown there.
The server 101 is used to collect the display content provided by providers together with the designated keywords and synonymous keywords of the display content. When a terminal 102 displays a page, the server 101 also selects the candidate display content to be delivered on the page according to the page keywords contained in the page and the keywords corresponding to the display content.
In a possible implementation, referring to Fig. 2, the server 101 may include a delivery server 1011, a policy server 1012, and a keyword server 1013. The terminal 102 and the delivery server 1011 are connected through a network, as are the delivery server 1011 and the policy server 1012, and the policy server 1012 and the keyword server 1013.
The delivery server 1011 delivers display content on the currently displayed page. The policy server 1012 determines the candidate display content that can be delivered on the page and sends it to the delivery server 1011, which decides which display content to deliver on the page. The keyword server 1013 stores the display content provided by providers, the keywords of the display content, and the page keywords contained in each page.
When a user opens a page on the terminal 102, the terminal 102 sends a display request carrying the page identifier to the delivery server 1011. The delivery server 1011 can forward the display request to the policy server 1012. The policy server 1012 queries the keywords stored in the keyword server 1013, determines the page keywords contained in the currently displayed page, determines the display content corresponding to those page keywords as the candidate display content of the page, and returns it to the delivery server 1011. The delivery server 1011 then selects, from the candidate display content, the display content to be delivered on the page.
The policy server 1012 may be a DSP platform server and the delivery server 1011 may be an ADX platform server, or the policy server 1012 and the delivery server 1011 may be other servers.
Embodiments of the present invention can be applied in various scenarios in which display content is recommended for a page, such as delivering advertisements on a page or making recommendations to users on a recommendation page.
Fig. 3 is a flowchart of a method for establishing an offline content database according to an embodiment of the present invention. The method is executed by the server 101 in the implementation environment shown in Fig. 1, and the process of establishing the offline content database in the offline phase is described. Referring to Fig. 3, the method includes:
301. In the offline phase, the server obtains display content and the designated keyword of the display content.
The display content may be of various types, such as text content, video content, and audio content. The display content and its designated keyword are determined by the provider. The designated keyword may be selected from multiple keywords offered to the provider or may be user-defined, and it may be a keyword related to the display content or a keyword related to the provider of the display content. For example, if the display content is an advertisement for a beverage, the designated keyword may be "beverage" or the name of the beverage manufacturer.
In a possible implementation, after producing the display content, the provider sets the designated keyword of the display content on a terminal held by the provider and uploads the display content and the designated keyword to the server. Alternatively, the provider gives the display content and the designated keyword to the administrators of the server, who configure them on the server.
It should be noted that a series of preparatory operations, such as obtaining display content, setting keywords, and establishing the display content database, are performed before display content is delivered on a page. The phase in which these preparatory operations are performed is referred to as the offline phase.
302. The server obtains an offline lexicon.
The offline lexicon includes multiple preset words. Based on the offline lexicon, synonym expansion can be performed on any word to obtain its synonyms.
In a possible implementation, the offline lexicon includes not only multiple words but also the word vector of each word. Different words have different word vectors. The more similar the word vectors of two words, the closer their meanings; the less similar the word vectors, the greater the difference in meaning. Therefore, the synonyms of any word can be determined from its word vector and the word vectors of the other words.
In the embodiments of the present invention, the offline lexicon can be obtained by training on a large number of words with a preset training algorithm. The preset training algorithm may be a gradient descent algorithm, a deep neural network algorithm, a CBOW (Continuous Bag-of-Words) model based on Hierarchical Softmax, or the like, and can be regarded as a training model. In practice, a large number of keywords can first be obtained and trained with the preset training algorithm to produce the offline lexicon. Newly added words can also be obtained while the offline lexicon is in use. In that case there is no need to retrain from scratch: the newly added words are used as incremental samples and trained together with the existing offline lexicon to obtain an updated offline lexicon. The newly added words may be words collected since the last training of the offline lexicon, such as popular words that have newly emerged on the network.
Optionally, take the case where the offline lexicon is a first offline lexicon that includes a first number of words and the word vector of each word. When the first offline lexicon and the newly added words are obtained, the preset training algorithm is used to train on them together, producing an updated second offline lexicon. The second offline lexicon includes a second number of words and the word vector of each word, and the second number of words comprises the first number of words and the newly added words. This training both produces word vectors for the newly added words and corrects the word vectors of the original first number of words, which improves the accuracy of the word vectors.
Taking the CBOW model based on Hierarchical Softmax as an example, the model includes three layers: an input layer, a projection layer, and an output layer. For any word w in the first offline lexicon and the newly added words, the input layer first obtains the word vectors of the c words before w and the c words after w, where c is a positive integer and the initial word vector of each newly added word is set to a random vector. The projection layer sums the 2c word vectors output by the input layer and feeds the sum to the output layer. The output layer updates the word vector of w; after the update, the next word in the first offline lexicon and the newly added words is updated in the same way, until the objective function converges.
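As a rough illustration of the input and projection layers described above, the following minimal Python sketch extends an existing lexicon with newly added words (each given a random initial vector) and sums the context word vectors around a position. The function names `extend_lexicon` and `project_context`, the dictionary representation of the lexicon, and the random initialization range are illustrative assumptions, not details from this application. Near the edges of the token sequence, fewer than 2c vectors are summed.

```python
import random


def extend_lexicon(lexicon, new_words, dim, seed=0):
    """Return a copy of the lexicon in which new words get random initial vectors.

    `lexicon` maps word -> word vector (list of floats). Existing vectors are
    kept so that later training can refine them (assumed representation).
    """
    rng = random.Random(seed)
    extended = {word: list(vec) for word, vec in lexicon.items()}
    for word in new_words:
        if word not in extended:
            # Small random initial vector; the range is an assumption.
            extended[word] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return extended


def project_context(lexicon, tokens, position, c):
    """Projection layer: sum the vectors of up to c tokens on each side of `position`."""
    dim = len(next(iter(lexicon.values())))
    x_w = [0.0] * dim
    lo = max(0, position - c)
    hi = min(len(tokens), position + c + 1)
    for i in range(lo, hi):
        if i == position:
            continue  # the centre word itself is not part of its context
        vec = lexicon[tokens[i]]
        for k in range(dim):
            x_w[k] += vec[k]
    return x_w
```

Keeping the existing vectors when the lexicon is extended mirrors the incremental-training idea: only the newly added words start from random values.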
The input layer contains the word vectors of the 2c words, $V_{w-c}, V_{w-c+1}, \ldots, V_{w+c-1}, V_{w+c}$, and the projection layer sums these 2c word vectors. The output layer is a Huffman tree whose leaf nodes are the words that occur in the corpus, with the frequency of each word in the corpus as its weight. The Huffman tree has N leaf nodes, corresponding to the N words in the corpus. For any word w in the corpus, with context Context(w), the Huffman tree contains exactly one path from the root node to w. Each branch in the Huffman tree is regarded as one binary classification: the left branch represents the negative class and the right branch represents the positive class, and each classification produces a probability.
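The Huffman tree in the output layer can be sketched with Python's standard `heapq` module. This is a minimal illustration under stated assumptions: the function name `build_huffman`, the tuple-based node representation, the choice to place the heavier subtree on the left, and the encoding (1 for the left/negative branch, 0 for the right/positive branch) are not taken from this application.

```python
import heapq
from itertools import count


def build_huffman(frequencies):
    """Return {word: (inner_node_path, code_bits)} for a frequency-weighted Huffman tree.

    inner_node_path lists the ids of the non-leaf nodes from the root downwards;
    code_bits[i] is the branch taken at inner_node_path[i]
    (assumed convention: 1 = left / negative class, 0 = right / positive class).
    """
    tie = count()  # unique tie-breaker so the heap never compares node payloads
    heap = [(freq, next(tie), ("leaf", word)) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    inner_id = count()
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)  # lightest subtree
        f2, _, n2 = heapq.heappop(heap)  # second lightest subtree
        # Assumed layout: heavier subtree on the left, lighter on the right.
        node = ("inner", next(inner_id), n2, n1)
        heapq.heappush(heap, (f1 + f2, next(tie), node))
    root = heap[0][2]

    result = {}
    stack = [(root, [], [])]
    while stack:
        node, path, code = stack.pop()
        if node[0] == "leaf":
            result[node[1]] = (path, code)
        else:
            _, node_id, left, right = node
            stack.append((left, path + [node_id], code + [1]))
            stack.append((right, path + [node_id], code + [0]))
    return result
```

Frequent words end up close to the root, so their paths involve fewer binary classifications than those of rare words.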
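Therefore, the conditional probability of the word w is the product of the probabilities of the branches on the path from the root node to w. In the standard formulation of this model,

$$p(w \mid \mathrm{Context}(w)) = \prod_{j=2}^{l_w} p\left(d_j^w \mid X_w, \theta_{j-1}^w\right), \qquad p\left(d_j^w \mid X_w, \theta_{j-1}^w\right) = \left[\sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right]^{1-d_j^w}\left[1-\sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right]^{d_j^w}$$

where $p(w)$ denotes the path from the root node to the leaf node corresponding to w; $l_w$ denotes the number of nodes on the path $p(w)$; $d_j^w \in \{0, 1\}$ denotes the Huffman code of the j-th node on the path $p(w)$ (the root node has no code); $\theta_j^w$ denotes the vector of the j-th non-leaf node on the path $p(w)$; $\sigma\left(X_w^{\top}\theta_j^w\right)$, with $\sigma(z) = 1/(1+e^{-z})$, denotes the probability that the j-th non-leaf node on the path $p(w)$ assigns the positive class; and $X_w$ denotes the sum output by the projection layer for w.
The objective function becomes:

$$\mathcal{L} = \sum_{w \in C} \log p(w \mid \mathrm{Context}(w)) = \sum_{w \in C} \sum_{j=2}^{l_w} \left\{ \left(1-d_j^w\right)\log\left[\sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right] + d_j^w \log\left[1-\sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right] \right\}$$

where C denotes the set of words formed by the words in the first offline lexicon and the newly added words.
The term in braces is denoted $\mathcal{L}(w, j)$. During training, stochastic gradient ascent is used, and the parameters are updated once per sample:

$$\frac{\partial \mathcal{L}(w, j)}{\partial \theta_{j-1}^w} = \left[1 - d_j^w - \sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right] X_w$$

Then the update formula of $\theta_{j-1}^w$ can be written as:

$$\theta_{j-1}^w := \theta_{j-1}^w + \eta\left[1 - d_j^w - \sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right] X_w$$

where $\eta$ denotes the learning rate.
Then the update formula of the word vector can be written as:

$$V_{\tilde{w}} := V_{\tilde{w}} + \eta \sum_{j=2}^{l_w} \left[1 - d_j^w - \sigma\left(X_w^{\top}\theta_{j-1}^w\right)\right] \theta_{j-1}^w, \qquad \tilde{w} \in \mathrm{Context}(w)$$

Therefore, for the word w, the first update formula (the word-vector update) is applied to update the word vectors, and the second update formula (the update of $\theta_{j-1}^w$) is then applied to update $\theta_{j-1}^w$. Based on the updated $\theta_{j-1}^w$, the word vector of the next word is then updated with the first update formula, and the iteration continues until the objective function converges.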
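The per-sample update can be sketched in plain Python as one stochastic gradient ascent step of the standard Hierarchical Softmax CBOW formulation. The function name `hs_update`, the dictionary `theta` of inner-node vectors, the default learning rate, and the code convention (0 = positive class, 1 = negative class) are assumptions made for illustration. The sketch accumulates the word-vector gradient with the old node vectors before each node vector is changed, which is the common word2vec ordering and not necessarily the exact order used in this application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hs_update(x_w, path, code, theta, context_vectors, eta=0.025):
    """One gradient ascent step for a single word; returns its log-likelihood before the step.

    x_w             : sum of the context word vectors (projection layer output)
    path            : ids of the non-leaf nodes from the root to the word's leaf
    code            : code[i] is the Huffman code of the branch taken at path[i]
    theta           : dict node id -> node vector (updated in place)
    context_vectors : the context word vectors (updated in place)
    """
    dim = len(x_w)
    e = [0.0] * dim  # accumulated update for the context word vectors
    log_likelihood = 0.0
    for node_id, d in zip(path, code):
        th = theta[node_id]
        q = sigmoid(sum(x_w[k] * th[k] for k in range(dim)))
        log_likelihood += (1 - d) * math.log(q) + d * math.log(1.0 - q)
        g = eta * (1 - d - q)
        for k in range(dim):
            e[k] += g * th[k]    # uses the node vector before it is updated
            th[k] += g * x_w[k]  # update of the node vector
    for vec in context_vectors:
        for k in range(dim):
            vec[k] += e[k]       # update of each context word vector
    return log_likelihood
```

Calling `hs_update` repeatedly on the same sample with a small learning rate should raise the returned log-likelihood, which is a quick sanity check for the sign of the gradient.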
303. The server performs synonym expansion on the designated keyword according to the offline lexicon to obtain the synonymous keywords of the designated keyword.
In a possible implementation, the similarity between the word vector of the designated keyword and the word vector of each other word in the offline lexicon is calculated. The other words are sorted in descending order of similarity, and the first preset number of words, that is, the words most similar to the designated keyword, are selected as its synonymous keywords. The similarity of two word vectors indicates how alike they are: the greater the similarity, the more similar the two words and the closer their meanings; the smaller the similarity, the less similar the two words and the greater the difference in meaning. The similarity can be expressed by, for example, the reciprocal of the Euclidean distance between the two word vectors or their cosine similarity. Each keyword can be expanded into a preset number of synonymous keywords, and this preset number can be determined according to the expansion requirements of the keyword.
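The ranking step described above can be sketched as follows, using cosine similarity as the similarity measure. The function names, the toy lexicon in the usage note, and the dictionary representation are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_synonyms(lexicon, keyword, top_n):
    """Return the top_n words whose vectors are most similar to the keyword's vector."""
    target = lexicon[keyword]
    scored = [
        (cosine_similarity(target, vec), word)
        for word, vec in lexicon.items()
        if word != keyword  # the designated keyword itself is excluded
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_n]]
```

With a toy lexicon such as `{"beverage": [1.0, 0.1], "drink": [0.9, 0.2], "juice": [0.8, 0.4], "game": [0.0, 1.0]}`, expanding "beverage" with `top_n=2` selects "drink" and "juice" and leaves out "game".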
304. The server stores the display content, the designated keyword, and the synonymous keywords in the offline content database in correspondence with one another.
After the synonymous keywords of the designated keyword are determined, because their meanings are similar to that of the designated keyword, the synonymous keywords can also be regarded as keywords related to the display content. Therefore, the designated keyword and the synonymous keywords are stored in the offline content database as the keywords corresponding to the display content.
In the embodiments of the present invention, the server collects the designated keywords of display content and performs synonym expansion to establish the offline content database. The offline content database stores the keywords corresponding to each piece of display content, and these keywords represent the meaning of the display content. Later, when a page is displayed, the corresponding display content can be looked up in the offline content database according to the page keywords contained in the page and delivered on the page. This ensures that the display content is associated with the page and avoids delivering display content unrelated to the page the user is viewing. Thus, while a user browses a page, display content can be delivered according to the content the user is currently browsing or its context. For example, if the user is reading an article about games and the keyword is "game", display content related to games is delivered to the user on the page.
305. The server establishes an inverted index in the offline content database according to the designated keywords and synonymous keywords corresponding to the display content; the inverted index includes the display content corresponding to each keyword.
To make it easy to look up the one or more pieces of display content corresponding to a keyword, after the display content, designated keywords, and synonymous keywords are stored in the offline content database in step 304, an inverted index can be established in the offline content database according to the designated keywords and synonymous keywords corresponding to the display content. The inverted index includes the display content corresponding to each keyword; that is, keywords serve as the index and display content as the indexed objects, so that the corresponding display content can later be obtained by querying with any keyword.
For example, the display content, designated keywords, and expanded synonymous keywords may be as shown in Table 1 below; after the inverted index is established, the information stored in the offline content database may be as shown in Table 2 below.
Table 1
Display content | Designated keywords and synonymous keywords
Display content 1 | Keyword A, keyword B
Display content 2 | Keyword B, keyword C
Table 2
Keyword | Display content
Keyword A | Display content 1
Keyword B | Display content 1, display content 2
Keyword C | Display content 2
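The transformation from Table 1 to Table 2 can be sketched in a few lines of Python. The function name `build_inverted_index` and the in-memory dictionary standing in for the offline content database are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(content_keywords):
    """Turn {display content: [keywords]} into {keyword: [display content]}."""
    index = defaultdict(list)
    for content, keywords in content_keywords.items():
        for keyword in keywords:
            if content not in index[keyword]:  # avoid duplicate entries
                index[keyword].append(content)
    return dict(index)


# Table 1: display content with its designated and synonymous keywords.
table_1 = {
    "display content 1": ["keyword A", "keyword B"],
    "display content 2": ["keyword B", "keyword C"],
}

# Table 2: each keyword mapped to the display content it belongs to.
table_2 = build_inverted_index(table_1)
```

Because the index is built once in the offline phase, an online lookup by keyword reduces to a single dictionary access.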
The method provided in an embodiment of the present invention for establishing off-line content database, by off-line phase is collected and shown The designated key word of appearance, and synonym extension is carried out, to establish off-line content database, it is ensured that on-line stage energy It is enough directly to inquire corresponding displaying content according to the page key words for including in the current presentation page, it is carried out together without online Adopted word extension, had both extended the range of keyword, and had enhanced diversity, also saves and determines that alternative displaying content process is disappeared The time of consumption improves and determines the alternative efficiency for showing content, and then improves the efficiency launched and show content.
Fig. 4 is a kind of flow chart for determining the alternative method for showing content provided in an embodiment of the present invention.The present invention is implemented The executing subject of example is the server 101 in implementation environment shown in FIG. 1, is carried out to the process for determining alternative displaying content Illustrate, referring to fig. 4, this method comprises:
401, server obtains the page iden-tity of the page of current presentation.
Wherein, which, for determining unique corresponding page, can be the chain of Page Name or page ground connection Location etc..
When a certain terminal is in displayed page, it can be sent to server and show request, displaying request carries the page Page iden-tity, server receives displaying request, and obtains the page iden-tity.In practical applications, server may receive Displaying to multiple pages is requested, so that it is determined that the page iden-tity of multiple pages.Alternatively, server can also be to each terminal The page of displaying is monitored, and determines the page iden-tity of the page of current presentation.
402, server inquires off-line page database according to page iden-tity, and it is crucial to obtain the corresponding page of page iden-tity Word.
For the ease of launching associated displaying content on the page to current presentation, server is used based on key Word determines the alternative strategy for showing content, i.e., related to these keywords for page determination according to the keyword for including in the page Alternative displaying content.Therefore, it when determining alternative displaying content based on the strategy, first has to determine the key for including in the page Word.
Server has pre-established off-line page database, and off-line page database includes the page iden-tity of each page And corresponding page key words, it that is to say page key words included in the page, therefore when server determines current presentation When the page iden-tity of the page, off-line page database is inquired according to the page iden-tity, the corresponding page of the page iden-tity can be obtained Face keyword, these page key words are the keyword for including in the page, can show containing for content with original in representing pages Justice.
Certainly, in addition to the mode of above-mentioned inquiry off-line page database, current exhibition can also be obtained using other modes The page key words of the page shown, such as grab current presentation the page, to the page carry out semantic analysis, obtain include in the page Page key words.
403. The server obtains the established offline content database.
The offline content database includes the designated keywords and synonymous keywords of each item of display content. A designated keyword is determined by the provider of the corresponding display content and is a keyword set for that display content in the offline stage. A synonymous keyword is a keyword obtained by performing synonym expansion on a designated keyword after the designated keyword is obtained in the offline stage.
In practice, the offline content database can be established using the method for establishing an offline content database shown in Fig. 3, so in the online stage the server can directly use the established offline content database.
404. The server queries the offline content database according to the page keywords corresponding to the page identifier, obtains the display content corresponding to the page keywords, and uses it as the candidate display content of the page.
By querying the offline content database, the server can obtain the display content corresponding to the page keywords. Because this display content corresponds to the page keywords, it can be regarded as related to the content originally displayed on the page. Placing it on the page does not interfere with the user's normal browsing of the page's original content and can also attract the user to click on it, so this display content can be used as the candidate display content of the page.
When multiple page keywords have been obtained, the offline content database is queried separately for each page keyword, and the union of the query results is taken to obtain multiple items of candidate display content.
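As a minimal sketch of steps 402 to 404, assume that both offline databases are available as in-memory dictionaries. The names OFFLINE_PAGE_DB, OFFLINE_CONTENT_DB, and candidate_content, as well as the sample data, are hypothetical and are not part of the embodiment; a real deployment would query whatever storage the server actually uses.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the offline page database:
# page identifier -> page keywords contained in that page.
OFFLINE_PAGE_DB: Dict[str, List[str]] = {
    "article_1": ["football", "transfer"],
}

# Hypothetical stand-in for the offline content database:
# keyword (designated or synonymous) -> identifiers of display content.
OFFLINE_CONTENT_DB: Dict[str, Set[str]] = {
    "football": {"ad_1", "ad_2"},
    "soccer": {"ad_1"},
    "transfer": {"ad_2", "ad_3"},
}


def candidate_content(page_id: str) -> Set[str]:
    """Look up the page keywords, query once per keyword, and take the union."""
    keywords = OFFLINE_PAGE_DB.get(page_id, [])
    candidates: Set[str] = set()
    for keyword in keywords:
        # A keyword with no matching display content contributes nothing.
        candidates |= OFFLINE_CONTENT_DB.get(keyword, set())
    return candidates
```

Because the synonymous keywords were already written into the content database offline, the online path in this sketch consists only of dictionary lookups and a set union.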
It should be noted that the embodiment of the present invention only determines candidate display content using the keyword-based strategy. After the candidate display content is determined, it may be placed on the page directly, or it may be screened further using other strategies to finally determine the display content to be placed on the page. In other words, in practice the display content to be placed on the page may be determined using the keyword-based strategy alone or using the keyword-based strategy in combination with other strategies. The other strategies may include a strategy based on release time, a strategy based on provider priority, and so on.
In the method for determining candidate display content provided by the embodiment of the present invention, synonym expansion is performed in advance, in the offline stage, on the designated keywords of the display content, and the offline content database is established. This ensures that in the online stage the corresponding display content can be queried directly according to the page keywords contained in the page currently being displayed, without performing synonym expansion online. This both broadens the range of keywords and increases diversity, and it saves the time consumed in determining candidate display content, which improves the efficiency of determining candidate display content and, in turn, of placing display content.
In the related art, synonymous keywords must be expanded online, which takes a long time and may even cause the response to time out. To limit the time cost, the number of expanded synonymous keywords is usually restricted, which harms the accuracy of display content selection and causes some qualifying display content to be missed. Moreover, online expansion generally relies on an existing dictionary, whose timeliness is poor, so some newly added words cannot be expanded.
The embodiment of the present invention instead proposes a display content recommendation scheme based on keyword targeting. In the offline stage, a large corpus is trained by a machine learning method to obtain an offline lexicon; synonym expansion is then performed on the designated keywords of the display content, and a keyword targeting index is generated. In the online stage, page keywords are obtained, and the targeting index is queried according to the page keywords to obtain the corresponding display content. Compared with online keyword expansion, offline keyword expansion greatly reduces online time consumption and is more efficient. In addition, online keyword expansion must limit the number of expanded synonymous keywords because of the time cost, whereas offline keyword expansion has no such limit and can substantially increase the number of synonymous keywords, which increases the diversity of keyword targeting and in turn improves the accuracy of the display content. Correspondingly, the number of keywords that a provider must specify when placing an order can be reduced, which lowers the requirements on the provider and improves ordering efficiency and the advertiser's user experience.
Moreover, the offline lexicon is generated in the offline stage by incremental training: training is performed on the basis of the existing offline lexicon together with the newly added words to form a new offline lexicon. This avoids repeated training, speeds up training, copes with a corpus that grows over time, makes the offline lexicon easier to extend, and gives the generated offline lexicon better timeliness.
Fig. 5 is a schematic diagram of an operation flow provided by an embodiment of the present invention. The operation flow combines the method for establishing an offline content database shown in Fig. 3 with the method for determining candidate display content shown in Fig. 4. Referring to Fig. 5, the operation flow includes:
I. Offline stage:
1. Train on a large corpus to generate the offline lexicon.
2. The advertiser places an order and sets the display content to be placed and N designated keywords; each designated keyword is expanded into M-1 synonymous keywords, finally yielding M*N keywords.
3. Generate an inverted index that includes the orders corresponding to each keyword.
II. Online stage:
1. Receive a display request for a page and obtain the page identifier of the page, thereby obtaining the multiple page keywords corresponding to the page identifier.
2. Traverse the page keywords; for each page keyword, query the inverted index to determine the corresponding orders, and take the union of all the orders determined. The display content in these orders is the candidate display content determined based on keywords.
3. When the traversal is complete, take the intersection of the orders determined above and the orders determined based on other strategies to select the final orders. The display content in the final orders is the display content to be placed on the page.
The other strategies may be, for example, selecting display content with fewer impressions according to the number of impressions of the display content, or selecting display content provided by higher-priority advertisers according to advertiser priority.
In addition, after the intersection of the determined orders and the orders determined based on other strategies is taken, further filtering may be performed. For example, the click-through rate of each item of display content when placed on the page may be predicted, and display content with a low predicted click-through rate may be filtered out.
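The final selection in the online stage (intersection with the orders chosen by other strategies, followed by click-through-rate filtering) can be sketched as below. This is only an illustration under stated assumptions: the function name select_orders, the min_ctr threshold, and the idea of passing predicted click-through rates as a dictionary are hypothetical, and the embodiment does not specify how the click-through rate is predicted.

```python
from typing import Dict, Set


def select_orders(
    keyword_orders: Set[str],
    other_strategy_orders: Set[str],
    predicted_ctr: Dict[str, float],
    min_ctr: float = 0.01,  # hypothetical threshold, not taken from the source
) -> Set[str]:
    """Intersect keyword-based orders with orders from other strategies,
    then drop orders whose predicted click-through rate is below min_ctr."""
    shortlisted = keyword_orders & other_strategy_orders
    # Orders without a prediction are treated as having a rate of 0.0.
    return {o for o in shortlisted if predicted_ctr.get(o, 0.0) >= min_ctr}
```

For example, with keyword-based orders {"o1", "o2", "o3"}, other-strategy orders {"o2", "o3", "o4"}, and predicted rates of 0.05 for "o2" and 0.001 for "o3", only "o2" remains.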
In another possible implementation, the server 101 may be divided into multiple modules according to its different functions. Referring to Fig. 6, the server 101 includes an offline index module 110 and an online query module 120.
The offline index module 110 includes a lexicon generation unit 1101 and a keyword expansion unit 1102. The lexicon generation unit 1101 is configured to train on a large corpus by a machine learning method to generate the offline lexicon. The keyword expansion unit 1102 is configured to use the trained offline lexicon to perform keyword expansion on the designated keywords of the display content provided by the advertiser and to generate the offline content database; that is, the advertiser only needs to specify a few keywords, which can then be expanded into more semantically similar keywords.
Referring to Fig. 7, the steps performed by the offline index module 110 may include:
1. Select the corpora of interest as needed, such as "XX news" or "XX game".
2. Input the corpus into the preset training model Model_Engine for training to obtain the offline lexicon Model.
3. Obtain the offline content database Order, which includes the display content provided by advertisers and the designated keywords of the display content, and perform synonym expansion on the designated keywords already in the offline content database Order according to the offline lexicon Model.
4. Write the expanded keywords back into the offline content database Order.
5. Generate an inverted index Index from the new offline content database Order for use by the online query module 120.
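Steps 3 to 5 above can be sketched as follows. The sketch assumes that the synonyms derived from the offline lexicon Model are already available as a plain mapping; the names expand_orders, build_inverted_index, and the synonyms argument are hypothetical and only stand in for whatever the keyword expansion unit 1102 actually produces.

```python
from collections import defaultdict
from typing import Dict, List, Set


def expand_orders(
    orders: Dict[str, List[str]],
    synonyms: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Steps 3-4: append synonymous keywords to each order's designated keywords."""
    expanded: Dict[str, List[str]] = {}
    for order_id, designated in orders.items():
        keywords = list(designated)
        for word in designated:
            for synonym in synonyms.get(word, []):
                if synonym not in keywords:  # avoid duplicate keywords per order
                    keywords.append(synonym)
        expanded[order_id] = keywords
    return expanded


def build_inverted_index(orders: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Step 5: map each keyword to the set of orders that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for order_id, keywords in orders.items():
        for keyword in keywords:
            index[keyword].add(order_id)
    return dict(index)
```

With this layout, the online query module only needs one dictionary lookup per page keyword, which is the reason the expansion is done before the index is built.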
Referring to Fig. 8, the steps performed by the lexicon generation unit 1101 may include:
1. Input the existing offline lexicon Model together with the newly added corpus into the preset training model Model_Engine for training.
2. The preset training model Model_Engine preprocesses the corpus (Chinese word segmentation, English tense conversion, traditional Chinese character conversion, and so on), counts the word frequencies of the corpus, and obtains the initial word vectors of the corpus.
3. The preset training model Model_Engine trains the word vectors according to the newly added corpus and the existing lexicon and finally outputs a new offline lexicon Model, which includes the word vectors of the original corpus and of the newly added corpus.
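The starting point of such incremental training can be sketched as below: vectors already in the existing lexicon are kept, and each newly added word receives a random initial vector, as the claims describe for newly added words. The function name extend_lexicon, the dim and seed parameters, and the initialization range are assumptions made for illustration; the subsequent training pass itself is not shown.

```python
import random
from typing import Dict, Iterable, List


def extend_lexicon(
    existing: Dict[str, List[float]],
    new_words: Iterable[str],
    dim: int,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Keep trained vectors unchanged and give each new word a random vector."""
    rng = random.Random(seed)
    lexicon = {word: list(vector) for word, vector in existing.items()}
    for word in new_words:
        if word not in lexicon:
            # Small random values; the exact range is an assumption.
            lexicon[word] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return lexicon
```

Starting from the existing vectors rather than from scratch is what lets the later training pass avoid repeating work on words that were already trained.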
The online query module 120 is configured to determine the candidate display content by querying the offline content database generated offline according to the page keywords contained in the page currently being displayed.
Taking the implementation environment shown in Fig. 2 as an example, the strategy server 1012 is a DSP, the placement server 1011 is an ADX, and the keyword server 1013 is a KeyWordServer. Referring to Fig. 9, the steps performed by the online query module 120 may include:
1. The ADX sends a display request to the DSP, where the display request includes the page identifier article_id of the page currently being displayed.
2. The DSP sends a request for page keywords to the KeyWordServer; the request carries article_id.
3. The KeyWordServer obtains the corresponding multiple page keywords (keywords) according to article_id and returns them to the DSP.
4. The DSP queries the inverted index Index generated by the offline index module 110 for each keyword to obtain the display content corresponding to each keyword, and finally takes the union.
5. The DSP returns the candidate display content found to the ADX.
Fig. 10 shows a device for determining candidate display content provided by an embodiment of the present invention. Referring to Fig. 10, the device includes:
a page keyword obtaining module 1001, configured to perform the step in the above embodiments of obtaining the page keywords contained in the page currently being displayed;
a database obtaining module 1002, configured to perform the step in the above embodiments of obtaining the offline content database;
a query module 1003, configured to perform the step in the above embodiments of querying the offline content database according to the page keywords to obtain the corresponding display content.
In a possible implementation, the device further includes:
a content obtaining module, configured to perform the step in the above embodiments of obtaining the display content and the designated keywords;
a lexicon obtaining module, configured to perform the step in the above embodiments of obtaining the offline lexicon;
an expansion module, configured to perform the step in the above embodiments of performing synonym expansion on the designated keywords;
a storage module, configured to perform the step in the above embodiments of storing the display content, the designated keywords, and the synonymous keywords in correspondence.
In another possible implementation, the offline lexicon further includes the word vector of each word, and the expansion module is configured to perform the step in the above embodiments of calculating similarities and then selecting, according to similarity, the top preset number of words as synonymous keywords.
In another possible implementation, the lexicon obtaining module is configured to perform the steps in the above embodiments of obtaining the first offline lexicon and the newly added words and training with the preset training algorithm.
In another possible implementation, the device further includes:
an inverted index module, configured to perform the step in the above embodiments of establishing an inverted index in the offline content database.
It should be noted that when the device for determining candidate display content provided by the above embodiments determines candidate display content, the division into the above functional modules is merely an example. In practice, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining candidate display content provided by the above embodiments and the method embodiments for determining candidate display content belong to the same concept; the specific implementation process is described in the method embodiments and is not repeated here.
Fig. 11 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 1100 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The program stored in the storage medium 1130 may include one or more modules (not shown), and each module may include a series of instruction operations for the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, one or more keyboards 1156, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The server 1100 may be used to perform the steps performed by the server in the method for determining candidate display content provided by the above embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. A method for determining candidate display content, characterized in that the method comprises:
obtaining page keywords contained in a page currently being displayed;
obtaining an offline content database, the offline content database including designated keywords and synonymous keywords of each item of display content, wherein a designated keyword is a keyword set for the corresponding display content in an offline stage, and a synonymous keyword is a keyword obtained by performing synonym expansion on the designated keyword in the offline stage;
querying the offline content database according to the page keywords to obtain the display content corresponding to the page keywords as the candidate display content of the page.
2. The method according to claim 1, characterized in that before the obtaining of the offline content database, the method further comprises:
in the offline stage, obtaining display content and the designated keywords of the display content;
obtaining an offline lexicon, the offline lexicon including a plurality of preset words;
performing synonym expansion on the designated keywords according to the offline lexicon to obtain the synonymous keywords of the designated keywords;
storing the display content, the designated keywords, and the synonymous keywords in correspondence in the offline content database.
3. The method according to claim 2, characterized in that the offline lexicon further includes a word vector of each word, and the performing of synonym expansion on the designated keyword according to the offline lexicon to obtain the synonymous keywords of the designated keyword comprises:
calculating the similarity between the word vector of each word in the offline lexicon other than the designated keyword and the word vector of the designated keyword;
sorting the words in the offline lexicon other than the designated keyword in descending order of similarity, and selecting the top preset number of words as the synonymous keywords of the designated keyword.
4. The method according to claim 3, characterized in that the calculating of the similarity between the word vector of each word in the offline lexicon other than the designated keyword and the word vector of the designated keyword comprises:
calculating the Euclidean distance between the word vector of each word in the offline lexicon other than the designated keyword and the word vector of the designated keyword;
and the sorting of the words in the offline lexicon other than the designated keyword in descending order of similarity comprises:
sorting the words in the offline lexicon other than the designated keyword in ascending order of Euclidean distance.
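The ranking described in claims 3 and 4 can be illustrated with the short sketch below, which treats a smaller Euclidean distance as a higher similarity and keeps the closest k words. The function name top_synonyms, the parameter k (the "preset number"), and the representation of the lexicon as a dictionary of lists are assumptions made for illustration only.

```python
import math
from typing import Dict, List


def top_synonyms(lexicon: Dict[str, List[float]], keyword: str, k: int) -> List[str]:
    """Return the k words closest to `keyword` by Euclidean distance."""
    target = lexicon[keyword]
    distances = [
        (math.dist(target, vector), word)
        for word, vector in lexicon.items()
        if word != keyword  # the designated keyword itself is excluded
    ]
    distances.sort()  # ascending distance corresponds to descending similarity
    return [word for _, word in distances[:k]]
```

Note that math.dist requires Python 3.8 or later. Because this step runs offline, scanning the whole lexicon for every designated keyword is acceptable in the sketch; nothing here runs on the online request path.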
5. The method according to claim 3, characterized in that the obtaining of the offline lexicon comprises:
obtaining a first offline lexicon and newly added words, the first offline lexicon including a first number of words and the word vector of each word;
training the first offline lexicon and the newly added words using a preset training algorithm to obtain an updated second offline lexicon, the second offline lexicon including a second number of words and the word vector of each word, the second number of words including the first number of words and the newly added words, wherein the preset training algorithm is a gradient descent algorithm, a deep neural network algorithm, or a continuous bag-of-words (CBOW) model based on Hierarchical Softmax.
6. The method according to claim 5, characterized in that the preset training algorithm is the CBOW model based on Hierarchical Softmax, the model includes an input layer, a projection layer, and an output layer, the output layer is a Huffman tree, and the Huffman tree takes the words that have appeared as leaf nodes;
the training of the first offline lexicon and the newly added words using the preset training algorithm to obtain the updated second offline lexicon comprises:
for any word w among the first offline lexicon and the newly added words, obtaining, by the input layer, the word vectors of the c words before the word w and the c words after it, where c is a positive integer and the initial word vector of each newly added word is set to a random vector;
summing, by the projection layer, the 2c word vectors output by the input layer, and inputting the sum into the output layer;
updating, by the output layer, the word vector of the word w using the following first update formula: [formula not reproduced]
wherein [formula not reproduced];
updating, by the output layer, [symbol not reproduced] using the following second update formula: [formula not reproduced]
continuing, in the same manner, to update the next word among the first offline lexicon and the newly added words, and stopping when the objective function converges;
the objective function is [formula not reproduced], wherein C denotes the set of words formed by the words in the first offline lexicon and the newly added words, V(w) denotes the word vector of the word w, η denotes the learning rate, p(w) denotes the path from the root node of the Huffman tree to the leaf node corresponding to the word w, l_w denotes the number of nodes on the path p(w), X_w denotes the sum output by the projection layer for the word w, [symbol not reproduced] denotes the Huffman code corresponding to the j-th node on the path p(w), [symbol not reproduced] denotes the vector corresponding to the j-th non-leaf node on the path p(w), and [symbol not reproduced] denotes the probability that the j-th non-leaf node on the path p(w) is classified as the positive class.
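For orientation, the textbook form of the CBOW model with Hierarchical Softmax, as commonly given in the word2vec literature, is written out below. It is offered only as a sketch of the standard formulation and should not be read as the formulas of claim 6: the symbols d_j^w (Huffman code of the j-th node on the path), θ_{j-1}^w (vector of a non-leaf node on the path), Context(w) (the 2c surrounding words), the sigmoid σ(x) = 1/(1 + e^{-x}), and the index convention (nodes counted from the root, j = 2, ..., l_w) are assumptions. In this textbook form the second update is applied to the vectors of the context words of w.

```latex
% Projection layer: sum of the 2c context word vectors
X_w = \sum_{u \in \mathrm{Context}(w)} V(u)

% Objective: log-likelihood over the word set C
\mathcal{L} = \sum_{w \in C} \sum_{j=2}^{l_w}
  \Big\{ (1 - d_j^w)\,\log \sigma\big(X_w^{\top}\theta_{j-1}^w\big)
       + d_j^w\,\log\big[1 - \sigma\big(X_w^{\top}\theta_{j-1}^w\big)\big] \Big\}

% Gradient-ascent update of the non-leaf node vectors, learning rate \eta
\theta_{j-1}^w \leftarrow \theta_{j-1}^w
  + \eta\,\big[1 - d_j^w - \sigma\big(X_w^{\top}\theta_{j-1}^w\big)\big]\,X_w

% Gradient-ascent update of the context word vectors
V(u) \leftarrow V(u)
  + \eta \sum_{j=2}^{l_w}
    \big[1 - d_j^w - \sigma\big(X_w^{\top}\theta_{j-1}^w\big)\big]\,\theta_{j-1}^w,
  \qquad u \in \mathrm{Context}(w)
```

Both updates follow from differentiating the inner term of the objective with respect to θ_{j-1}^w and X_w respectively, which is why they share the factor 1 - d_j^w - σ(X_w^T θ_{j-1}^w).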
7. The method according to claim 2, characterized in that after the storing of the display content, the designated keywords, and the synonymous keywords in correspondence in the offline content database, the method further comprises:
establishing an inverted index in the offline content database according to the designated keywords and the synonymous keywords corresponding to the display content, the inverted index including the display content corresponding to each keyword.
8. The method according to claim 1, characterized in that the obtaining of the page keywords contained in the page currently being displayed comprises:
obtaining the page identifier of the page currently being displayed;
querying an offline page database according to the page identifier to obtain the page keywords corresponding to the page identifier, the offline page database including the page identifier of each page and the corresponding page keywords.
9. A device for determining candidate display content, characterized in that the device comprises:
a page keyword obtaining module, configured to obtain page keywords contained in a page currently being displayed;
a database obtaining module, configured to obtain an offline content database, the offline content database including designated keywords and synonymous keywords of each item of display content, wherein a designated keyword is a keyword set for the corresponding display content in an offline stage, and a synonymous keyword is a keyword obtained by performing synonym expansion on the designated keyword in the offline stage;
a query module, configured to query the offline content database according to the page keywords to obtain the display content corresponding to the page keywords as the candidate display content of the page.
10. The device according to claim 9, characterized in that the device further comprises:
a content obtaining module, configured to obtain, in the offline stage, display content to be placed and the designated keywords of the display content;
a lexicon obtaining module, configured to obtain an offline lexicon, the offline lexicon including a plurality of preset words;
an expansion module, configured to perform synonym expansion on the designated keywords according to the offline lexicon to obtain the synonymous keywords of the designated keywords;
a storage module, configured to store the display content, the designated keywords, and the synonymous keywords in correspondence in the offline content database.
11. The device according to claim 10, characterized in that the offline lexicon further includes a word vector of each word, and the expansion module is configured to: calculate the similarity between the word vector of each word in the offline lexicon other than the designated keyword and the word vector of the designated keyword; and sort the words in the offline lexicon other than the designated keyword in descending order of similarity and select the top preset number of words as the synonymous keywords of the designated keyword.
12. The device according to claim 11, characterized in that the lexicon obtaining module is configured to: obtain a first offline lexicon and newly added words, the first offline lexicon including a first number of words and the word vector of each word; and train the first offline lexicon and the newly added words using a preset training algorithm to obtain an updated second offline lexicon, the second offline lexicon including a second number of words and the word vector of each word, the second number of words including the first number of words and the newly added words, wherein the preset training algorithm is a gradient descent algorithm, a deep neural network algorithm, or a continuous bag-of-words (CBOW) model based on Hierarchical Softmax.
13. The device according to claim 10, characterized in that the device further comprises:
an inverted index module, configured to establish an inverted index in the offline content database according to the designated keywords and the synonymous keywords corresponding to the display content, the inverted index including the display content corresponding to each keyword.
14. A device for determining candidate display content, characterized in that the device includes a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the instruction, the program, the code set, or the instruction set is loaded and executed by the processor to implement the operations performed in the method for determining candidate display content according to any one of claims 1 to 8.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the instruction, the program, the code set, or the instruction set is loaded and executed by a processor to implement the operations performed in the method for determining candidate display content according to any one of claims 1 to 8.
CN201711188237.7A 2017-11-24 2017-11-24 Determine the alternative method, apparatus and storage medium for showing content Pending CN110020303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711188237.7A CN110020303A (en) 2017-11-24 2017-11-24 Determine the alternative method, apparatus and storage medium for showing content

Publications (1)

Publication Number Publication Date
CN110020303A true CN110020303A (en) 2019-07-16

Family

ID=67185930

Country Status (1)

Country Link
CN (1) CN110020303A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902671A (en) * 2011-07-25 2013-01-30 腾讯科技(深圳)有限公司 Search method and device for advertising system
CN106709747A (en) * 2015-11-17 2017-05-24 北京奇虎科技有限公司 Method and device for recalling ad
CN106897265A (en) * 2017-01-12 2017-06-27 北京航空航天大学 Term vector training method and device
CN107102981A (en) * 2016-02-19 2017-08-29 腾讯科技(深圳)有限公司 Term vector generation method and device
US20170315676A1 (en) * 2016-04-28 2017-11-02 Linkedln Corporation Dynamic content insertion

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110958218A (en) * 2019-10-16 2020-04-03 平安国际智慧城市科技股份有限公司 Data transmission method based on multi-network communication and related equipment
CN112070586A (en) * 2020-09-09 2020-12-11 腾讯科技(深圳)有限公司 Article recommendation method and device based on semantic recognition, computer equipment and medium
CN112070586B (en) * 2020-09-09 2023-11-28 腾讯科技(深圳)有限公司 Item recommendation method and device based on semantic recognition, computer equipment and medium
CN113177116A (en) * 2021-04-28 2021-07-27 中国工商银行股份有限公司 Information display method and device, electronic equipment, storage medium and program product
CN113177116B (en) * 2021-04-28 2024-03-29 中国工商银行股份有限公司 Information display method and device, electronic equipment, storage medium and program product

Similar Documents

Publication Publication Date Title
Qi et al. Finding all you need: web APIs recommendation in web of things through keywords search
Skupin The world of geography: Visualizing a knowledge domain with cartographic means
CN111444394B (en) Method, system and equipment for obtaining relation expression between entities and advertisement recall system
CN105787767A (en) Method and system for obtaining advertisement click-through rate pre-estimation model
CN111143680B (en) Route recommendation method, system, electronic equipment and computer storage medium
CN110147882A (en) Training method, crowd's method of diffusion, device and the equipment of neural network model
CN110020303A (en) Determine the alternative method, apparatus and storage medium for showing content
Ke et al. TabNN: A universal neural network solution for tabular data
CN109819015A (en) Information push method, device, equipment and storage medium based on user portrait
Pratt et al. WORK AND THE CITY IN THE e-SOCIETY A critical investigation of the sociospatially situated character of economic production in the digital content industries in the UK
CN109493136A (en) Click-through rate prediction method and system based on the XGBoost algorithm
CN112084413A (en) Information recommendation method and device and storage medium
Lang et al. Movie recommendation system for educational purposes based on field-aware factorization machine
Sasongko The development of the creative industries to create a competitive advantage: Studies in small business sector
CN108173958A (en) Data optimization storage method based on ant colony algorithm in a multi-cloud environment
Han et al. DeepRouting: A deep neural network approach for ticket routing in expert network
Yu et al. The personalized recommendation algorithms in educational application
CN111177411A (en) Knowledge graph construction method based on NLP
CN109711653B (en) Witkey task recommendation method based on Witkey-task-label tripartite graph
Xie Construction and promotion of reading service platform of university library based on computer network cloud platform
CN104424217A (en) MP4 format technology based ultra-capacity travel information system
Ma et al. Context aware feature interaction based recommendation system
CN114579860B (en) User behavior portrait generation method, device, electronic equipment and storage medium
Borhani-Fard et al. Applying clustering approach in blog recommendation
CN103500219B (en) Control method for adaptive precise label matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190716
