CN108182200A - Keyword expanding method and device based on semantic similarity - Google Patents

Info

Publication number
CN108182200A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711229082.7A
Other languages
Chinese (zh)
Other versions
CN108182200B (en)
Inventor
翁永金
李百川
陈第
蔡锐涛
李展铿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Umi-Tech Co Ltd
Original Assignee
Umi-Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Umi-Tech Co Ltd
Priority to CN201711229082.7A
Publication of CN108182200A
Application granted
Publication of CN108182200B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a keyword expansion method and device based on semantic similarity. The method includes: receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords; obtaining the search index of each candidate keyword in an application library platform, and calculating a similarity score of each candidate keyword relative to the keyword to be expanded according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword; and, according to the ranking of the similarity scores, selecting a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded. The invention can automatically screen out keywords with higher similarity, which allows expansion in bulk while maintaining expansion quality.

Description

Keyword expanding method and device based on semantic similarity
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a keyword expansion method and device based on semantic similarity.
Background
The rapid development of intelligent terminals has driven the growth of the mobile Internet software industry. More and more users download APPs (applications) from an application library platform (i.e. an app store) on their intelligent terminals; according to Wikipedia data, 65% of users find and download the applications they need by searching the app store. To improve how well their own APP performs in app store search, APP developers therefore need to carry out app store optimization. One of its key tasks is APP keyword optimization, and the core of keyword optimization is expanding the APP's key keywords.
At present, because app stores on intelligent terminals involve industry-specific background knowledge, keyword expansion is mostly done by manual judgement. The quality of manual expansion depends heavily on the subjective understanding of the person doing it, so the quality of the keyword expansion results is unstable.
Summary of the invention
In view of this, the present invention provides a keyword expansion method and device based on semantic similarity, which can overcome the unstable quality of existing keyword expansion approaches.
The scheme provided by the embodiments of the present invention includes:
A keyword expansion method based on semantic similarity, including:
receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
obtaining the search index of each candidate keyword in the application library platform, and calculating the similarity score of each candidate keyword relative to the keyword to be expanded according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword;
according to the ranking of the similarity scores, selecting a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
In one embodiment, before receiving the keyword to be expanded and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in the predetermined candidate keyword set, the method further includes:
obtaining historical search record information of the application library platform, and determining the first mapping relation of each keyword according to the historical search record information; wherein the historical search record information includes the keyword information used for searching and the search result information of each keyword; the first mapping relation includes the candidate APP set corresponding to the keyword and the occurrence count of each APP in the candidate APP set;
determining the second mapping relation of each APP according to the first mapping relations of the multiple keywords in the historical search record information; the second mapping relation includes the keyword set corresponding to the APP;
obtaining the candidate keyword set of the application library platform according to the first mapping relation and the second mapping relation.
In one embodiment, determining the first mapping relation between each keyword and the APPs it covers according to the historical search record information includes:
obtaining, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in each of those search results;
selecting, in ranking order, a set number of APPs from each search result of the keyword to obtain the candidate APP set corresponding to the keyword;
counting the number of times each APP in the candidate APP set occurs across the multiple search results to obtain the feature vector corresponding to the keyword; each element of the feature vector is the occurrence count of one APP in the candidate APP set;
obtaining the first mapping relation of the keyword from its candidate APP set and feature vector.
In one embodiment, obtaining the candidate keyword set of the application library platform according to the first mapping relation and the second mapping relation includes:
obtaining a keyword matrix according to the first mapping relation and the second mapping relation, where the number of rows of the keyword matrix equals the number of APPs in the keyword's candidate APP set in the first mapping relation, and the number of columns equals the number of keywords in the APP's keyword set in the second mapping relation;
selecting from the keyword matrix, according to the occurrence count of each keyword in the matrix, the keywords whose occurrence count is greater than or equal to a set count, to obtain a temporary keyword set;
obtaining the search index of each keyword in the temporary keyword set, and selecting from the temporary keyword set the keywords whose search index is greater than or equal to a set search index value, to obtain the candidate keyword set.
In one embodiment, the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set is calculated by the following formula:
$$\mathrm{sim}(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}$$
where $k_i$ and $k_j$ denote the i-th and j-th keywords, $V(k_i)$ and $V(k_j)$ denote the feature vectors of the i-th and j-th keywords, $V(k_i) \cdot V(k_j)$ denotes the inner product of the two vectors, $\|V(k_i)\|_2$ denotes the 2-norm of the vector $V(k_i)$, $\|V(k_i)\|_2 \|V(k_j)\|_2$ denotes the product of the 2-norms of $V(k_i)$ and $V(k_j)$, and $\mathrm{sim}(k_i, k_j)$ denotes the semantic similarity of the i-th and j-th keywords.
In one embodiment, the similarity score of each candidate keyword relative to the keyword to be expanded is calculated from the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword by the following formula:
$$\mathrm{Score}(k_i) = \left[ w \cdot \mathrm{sim}(K', k_i) + (1 - w) \cdot \frac{p_i - p_{\min}}{p_{\max} - p_{\min}} \right] \times 100$$
where $K'$ denotes the keyword to be expanded, $k_i$ denotes the i-th candidate keyword in the candidate keyword set, and $\mathrm{Score}(k_i)$ denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded; $w$ denotes the set semantic similarity weight and $(1 - w)$ the search index weight; $\mathrm{sim}(K', k_i)$ denotes the semantic similarity between the keyword to be expanded and the i-th candidate keyword; $p_i$ denotes the search index of the i-th candidate keyword, $p_{\min}$ the minimum search index among all candidate keywords in the candidate keyword set, and $p_{\max}$ the maximum; $\mathrm{Score}(k_i) \in [0, 100]$.
In one embodiment, obtaining the historical search record information of the application library platform includes:
obtaining, through an interface of the application library platform, the platform's historical search record information for the most recent week.
A keyword expansion device based on semantic similarity, including:
a semantic similarity calculation module, configured to receive a keyword to be expanded and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
a similarity score calculation module, configured to obtain the search index of each candidate keyword in the application library platform, and to calculate the similarity score of each candidate keyword relative to the keyword to be expanded according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword;
and an expansion word selection module, configured to select, according to the ranking of the similarity scores, a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
A computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method described above when executing the program.
In the above embodiments, when a keyword to be expanded is received, the semantic similarity between it and each candidate keyword in a predetermined candidate keyword set (which contains multiple candidate keywords) is calculated first. The search index of each candidate keyword in the application library platform is then obtained, and the similarity score of each candidate keyword relative to the keyword to be expanded is calculated from the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword. Finally, according to the ranking of the similarity scores, a set number of candidate keywords is selected from the candidate keyword set to obtain the expansion keywords. Given a keyword entered by the user, this technical scheme automatically screens out keywords with higher similarity based on semantic analysis and provides their semantic similarity scores at the same time, which improves the efficiency of APP operations staff. The method also makes it easy to export similar keywords in batches, which further improves efficiency. It thus allows expansion in bulk while maintaining expansion quality.
Description of the drawings
Fig. 1 is the schematic flow chart of the keyword expanding method based on semantic similarity of an embodiment;
Fig. 2 is the schematic flow chart of the keyword expanding method based on semantic similarity of another embodiment;
Fig. 3 is the schematic diagram of the keyword expanding device based on semantic similarity of an embodiment.
Detailed description
To make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
Although the steps in the embodiments of the present invention are numbered, the numbering does not limit their order. Unless an order is explicitly stated, or the execution of one step depends on another, the relative order of the steps can be adjusted.
Fig. 1 is a schematic flow chart of the keyword expansion method based on semantic similarity according to an embodiment. As shown in Fig. 1, the method of this embodiment includes the steps:
S11: receive a keyword to be expanded, and calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set; the candidate keyword set contains multiple candidate keywords.
A keyword in the embodiments of the present invention is any string of characters that can be used to search for APPs in the application library platform, such as Chinese characters, English words or letters, numbers, or other symbols, or any combination of these.
S12: obtain the search index of each candidate keyword in the application library platform, and calculate the similarity score of each candidate keyword relative to the keyword to be expanded according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword.
S13: according to the ranking of the similarity scores, select a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
In an optional embodiment, before receiving the keyword to be expanded and calculating the semantic similarity between it and each candidate keyword in the predetermined candidate keyword set, the method further includes a step of determining the candidate keyword set of the application library platform. The candidate keyword set contains multiple candidate keywords, which are keywords that have previously been used to search for APPs in the application library platform; they may take the form of Chinese characters, English words or letters, numbers, other symbols, and so on.
The candidate keyword set can be obtained from the historical search record information of the application library platform, for example the search records produced in the platform during the most recent week. The search record information includes the keyword information used for searching and the search result information of each keyword, and may also include the search index of each keyword. The search index is calculated from the cumulative number of APP searches made with the keyword in the application library platform within a set statistical period (the search volume), while also taking factors such as search magnitude into account. The search index and the search volume are positively related. As a rough empirical estimate: (1) a keyword with search index < 4605 is searched no more than about once per day; (2) for a keyword with search index >= 4605 and < 8000, daily search volume ≈ search index - 4604; (3) for a keyword with search index greater than 8000, daily search volume ≈ (search index - 4604) * f(x), where f(x) reflects that the relation between search index and search volume is no longer a simple linear growth.
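The empirical rule above can be sketched as a small piecewise function. This is only an illustration: the function name `estimate_daily_searches` and the parameter `f` are hypothetical, the source does not define f(x), so the caller has to supply it, and the source leaves a search index of exactly 8000 unassigned, which this sketch groups with the nonlinear branch as an assumption.

```python
from typing import Callable, Optional

LOW_INDEX = 4605   # below this, the text estimates at most about 1 search per day
HIGH_INDEX = 8000  # from here on, growth is no longer simply linear


def estimate_daily_searches(
    search_index: float,
    f: Optional[Callable[[float], float]] = None,
) -> Optional[float]:
    """Rough daily search volume implied by a search index.

    Returns None in the nonlinear range when no f(x) is supplied,
    because the source text does not define f.
    """
    if search_index < LOW_INDEX:
        return 1.0  # upper bound: "no more than about once per day"
    if search_index < HIGH_INDEX:
        return search_index - 4604
    if f is None:
        return None
    return (search_index - 4604) * f(search_index)
```

Passing `f` as a parameter keeps the unknown nonlinear factor explicit instead of guessing its shape.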
In an optional embodiment, the step of determining the candidate keyword set of the application library platform may include:
First, obtain the historical search record information of the application library platform, and determine the first mapping relation of each keyword according to it. The historical search record information includes the keyword information used for searching and the search result information of each keyword; the first mapping relation includes the candidate APP set corresponding to the keyword and the occurrence count of each APP in the candidate APP set.
Then, according to the first mapping relations of the multiple keywords in the obtained historical search record information, determine the second mapping relation between each APP and the keywords that cover it; the second mapping relation includes the keyword set corresponding to the APP. The candidate keyword set of the application library platform can then be obtained from the first mapping relation and the second mapping relation.
In an optional embodiment, determining the first mapping relation between each keyword and the APPs it covers proceeds as follows: from the multiple search results of the same keyword within a set historical period, obtain the APP ranking information in each search result; in ranking order, select a set number of APPs from each search result of the keyword to obtain the keyword's candidate APP set; count the number of times each APP in the candidate APP set occurs across the multiple search results to obtain the keyword's feature vector, each element of which is the occurrence count of one APP in the candidate APP set; and obtain the keyword's first mapping relation from its candidate APP set and feature vector.
In an optional embodiment, obtaining the candidate keyword set of the application library platform from the first mapping relation and the second mapping relation proceeds as follows:
Obtain a keyword matrix according to the first mapping relation and the second mapping relation, where the number of rows of the keyword matrix equals the number of APPs in the keyword's candidate APP set in the first mapping relation, and the number of columns equals the number of keywords in the APP's keyword set in the second mapping relation. According to the occurrence count of each keyword in the keyword matrix, select the keywords whose occurrence count is greater than or equal to a set count to obtain a temporary keyword set. Obtain the search index of each keyword in the temporary keyword set, and select the keywords whose search index is greater than or equal to a set search index value to obtain the candidate keyword set.
The keyword expansion method based on semantic similarity of the embodiments of the present invention is described further below with reference to the logic diagram shown in Fig. 2.
First, based on the search result information of keywords in the application library platform over the most recent week, the APPs covered by the result of a search with the i-th keyword are expressed as:
$$S(k_i) = (\mathrm{appid}_1, \mathrm{appid}_2, \ldots, \mathrm{appid}_n) \quad (2\text{-}1)$$
where i, n ∈ Z, Z is the set of positive integers, $k_i$ denotes the i-th keyword, and n means that a search with the i-th keyword returns n APPs in ranked order (each identified by its APP id); n may differ from keyword to keyword.
Next, determine the forward mapping relation between keywords and APPs in the application library platform:
The same keyword may be searched many times within the set historical period (for example one week), and the search results change with the time of the search. The search results are therefore aggregated, finally giving the APP set $A(k_i)$ corresponding to the i-th keyword and its feature vector $V(k_i)$:
$$A(k_i) = (\mathrm{appid}_1, \mathrm{appid}_2, \ldots, \mathrm{appid}_n) \quad (2\text{-}2)$$
$$V(k_i) = (\mathrm{count}_1, \mathrm{count}_2, \ldots, \mathrm{count}_n) \quad (2\text{-}3)$$
where i, n ∈ Z, $k_i$ denotes the i-th keyword, and $\mathrm{count}_n$ denotes the number of times the corresponding APP ($\mathrm{appid}_n$) appeared in searches with this keyword within the set historical period.
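The aggregation behind formulas (2-2) and (2-3) might look like the following sketch. The function name `build_first_mapping` and its parameters are hypothetical, and counting each APP at most once per search is an assumption of this sketch; the source only says that occurrence counts are collected over repeated searches of the same keyword.

```python
from collections import Counter
from typing import List, Tuple


def build_first_mapping(
    search_results: List[List[str]], top_per_search: int
) -> Tuple[List[str], List[int]]:
    """Aggregate repeated searches of ONE keyword.

    search_results: one ranked list of app ids per search of the keyword.
    Returns (A, V): the app set A(k) in first-seen order and the matching
    count vector V(k), where V[i] is how many searches returned A[i].
    """
    counts: Counter = Counter()
    for ranked_apps in search_results:
        # Keep only the top-ranked apps of each search, and count each app
        # at most once per search.
        for app_id in dict.fromkeys(ranked_apps[:top_per_search]):
            counts[app_id] += 1
    apps = list(counts)  # Counter preserves first-insertion order
    vector = [counts[app_id] for app_id in apps]
    return apps, vector
```

For example, three searches whose top-2 results are (a, b), (b, a) and (a, c) give A = [a, b, c] and V = [3, 2, 1].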
Next, determine the reverse mapping relation between keywords and APPs, i.e. the second mapping relation of each APP:
An inverted list is built from the mapping $S(k_i)$ above, which gives the keyword set $K(a_i)$ corresponding to the i-th APP:
$$K(a_i) = (\mathrm{keyword}_1, \ldots, \mathrm{keyword}_n) \quad (2\text{-}4)$$
where i, n ∈ Z and $a_i$ denotes the i-th APP. Different APPs correspond to different n, i.e. the dimension of $K(a_i)$ differs from APP to APP.
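Building the inverted list of formula (2-4) can be sketched as below. The name `invert_mapping` and the dictionary-based representation are assumptions made for illustration; the source does not prescribe a data structure at this point.

```python
from collections import defaultdict
from typing import Dict, List


def invert_mapping(keyword_to_apps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build K(a): for every app, the keywords whose results contain it."""
    app_to_keywords: Dict[str, List[str]] = defaultdict(list)
    for keyword, apps in keyword_to_apps.items():
        for app_id in dict.fromkeys(apps):  # skip duplicate app ids
            app_to_keywords[app_id].append(keyword)
    return dict(app_to_keywords)
```

Because each APP collects a different number of keywords, the resulting lists have different lengths, which matches the remark that the dimension of K(a_i) varies between APPs.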
Next, determine the candidate keyword set:
First, obtain the APP set corresponding to the keyword according to formulas (2-2) and (2-3), and select from it the top m APPs by occurrence count to obtain the keyword's candidate APP set $S_{app}$:
$$S_{app} = (\mathrm{appid}_{(1)}, \ldots, \mathrm{appid}_{(m)}) \quad (3\text{-}1)$$
where m ∈ Z.
The first mapping relation of the keyword is obtained from the candidate APP set $S_{app}$ and its corresponding feature vector.
Then, map the APP set in formula (3-1) according to formula (2-4) to obtain a keyword matrix, denoted $M_{kw}$, whose i-th row holds the keywords of the i-th candidate APP:
$$M_{kw} = \begin{pmatrix} \mathrm{keyword}_{11} & \cdots & \mathrm{keyword}_{1n} \\ \vdots & \ddots & \vdots \\ \mathrm{keyword}_{m1} & \cdots & \mathrm{keyword}_{mn} \end{pmatrix}$$
where m, n ∈ Z.
Next, screen the keyword matrix:
(1) merge the keyword matrix $M_{kw}$ and count the occurrences of each keyword in it, then select the top n keywords by occurrence count to obtain the temporary keyword set;
(2) reject the keywords in the temporary keyword set whose search index is less than β to obtain the candidate keyword set, denoted
$$S_{kw} = (\mathrm{keyword}_1, \mathrm{keyword}_2, \ldots, \mathrm{keyword}_n) \quad (4\text{-}1)$$
where n ∈ Z.
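The two screening steps can be sketched as follows. All names here (`select_candidate_keywords`, `top_n`, `beta`) are hypothetical, and dropping keywords with an unknown search index is an assumption of this sketch rather than something the source states.

```python
from collections import Counter
from typing import Dict, List


def select_candidate_keywords(
    candidate_apps: List[str],
    app_to_keywords: Dict[str, List[str]],
    search_index: Dict[str, float],
    top_n: int,
    beta: float,
) -> List[str]:
    """Screen the keyword matrix down to the candidate keyword set."""
    # Rows of the keyword matrix: one keyword list per candidate app.
    keyword_matrix = [app_to_keywords.get(app, []) for app in candidate_apps]
    # Step (1): merge the rows and count how often each keyword occurs,
    # then keep the top_n keywords by occurrence count.
    counts = Counter(kw for row in keyword_matrix for kw in row)
    temporary = [kw for kw, _ in counts.most_common(top_n)]
    # Step (2): reject keywords whose search index is below beta.
    # A keyword with no known search index is rejected as well.
    return [kw for kw in temporary if search_index.get(kw, float("-inf")) >= beta]
```

The rows are kept as ragged lists instead of a padded matrix, since only the merged occurrence counts are needed for screening.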
It should be noted that the candidate keyword set can be determined offline and updated periodically, for example once a week, to maintain the quality of the expansion keywords obtained from it.
In an optional embodiment, the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set can be calculated by the following formula:
$$\mathrm{sim}(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}$$
where $k_i$ and $k_j$ denote the i-th and j-th keywords, $V(k_i)$ and $V(k_j)$ denote the feature vectors of the i-th and j-th keywords, $V(k_i) \cdot V(k_j)$ denotes the inner product of the two vectors, $\|V(k_i)\|_2$ denotes the 2-norm of the vector $V(k_i)$, i.e. the square root of the sum of the squared absolute values of its elements, $\|V(k_i)\|_2 \|V(k_j)\|_2$ denotes the product of the 2-norms of $V(k_i)$ and $V(k_j)$, and $\mathrm{sim}(k_i, k_j)$ denotes the semantic similarity of the i-th and j-th keywords.
It should be understood that the method for calculating the semantic similarity between two keywords includes, but is not limited to, the above algorithm based on cosine similarity; other algorithms for calculating semantic similarity can also be used.
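A minimal sketch of the cosine similarity between two keyword feature vectors follows. It assumes each vector is stored sparsely as a dictionary from APP id to occurrence count, so that two keywords with different candidate APP sets can still be compared over the union of their APPs; that representation, the function name, and the handling of empty vectors are assumptions, not details from the source.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse count vectors keyed by app id."""
    # Apps missing from one vector contribute 0 to the inner product.
    dot = sum(count * v.get(app_id, 0.0) for app_id, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)
```

Since occurrence counts are non-negative, the result lies between 0 and 1: two keywords that return the same APPs in the same proportions score 1, and two keywords with no APP in common score 0.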
In an optional embodiment, the similarity score of each candidate keyword relative to the keyword to be expanded can be calculated by the following formula:
$$\mathrm{Score}(k_i) = \left[ w \cdot \mathrm{sim}(K', k_i) + (1 - w) \cdot \frac{p_i - p_{\min}}{p_{\max} - p_{\min}} \right] \times 100 \quad (6\text{-}1)$$
where $K'$ denotes the keyword to be expanded, $k_i$ denotes the i-th candidate keyword in the candidate keyword set, $\mathrm{Score}(k_i)$ denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded, $w$ denotes the set semantic similarity weight, and $(1 - w)$ denotes the search index weight; $p_i$ denotes the search index of the i-th candidate keyword, $p_{\min}$ the minimum search index among all candidate keywords in the candidate keyword set, and $p_{\max}$ the maximum search index among all candidate keywords in the candidate keyword set; $\mathrm{Score}(k_i) \in [0, 100]$.
It should be understood that the calculation of the similarity score is not limited to the above formula on a hundred-point scale. Other formulas can be used, as long as they take into account, to some extent, the semantic similarity, the keyword's search index, and the relative influence of the two.
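The scoring and selection steps could be sketched as below. This sketch assumes that the score is a weighted sum of the semantic similarity and the min-max normalised search index, scaled to a hundred-point range; that form is inferred from the symbol definitions (w, 1 - w, p_min, p_max, scores in [0, 100]) and is not confirmed by the source. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def similarity_scores(
    semantic_sim: Dict[str, float],
    search_index: Dict[str, float],
    w: float,
) -> Dict[str, float]:
    """Score every candidate keyword on a 0-100 scale."""
    if not semantic_sim:
        return {}
    indices = [search_index[kw] for kw in semantic_sim]
    p_min, p_max = min(indices), max(indices)
    span = p_max - p_min
    scores: Dict[str, float] = {}
    for kw, sim in semantic_sim.items():
        # Min-max normalise the search index; if all indices are equal,
        # treat the normalised value as 0 (an assumption of this sketch).
        popularity = (search_index[kw] - p_min) / span if span else 0.0
        scores[kw] = 100.0 * (w * sim + (1.0 - w) * popularity)
    return scores


def top_expansions(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-scoring candidates, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
```

With w close to 1 the ranking follows semantic similarity almost entirely, and the search index mainly breaks ties between candidates of similar meaning.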
Through the above process, a list of semantically similar keywords can be expanded automatically and in real time for each keyword to be expanded that the user enters, with high quality and high efficiency.
In combination with the above embodiments, the application of the keyword expansion method based on semantic similarity of the present invention is illustrated below, taking App Store Optimization (ASO) for Apple's app store as an example:
1) Use the Apple developer API to obtain the APP historical search information of Apple's app store, such as the keyword search results of the most recent week, APP information (which may include dimensions such as APP ID and APP name), and keyword information (including dimensions such as keyword ID, keyword, search index, and search results).
2) Preprocess the historical search information from step 1) and organize it into the following mapping relations, represented as hash tables, as shown in Table 1.
Table 1:
3) According to the mapping relations organized in 2), first look up the keyword ID of the keyword to be expanded, then look up the APP IDs corresponding to that keyword ID, take the top 200 APP IDs by ranking (the number can be set according to actual conditions) as the keyword's candidate APP set, and determine the feature vector corresponding to the candidate APP set.
4) Traverse the feature vector obtained in step 3) and query the mapping relations organized in 2) to obtain the keyword IDs covered by each APP ID, which gives the keyword matrix corresponding to the keyword to be expanded.
5) Merge the keyword matrix from step 4) and count the occurrences of each keyword, then select the top 1000 keywords by occurrence count (the number can be set according to actual conditions) to obtain the temporary keyword set. Screen the temporary keyword set further by rejecting keywords whose search index is less than 4605 (the threshold can be set according to actual conditions) to obtain the candidate keyword set corresponding to the keyword to be expanded.
6) Query the mapping relations obtained in 2) to get the feature vector of each candidate keyword in the candidate keyword set from 5).
7) Calculate the cosine similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set, and use it as the pure semantic similarity.
8) Set the semantic similarity weight to 0.9 (it can be set according to actual conditions), so that the search index weight is 1 - 0.9 = 0.1. Calculate the similarity score of each candidate keyword relative to the keyword to be expanded according to formula (6-1) above, and take the top 200 from high to low (the number can be set according to actual conditions) to obtain the expansion keyword list of the keyword to be expanded.
In this concrete application, i.e. applying the keyword expansion method based on semantic similarity of the above embodiments to ASO keyword expansion, the expansion effect was tested on 5 keywords. First, 10 keywords were expanded manually for each keyword; then the method of the above embodiments automatically determined the top 50 similar keywords for each keyword. The comparison found that 80% of the manually selected keywords were covered by the automatically selected top 50 keywords, which demonstrates the validity of the method. Moreover, compared with manual expansion, the method can produce the top 200 expansion keywords within 3 seconds, a substantial improvement in speed.
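The coverage figure in this comparison is the share of manually chosen keywords that also appear in the automatically selected list. A minimal sketch of that measurement, with a hypothetical function name and made-up placeholder keywords rather than the patent's test data:

```python
from typing import List


def coverage(manual: List[str], automatic: List[str]) -> float:
    """Fraction of manually expanded keywords found in the automatic list."""
    if not manual:
        return 0.0
    automatic_set = set(automatic)
    return sum(1 for kw in manual if kw in automatic_set) / len(manual)
```

For instance, if 4 of 5 manually chosen keywords appear in the automatic list, `coverage` returns 0.8.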
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps can be performed in other orders or simultaneously. In addition, the above embodiments can be combined arbitrarily to obtain other embodiments.
Based on the same idea as the keyword expansion method based on semantic similarity in the above embodiments, the present invention also provides a keyword expansion device based on semantic similarity, which can be used to perform the above method. For ease of description, the structural diagram of the device embodiment shows only the parts relevant to the embodiments of the present invention. Those skilled in the art will understand that the illustrated structure does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
Fig. 3 is a schematic diagram of the keyword expansion device based on semantic similarity according to one embodiment of the present invention. As shown in Fig. 3, the keyword expansion device based on semantic similarity of this embodiment includes:
A semantic similarity calculation module 310, configured to receive a keyword to be expanded and to calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
A similarity score calculation module 320, configured to obtain the search index of each candidate keyword on the application library platform and to calculate, according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
And an expansion keyword selection module 330, configured to select a set number of candidate keywords from the candidate keyword set according to the ranking of similarity scores, thereby obtaining the expansion keywords of the keyword to be expanded.
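The three modules above can be sketched as three small functions. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the default weight `w=0.7`, and the input layout are hypothetical. The feature vectors are assumed to be aligned over one shared APP ordering, and the scoring rule is assumed to be a weighted sum of the cosine similarity and the min-max normalized search index, scaled to the range 0 to 100.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


def similarity_scores(target_vec, candidates, w=0.7):
    """candidates maps keyword -> (feature_vector, search_index).

    ASSUMED scoring rule:
    100 * (w * sim + (1 - w) * min-max normalized search index).
    """
    indices = [p for _, p in candidates.values()]
    p_min, p_max = min(indices), max(indices)
    span = (p_max - p_min) or 1  # guard: all search indices equal
    scores = {}
    for keyword, (vec, p) in candidates.items():
        sim = cosine_similarity(target_vec, vec)
        scores[keyword] = 100 * (w * sim + (1 - w) * (p - p_min) / span)
    return scores


def expand_keyword(target_vec, candidates, top_n=50, w=0.7):
    """Return the top_n candidate keywords ranked by descending score."""
    scores = similarity_scores(target_vec, candidates, w)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With `w` close to 1 the ranking follows semantic similarity almost exclusively, and with `w` close to 0 it follows search popularity, so the weight is the main tuning parameter in this sketch.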
In an optional embodiment, the keyword expansion device based on semantic similarity further includes:
A candidate keyword set determining module, configured to obtain the historical search record information of the application library platform and to determine, according to the historical search record information, the first mapping relation corresponding to each keyword. The historical search record information includes the keywords used for searching and the search result information of each keyword. The first mapping relation includes the candidate APP set corresponding to a keyword and the occurrence frequency of each APP in that candidate APP set. The module is further configured to determine, according to the first mapping relations of the multiple keywords in the historical search record information, the second mapping relation corresponding to each APP, where the second mapping relation includes the keyword set corresponding to the APP, and to obtain the candidate keyword set of the application library platform according to the first and second mapping relations.
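The step from the first mapping relation (keyword to candidate APPs) to the second mapping relation (APP to keywords) amounts to inverting a dictionary. The following is a minimal sketch, assuming the first mapping relations are held as a nested dictionary; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def second_mapping(first_mappings):
    """Invert keyword -> {app: occurrence count} into app -> set of keywords.

    first_mappings: {keyword: {app_id: occurrence_count}} (assumed layout).
    """
    app_to_keywords = defaultdict(set)
    for keyword, app_counts in first_mappings.items():
        for app in app_counts:
            app_to_keywords[app].add(keyword)
    return dict(app_to_keywords)


# Hypothetical first mapping relations for two keywords.
first = {"music": {"AppA": 3, "AppB": 1}, "radio": {"AppB": 2}}
print(second_mapping(first))
```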
In an optional embodiment, the candidate keyword set determining module includes:
A candidate APP determining submodule, configured to obtain, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in those search results, and to select, in ranking order, a set number of APPs from each search result of the keyword, thereby obtaining the candidate APP set corresponding to the keyword;
A feature vector determining submodule, configured to count the occurrence frequency of each APP of the candidate APP set in the multiple search results, thereby obtaining the feature vector corresponding to the keyword, each element of the feature vector corresponding to the occurrence frequency of one APP in the candidate APP set;
And a mapping relation determining submodule, configured to obtain the first mapping relation corresponding to the keyword according to the candidate APP set and the feature vector corresponding to the keyword.
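The three submodules above can be illustrated together for a single keyword. The sketch below assumes each search result is a list of APP identifiers already in ranking order; the function name, the `top_k` parameter, and the alphabetical ordering of the candidate APP set are hypothetical choices made only to keep vector positions stable.

```python
from collections import Counter


def first_mapping(search_results, top_k=10):
    """Build the candidate APP set and feature vector for ONE keyword.

    search_results: list of ranked app-id lists, one list per search
    performed within the history window (assumed layout).
    """
    counts = Counter()
    for ranked_apps in search_results:
        # Keep only the top_k ranked APPs of each individual search result.
        counts.update(ranked_apps[:top_k])
    candidate_apps = sorted(counts)  # fixed order for the vector positions
    feature_vector = [counts[app] for app in candidate_apps]
    return candidate_apps, feature_vector


# Hypothetical results of three searches for the same keyword.
results = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
print(first_mapping(results, top_k=2))  # (['a', 'b', 'c'], [2, 3, 1])
```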
In an optional embodiment, the candidate keyword set determining module further includes:
A set determining submodule, configured to obtain the candidate keyword set of the application library platform according to the first and second mapping relations. Specifically, the submodule is configured to: obtain a keyword matrix according to the first and second mapping relations, where the number of rows of the keyword matrix equals the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relation, and the number of columns equals the number of keywords in the keyword set corresponding to an APP in the second mapping relation; select from the keyword matrix, according to the occurrence frequency of each keyword in the matrix, the keywords whose occurrence frequency is greater than or equal to a set frequency, thereby obtaining an interim keyword set; and obtain the search index of each keyword in the interim keyword set and select from it the keywords whose search index is greater than or equal to a set search index value, thereby obtaining the candidate keyword set.
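The two-stage filtering described above can be sketched as follows. This is an illustrative sketch only: the function name, the thresholds `min_count` and `min_index`, and the data layout are hypothetical, and the keyword matrix is represented as a list of rows that may differ in length when APPs map to different numbers of keywords.

```python
from collections import Counter


def candidate_keyword_set(candidate_apps, app_to_keywords, search_index,
                          min_count=2, min_index=0):
    """Filter keywords by occurrence count, then by search index."""
    # One row per APP; each row lists the keywords mapped to that APP.
    matrix = [sorted(app_to_keywords.get(app, ())) for app in candidate_apps]
    # Count how often each keyword appears anywhere in the matrix.
    counts = Counter(kw for row in matrix for kw in row)
    interim = {kw for kw, c in counts.items() if c >= min_count}
    # Keep only interim keywords that are searched often enough.
    return {kw for kw in interim if search_index.get(kw, 0) >= min_index}


# Hypothetical inputs.
apps = ["AppA", "AppB", "AppC"]
app_kw = {"AppA": {"music", "radio"},
          "AppB": {"music", "radio", "fm"},
          "AppC": {"music"}}
index = {"music": 5000, "radio": 4000, "fm": 9000}
print(candidate_keyword_set(apps, app_kw, index, min_count=2, min_index=4500))
```

In this example "fm" is dropped by the frequency threshold despite its high search index, and "radio" is dropped by the search index threshold, which shows that the two filters act independently.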
It should be noted that, in the above exemplary embodiments of the keyword expansion device based on semantic similarity, the information exchange between modules, their execution processes, and similar details are based on the same concept as the foregoing method embodiments of the present invention and bring the same technical effects. For specifics, refer to the description of the method embodiments; they are not repeated here.
In addition, in the above exemplary embodiments of the device, the logical division into program modules is only an example. In practical applications, the above functions may be assigned to different program modules as needed, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation. That is, the internal structure of the keyword expansion device based on semantic similarity may be divided into different program modules to perform all or part of the functions described above.
Those of ordinary skill in the art will understand that all or part of the flows in the above embodiment methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and sold or used as an independent product. When executed, the program can perform all or part of the steps of the methods of the above embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
In addition, the storage medium may be provided in a computer device that further includes a processor. When the processor executes the program in the storage medium, all or part of the steps of the methods of the above embodiments can be realized.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments. It is understood that terms such as "first" and "second" are used herein to distinguish objects, but the objects are not limited by these terms.
The embodiments described above express only several implementations of the present invention and should not be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A keyword expansion method based on semantic similarity, characterized by comprising:
Receiving a keyword to be expanded, and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
Obtaining the search index of each candidate keyword on the application library platform, and calculating, according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
Selecting, according to the ranking of similarity scores, a set number of candidate keywords from the candidate keyword set to obtain the expansion keywords of the keyword to be expanded.
2. The keyword expansion method based on semantic similarity according to claim 1, characterized in that, before receiving the keyword to be expanded and calculating the semantic similarity between the keyword to be expanded and each candidate keyword in the predetermined candidate keyword set, the method further comprises:
Obtaining the historical search record information of the application library platform, and determining, according to the historical search record information, the first mapping relation corresponding to each keyword; wherein the historical search record information includes the keywords used for searching and the search result information of each keyword, and the first mapping relation includes the candidate APP set corresponding to a keyword and the occurrence frequency of each APP in the candidate APP set;
Determining, according to the first mapping relations of the multiple keywords in the historical search record information, the second mapping relation corresponding to each APP, the second mapping relation including the keyword set corresponding to the APP;
Obtaining the candidate keyword set of the application library platform according to the first mapping relations and the second mapping relations.
3. The keyword expansion method based on semantic similarity according to claim 2, characterized in that determining, according to the historical search record information, the first mapping relation corresponding to each keyword comprises:
Obtaining, from the multiple search results of the same keyword within a set historical period in the historical search record information, the APP ranking information in the multiple search results corresponding to the keyword;
Selecting, in APP ranking order, a set number of APPs from each search result of the keyword to obtain the candidate APP set corresponding to the keyword;
Counting the occurrence frequency of each APP of the candidate APP set in the multiple search results to obtain the feature vector corresponding to the keyword, each element of the feature vector corresponding to the occurrence frequency of one APP in the candidate APP set;
Obtaining the first mapping relation corresponding to the keyword according to the candidate APP set and the feature vector corresponding to the keyword.
4. The keyword expansion method based on semantic similarity according to claim 3, characterized in that obtaining the candidate keyword set of the application library platform according to the first mapping relations and the second mapping relations comprises:
Obtaining a keyword matrix according to the first mapping relations and the second mapping relations, the number of rows of the keyword matrix being equal to the number of APPs in the candidate APP set corresponding to the keyword in the first mapping relation, and the number of columns of the keyword matrix being equal to the number of keywords in the keyword set corresponding to an APP in the second mapping relation;
Selecting from the keyword matrix, according to the occurrence frequency of each keyword in the keyword matrix, the keywords whose occurrence frequency is greater than or equal to a set frequency, to obtain an interim keyword set;
Obtaining the search index of each keyword in the interim keyword set, and selecting from the interim keyword set the keywords whose search index is greater than or equal to a set search index value, to obtain the candidate keyword set.
5. The keyword expansion method based on semantic similarity according to claim 3 or 4, characterized in that the semantic similarity between the keyword to be expanded and each candidate keyword in the candidate keyword set is calculated by the following formula:
$$\mathrm{sim}(k_i, k_j) = \frac{V(k_i) \cdot V(k_j)}{\|V(k_i)\|_2 \, \|V(k_j)\|_2}$$
Wherein $k_i$ and $k_j$ denote the i-th keyword and the j-th keyword respectively; $V(k_i)$ and $V(k_j)$ denote the feature vectors corresponding to the i-th keyword and the j-th keyword respectively; $V(k_i) \cdot V(k_j)$ denotes the inner product of the two vectors; $\|V(k_i)\|_2$ denotes the 2-norm of the vector $V(k_i)$; $\|V(k_i)\|_2 \|V(k_j)\|_2$ denotes the product of the 2-norms of the feature vectors $V(k_i)$ and $V(k_j)$; and $\mathrm{sim}(k_i, k_j)$ denotes the semantic similarity of the i-th keyword and the j-th keyword.
6. The keyword expansion method based on semantic similarity according to any one of claims 1 to 4, characterized in that, according to the semantic similarity, the preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded is calculated by the following formula:
Wherein $K'$ denotes the keyword to be expanded; $k_i$ denotes the i-th candidate keyword in the candidate keyword set; $\mathrm{Score}(k_i)$ denotes the similarity score of the i-th candidate keyword relative to the keyword to be expanded; $w$ denotes the set semantic similarity weight, and $(1-w)$ denotes the search index weight; $\mathrm{sim}(K', k_i)$ denotes the semantic similarity between the keyword to be expanded and the i-th candidate keyword; $p_i$ denotes the search index of the i-th candidate keyword; $p_{\min}$ denotes the minimum search index value among all candidate keywords in the candidate keyword set, and $p_{\max}$ the maximum search index value; $\mathrm{Score}_i \in [0, 100]$.
7. The keyword expansion method based on semantic similarity according to claim 2, characterized in that obtaining the historical search record information of the application library platform comprises:
Obtaining, through an interface of the application library platform, the historical search record information of the application library platform for the most recent week.
8. A keyword expansion device based on semantic similarity, characterized by comprising:
A semantic similarity calculation module, configured to receive a keyword to be expanded and to calculate the semantic similarity between the keyword to be expanded and each candidate keyword in a predetermined candidate keyword set, the candidate keyword set containing multiple candidate keywords;
A similarity score calculation module, configured to obtain the search index of each candidate keyword on the application library platform and to calculate, according to the semantic similarity, a preset semantic similarity weight, and the search index of each candidate keyword, the similarity score of each candidate keyword relative to the keyword to be expanded;
And an expansion keyword selection module, configured to select a set number of candidate keywords from the candidate keyword set according to the ranking of similarity scores, to obtain the expansion keywords of the keyword to be expanded.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method of any one of claims 1 to 7 are implemented.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the program.
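The formula referred to in claim 6 is not reproduced in this text. A form consistent with the symbol definitions given in that claim (a weight $w$ on the semantic similarity, a weight $(1-w)$ on the search index normalized between $p_{\min}$ and $p_{\max}$, and a result in $[0, 100]$) would be the following. The factor of 100 and the exact way the two terms are combined are assumptions inferred from those definitions and do not quote the granted claim.

```latex
\mathrm{Score}(k_i) = 100 \times \left[ w \cdot \mathrm{sim}(K', k_i)
  + (1 - w) \cdot \frac{p_i - p_{\min}}{p_{\max} - p_{\min}} \right]
```

Under this assumed form, the stated range follows directly: for feature vectors of non-negative occurrence frequencies $\mathrm{sim}(K', k_i) \in [0, 1]$, the normalized search index also lies in $[0, 1]$, and a convex combination of the two scaled by 100 lies in $[0, 100]$.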
CN201711229082.7A 2017-11-29 2017-11-29 Keyword expansion method and device based on semantic similarity Active CN108182200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711229082.7A CN108182200B (en) 2017-11-29 2017-11-29 Keyword expansion method and device based on semantic similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711229082.7A CN108182200B (en) 2017-11-29 2017-11-29 Keyword expansion method and device based on semantic similarity

Publications (2)

Publication Number Publication Date
CN108182200A true CN108182200A (en) 2018-06-19
CN108182200B CN108182200B (en) 2020-10-23

Family

ID=62545546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711229082.7A Active CN108182200B (en) 2017-11-29 2017-11-29 Keyword expansion method and device based on semantic similarity

Country Status (1)

Country Link
CN (1) CN108182200B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117475A (en) * 2018-07-02 2019-01-01 武汉斗鱼网络科技有限公司 A kind of method and relevant device of text rewriting
CN110795534A (en) * 2019-10-28 2020-02-14 维沃移动通信有限公司 Information searching method and mobile terminal
CN114238619A (en) * 2022-02-23 2022-03-25 成都数联云算科技有限公司 Method, system, device and medium for screening Chinese nouns based on edit distance
CN115630154A (en) * 2022-12-19 2023-01-20 竞速信息技术(廊坊)有限公司 Big data environment-oriented dynamic summary information construction method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103853722A (en) * 2012-11-29 2014-06-11 腾讯科技(深圳)有限公司 Query based keyword extension method, device and system
CN106610972A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Query rewriting method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117475A (en) * 2018-07-02 2019-01-01 武汉斗鱼网络科技有限公司 A kind of method and relevant device of text rewriting
CN110795534A (en) * 2019-10-28 2020-02-14 维沃移动通信有限公司 Information searching method and mobile terminal
CN114238619A (en) * 2022-02-23 2022-03-25 成都数联云算科技有限公司 Method, system, device and medium for screening Chinese nouns based on edit distance
CN114238619B (en) * 2022-02-23 2022-04-29 成都数联云算科技有限公司 Method, system, device and medium for screening Chinese nouns based on edit distance
CN115630154A (en) * 2022-12-19 2023-01-20 竞速信息技术(廊坊)有限公司 Big data environment-oriented dynamic summary information construction method and system
CN115630154B (en) * 2022-12-19 2023-05-05 竞速信息技术(廊坊)有限公司 Big data environment-oriented dynamic abstract information construction method and system

Also Published As

Publication number Publication date
CN108182200B (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN104123332B (en) The display methods and device of search result
US9639579B2 (en) Determination of a desired repository for retrieving search results
CA2655196C (en) System and method for generating a display of tags
CN109933660B (en) API information search method towards natural language form based on handout and website
CN108182200A (en) Keyword expanding method and device based on semantic similarity
CN104573130B (en) The entity resolution method and device calculated based on colony
CN109460519B (en) Browsing object recommendation method and device, storage medium and server
CN108509617A (en) Construction of knowledge base, intelligent answer method and device, storage medium, the terminal in knowledge based library
CN110532351B (en) Recommendation word display method, device and equipment and computer readable storage medium
US20180150561A1 (en) Searching method and searching apparatus based on neural network and search engine
CN105224554A (en) Search word is recommended to carry out method, system, server and the intelligent terminal searched for
JP6728178B2 (en) Method and apparatus for processing search data
CN111061954B (en) Search result sorting method and device and storage medium
CN107229737A (en) The method and electronic equipment of a kind of video search
CN114330329A (en) Service content searching method and device, electronic equipment and storage medium
CN109543113B (en) Method and device for determining click recommendation words, storage medium and electronic equipment
CN105677838A (en) User profile creating and personalized search ranking method and system based on user requirements
CN104933099B (en) Method and device for providing target search result for user
US20190347295A1 (en) Display apparatus and display method
CN108170665A (en) Keyword expanding method and device based on comprehensive similarity
CN111160699A (en) Expert recommendation method and system
CN110287348A (en) A kind of GIF format picture searching method based on machine learning
JP2007249600A (en) Method for classifying objective data to category
CN108170664A (en) Keyword expanding method and device based on emphasis keyword
CN114722086A (en) Method and device for determining search rearrangement model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant