CN108052554A - Method and apparatus for multi-dimensional keyword expansion - Google Patents
- Publication number
- CN108052554A (application number CN201711229068.7A)
- Authority
- CN
- China
- Prior art keywords
- app
- keyword
- keywords
- expanded
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method and apparatus for multi-dimensional keyword expansion. The method includes: after an APP to be expanded is received, on the one hand, determining second APPs associated with the APP to be expanded and obtaining a first expanded keyword set based on the second APPs; on the other hand, determining key keywords of the APP to be expanded and obtaining a second expanded keyword set based on the key keywords; and then screening keywords from the first and second expanded keyword sets to obtain the final expanded keywords of the APP to be expanded. The invention can expand keywords along two dimensions, competitor APPs and key keywords, improving the quality and comprehensiveness of keyword expansion.
Description
Technical Field
The invention relates to the technical field of data analysis, in particular to a method and a device for multi-dimensional expansion of keywords.
Background
Since most users download APPs (applications) from an application library platform (i.e., an APP store) on an intelligent terminal, an APP developer needs to perform keyword analysis on an APP and optimize it accordingly in order to improve the APP's search performance in the APP store.
Owing to the specific industry knowledge required for intelligent terminal application stores, keyword expansion for application store APPs has traditionally been performed by manual judgment. The expansion quality is therefore strongly affected by the subjective knowledge of the person doing it, and the quality of the expansion results is unstable. In addition, existing approaches usually expand keywords based only on the characteristics of the APP itself, so the keywords are difficult to expand comprehensively.
Disclosure of Invention
In view of this, the invention provides a method and an apparatus for multi-dimensional keyword expansion, which can overcome the unstable quality and incomplete coverage of existing keyword expansion for applications.
The scheme provided by the embodiments of the invention is as follows:
A method for multi-dimensional keyword expansion, comprising the following steps:
acquiring first keywords covered by the APP to be expanded in the application library platform, and obtaining second APPs related to the APPs to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP in the application library platform, and obtaining a third APP associated with the APP to be expanded according to the APPs searched by each second keyword in the application library platform; obtaining keywords covered by each third APP in the application library platform, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; screening a first set number of keywords from the first candidate keyword set according to the first similarity score to obtain a first expanded keyword set;
key keywords are screened out from the first keywords, and a fourth APP associated with the APP to be expanded is obtained according to the APPs searched by the key keywords in the application library platform; obtaining a second candidate keyword set according to keywords covered by a fourth APP in the application library platform; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set;
selecting a third set number of keywords from the first expansion keyword set and the second expansion keyword set to obtain expansion keywords of the APP to be expanded;
wherein a keyword covered by an APP must satisfy the condition that the search results of the keyword in the application library platform contain the APP.
An apparatus for multidimensional expansion of keywords, comprising:
the first word expansion module is used for acquiring first keywords covered by the APP to be expanded in the application library platform and obtaining a second APP related to the APP to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP in the application library platform, and obtaining a third APP associated with the APP to be expanded according to the APPs searched by each second keyword in the application library platform; obtaining keywords covered by each third APP in the application library platform, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; screening a first set number of keywords from a first candidate keyword set according to the first similarity score to obtain a first expanded keyword set;
the second word expansion module is used for screening key keywords from the first keywords and obtaining a fourth APP associated with the APP to be expanded according to the APPs searched by the key keywords in the application library platform; obtaining a second candidate keyword set according to the keywords covered by the fourth APP in the application library platform; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set; and
the screening module is used for selecting a third set number of keywords from the first expansion keyword set and the second expansion keyword set to obtain expansion keywords of the APP to be expanded;
wherein a keyword covered by an APP must satisfy the condition that the search results of the keyword in the application library platform contain the APP.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as described above.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method described above when executing the program.
By implementing the above embodiments, after the APP to be expanded is received, on the one hand, second APPs (competitor APPs) associated with the APP to be expanded are determined, and a first expanded keyword set is obtained based on the second APPs; on the other hand, key keywords of the APP to be expanded are determined, and a second expanded keyword set is obtained based on the key keywords; keywords are then screened from the first and second expanded keyword sets to obtain the final expanded keywords of the APP to be expanded. With this technical scheme, keywords can be expanded for the APP along two dimensions, competitor APPs and key keywords, which improves the quality and comprehensiveness of keyword expansion. In addition, the keyword expansion method of the embodiments makes it convenient to derive keyword expansion schemes for APPs in batches, which greatly improves efficiency while maintaining expansion quality.
Drawings
FIG. 1 is a schematic flow chart diagram of a method for multidimensional expansion of keywords in one embodiment;
FIG. 2 is a schematic flow chart diagram of a method for multidimensional expansion of keywords according to another embodiment;
FIG. 3 is a schematic block diagram of an apparatus for multidimensional expansion of keywords according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships can exist; for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Although the steps in the embodiments of the present invention are numbered, the numbering does not limit their order; the relative order of the steps can be adjusted unless an order is explicitly stated or the execution of one step depends on other steps.
As shown in FIG. 1, the method for multi-dimensional keyword expansion in this embodiment includes the following steps:
S11, acquiring first keywords covered by the APP to be expanded in the application library platform, and obtaining second APPs associated with the APP to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP in the application library platform, and obtaining third APPs associated with the APP to be expanded according to the APPs searched by each second keyword in the application library platform; obtaining the keywords covered by each third APP in the application library platform, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; and screening a first set number of keywords from the first candidate keyword set according to the first similarity score to obtain a first expanded keyword set.
The keywords in the embodiments of the present invention include any characters that can be used to search for an APP on the application library platform, such as Chinese characters, English words, letters, digits, or other characters, or a combination of several of these. The first keywords can be obtained by analyzing historical search information of the application library platform, which includes the mapping relation between keywords and APPs, or can be specified in advance according to empirical values. The APP to be expanded covers multiple first keywords, and multiple second APPs are associated with the APP to be expanded.
A keyword covered by an APP must satisfy the condition that the search results corresponding to the keyword contain the APP. That is, the search results of each first keyword in the application library platform contain the APP to be expanded.
The second keywords may be obtained by analyzing historical search information of the application library platform, or may be specified in advance according to empirical values. A second keyword covered by a second APP satisfies the condition that the search results of the second keyword in the application library platform contain that second APP. Each second APP covers multiple second keywords, each third APP covers multiple keywords, and multiple third APPs are associated with the APP to be expanded.
The keywords covered by a third APP may be obtained by analyzing historical search information of the application library platform, or may be specified in advance according to empirical values. A keyword covered by a third APP satisfies the condition that the search results of the keyword in the application library platform contain that third APP.
The similarity between a third APP and a corresponding second APP represents the overall degree of association between them. In an embodiment, if a third APP corresponds to only one second APP, the similarity between the third APP and that second APP is taken as the similarity of the third APP relative to the second APP set; if a third APP corresponds to two or more second APPs, the similarity between the third APP and each corresponding second APP is obtained, and the mean of these similarities is taken as the similarity of the third APP relative to the second APP set. The similarity between a third APP and a single second APP may be predetermined, or may be calculated in real time from the search records of the application library platform. The mean may be an arithmetic mean or a weighted mean.
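The averaging rule described above can be sketched in Python as follows. This is only an illustrative sketch: the function name `similarity_to_second_set` and the optional `weights` argument are assumptions introduced here, and the source does not say how weights for the weighted mean would be chosen.

```python
def similarity_to_second_set(sims, weights=None):
    """Similarity of one third APP relative to the second-APP set.

    sims    -- similarities between the third APP and each second APP
               it corresponds to (at least one value)
    weights -- optional per-second-APP weights (hypothetical); when
               omitted, the arithmetic mean is used
    """
    if not sims:
        raise ValueError("a third APP must correspond to at least one second APP")
    if len(sims) == 1:
        # Only one corresponding second APP: use its similarity directly.
        return sims[0]
    if weights is None:
        return sum(sims) / len(sims)
    if len(weights) != len(sims):
        raise ValueError("weights and sims must have the same length")
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)
```

For instance, a third APP reached through two second APPs with similarities 0.2 and 0.6 would get the unweighted similarity 0.4 relative to the second APP set.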
The proportion of each keyword in the first candidate keyword set is determined based on the importance of the keyword to the third APP. The importance of a keyword to an APP characterizes the ranking of that APP in the search results of the keyword. The importance of a keyword to an APP may be obtained in advance by analyzing the historical search record data of the application library platform, or may be a preset importance. In the former case, in an embodiment, the method further includes a step of determining in advance, from the historical search record information of the application library platform, the importance of each keyword to the APPs it retrieves.
S12, key keywords are screened out from the first keywords, and a fourth APP associated with the APP to be expanded is obtained according to the APPs searched by the key keywords in the application library platform; obtaining a second candidate keyword set according to keywords covered by a fourth APP in the application library platform; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; and screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set.
The key keywords must satisfy the condition that the APP to be expanded ranks near the top of the search results of the key keyword, for example, within the top 10.
Optionally, this step further includes screening the APPs searched by each key keyword on the application library platform and keeping only those ranked in the top 100 (the number is configurable), thereby obtaining the fourth APPs associated with the APP to be expanded. For example, if a key keyword retrieves 500 APPs on the application library platform, only the top 100 are kept. This reduces the computational cost of the subsequent keyword expansion; moreover, the lower an APP ranks, the weaker its relevance to the keyword, so removing low-ranked APPs also helps the accuracy of keyword expansion. A second-level keyword covered by a fourth APP must satisfy the condition that the search results of the second-level keyword in the application library platform contain that fourth APP.
The similarity between a keyword in the second candidate keyword set and a single key keyword represents the degree of association between the two keywords on the same application library platform and reflects how much the APPs they retrieve overlap. The similarity between two keywords may be predetermined or calculated in real time from the search records of the application library platform. Optionally, calculating the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set includes: obtaining the key keywords corresponding to the keyword in the second candidate keyword set and the similarity between the keyword and each corresponding key keyword, and taking the mean of these similarities as the comprehensive similarity of the keyword relative to the key keyword set. The mean may be an arithmetic mean or a weighted mean.
For example, suppose the key keywords are "shopping" and "Taobao". The keywords covered by the APPs expanded from the key keyword "shopping" are [Jingdong, Suning Yigou]; the keywords covered by the APPs expanded from the key keyword "Taobao" are [Jingdong, Tmall]. The second candidate keyword set is then [Jingdong, Suning Yigou, Tmall].
The comprehensive similarity of "Jingdong" in the second candidate keyword set relative to the key keyword set is:
sim(Jingdong) = [sim(shopping, Jingdong) + sim(Taobao, Jingdong)] / 2.
The comprehensive similarity of "Suning Yigou" in the second candidate keyword set relative to the key keyword set is:
sim(Suning Yigou) = sim(shopping, Suning Yigou).
The comprehensive similarity of "Tmall" in the second candidate keyword set relative to the key keyword set is:
sim(Tmall) = sim(Taobao, Tmall).
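The worked example above can be reproduced with a short Python sketch. The function name `comprehensive_similarity`, the dictionary layout, and all numeric similarity values are assumptions made for illustration; the source gives no concrete similarity values, and only the unweighted mean is shown here.

```python
def comprehensive_similarity(expansions, sim):
    """Mean similarity of each candidate keyword to the key keywords
    that produced it.

    expansions -- key keyword -> candidate keywords expanded from it
    sim        -- (key keyword, candidate keyword) -> pairwise similarity
    """
    # Record which key keywords each candidate keyword came from.
    sources = {}
    for key_kw, candidates in expansions.items():
        for cand in candidates:
            sources.setdefault(cand, []).append(key_kw)
    # Average the pairwise similarities over those key keywords only.
    return {
        cand: sum(sim[(k, cand)] for k in keys) / len(keys)
        for cand, keys in sources.items()
    }


# Hypothetical pairwise similarities for the shopping example.
expansions = {
    "shopping": ["Jingdong", "Suning Yigou"],
    "Taobao": ["Jingdong", "Tmall"],
}
sim = {
    ("shopping", "Jingdong"): 0.6,
    ("Taobao", "Jingdong"): 0.8,
    ("shopping", "Suning Yigou"): 0.5,
    ("Taobao", "Tmall"): 0.9,
}
scores = comprehensive_similarity(expansions, sim)
```

A candidate reached from both key keywords is averaged over the two pairwise similarities, while a candidate reached from only one key keyword simply inherits that single similarity.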
The proportion of each keyword in the second candidate keyword set is determined based on the importance of the keyword to the corresponding APP, and the importance of a keyword to an APP characterizes the ranking of that APP in the keyword's search results. The importance of a keyword to an APP may be obtained in advance by analyzing the historical search record data of the application library platform, or may be a preset importance. In the former case, in an embodiment, the method further includes a step of determining in advance, from the historical search record information of the application library platform, the importance of each keyword to the APPs it retrieves.
S13, selecting a third set number of keywords from the first expanded keyword set and the second expanded keyword set to obtain the expanded keywords of the APP to be expanded.
With the keyword expansion method of this embodiment, after the APP to be expanded is received, on the one hand, second APPs (competitor APPs) associated with the APP to be expanded are determined, and a first expanded keyword set is obtained based on the second APPs; on the other hand, key keywords of the APP to be expanded are determined, and a second expanded keyword set is obtained based on the key keywords; keywords are then screened from the first and second expanded keyword sets to obtain the final expanded keywords of the APP to be expanded. With this technical scheme, keywords can be expanded for the APP along two dimensions, competitor APPs and key keywords, which improves the quality and comprehensiveness of keyword expansion.
In an embodiment, in the step S11, a set number of keywords with the first similarity score ranked from high to low may be selected from the first candidate keyword set to obtain the first expanded keyword set.
In an embodiment, a set number of keywords with the second similarity score ranked from high to low may be selected from the second candidate keyword set to obtain a second expanded keyword set.
Further, in an embodiment, in step S13, selecting a third set number of keywords from the first expanded keyword set and the second expanded keyword set to obtain the expanded keywords of the APP to be expanded includes: denoting the first expanded keyword set as $W^{(1)}$ and the second expanded keyword set as $W^{(2)}$; obtaining a third expanded keyword set, denoted $W^{(3)}$, from the first and second expanded keyword sets; normalizing the first or second similarity score of each keyword in $W^{(3)}$; obtaining the search index of each keyword in $W^{(3)}$ and calculating the final similarity score of each keyword in $W^{(3)}$ from its search index and its normalized similarity score; and selecting a set number of keywords from $W^{(3)}$ according to the final similarity score to obtain the expanded keywords of the APP to be expanded.
Optionally, the first or second similarity score of the $i$-th keyword in $W^{(3)}$ is normalized as follows:
$$s'_i = \frac{s_i - s_{\min}}{s_{\max} - s_{\min}}$$
where $s_i$ is the first or second similarity score of the $i$-th keyword in $W^{(3)}$, $s_{\min}$ and $s_{\max}$ are respectively the minimum and maximum similarity scores in $W^{(3)}$, and $s'_i$ is the normalized similarity score of the $i$-th keyword in $W^{(3)}$.
Optionally, the final similarity score of each keyword is calculated as follows:
the search index of each keyword in $W^{(3)}$ is queried and normalized in the same way as the similarity score, giving a corrected search index $p'_i$ for each keyword; the final similarity score of each keyword is then calculated as:
$$score_i = \alpha \cdot s'_i + (1 - \alpha) \cdot p'_i \tag{1}$$
where the preset weight coefficient $\alpha \in [0, 1]$.
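The scoring step can be sketched in Python as below. The sketch assumes min-max normalization for both the similarity scores and the search indexes, which is what the definitions of the minimum and maximum scores suggest; the function names, the default weight of 0.5, and the handling of the degenerate case in which all values are equal are illustrative assumptions, not taken from the source.

```python
def min_max(values):
    """Min-max normalize a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all values are equal, map them all to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(sim_scores, search_indexes, alpha=0.5):
    """Blend normalized similarity scores and normalized search indexes.

    alpha is the preset weight in [0, 1]: alpha = 1 ranks by similarity
    alone, alpha = 0 ranks by search index alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(sim_scores) != len(search_indexes):
        raise ValueError("one search index is needed per keyword")
    s = min_max(sim_scores)
    p = min_max(search_indexes)
    return [alpha * si + (1 - alpha) * pi for si, pi in zip(s, p)]


def top_keywords(keywords, scores, count):
    """Pick `count` keywords in descending order of final score."""
    ranked = sorted(zip(keywords, scores), key=lambda kv: kv[1], reverse=True)
    return [kw for kw, _ in ranked[:count]]
```

Normalizing both quantities to the same range before blending keeps a large raw search index from swamping the similarity term.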
Selecting a set number of keywords from $W^{(3)}$ according to the final similarity score to obtain the expanded keywords of the APP to be expanded includes: selecting from $W^{(3)}$ a set number of keywords in descending order of final similarity score; alternatively, selecting from $W^{(3)}$, in order of final similarity score, a set number of keyword groups, each containing multiple keywords, to obtain multiple groups of expanded keywords for the APP to be expanded. The keyword expansion method of this embodiment therefore makes it convenient to derive keyword expansion schemes for APPs in batches, which greatly improves efficiency while maintaining expansion quality.
In an embodiment, in step S11, the process of obtaining the first keywords covered by the APP to be expanded in the application library platform may include: acquiring all keywords covered by the APP to be expanded according to the historical search records of the application library platform; and performing abnormality screening on all keywords covered by the APP to be expanded to delete the abnormal keywords, thereby obtaining the first keywords covered by the APP to be expanded. An abnormal keyword has at least one of the following characteristics: an abnormal search index, an abnormal number of search results, an abnormal ranking of the APP in the search results (abnormal importance), or an abnormal number of characters. The search index is calculated from the cumulative number of times (the search volume) the keyword is used to search for APPs in the application library platform within a set statistical period, taking into account factors such as the order of magnitude of the search volume. The search index and the search volume are positively related, and the correspondence between them is estimated empirically,
where P is the search index and f(x) denotes the growth relationship between the search index and the search volume, which is not simply linear. An abnormal search index means that the search index is smaller than a set value; an abnormal search result means that the number of APPs retrieved by the keyword is smaller than a set number; abnormal importance means that the APP ranks relatively low in the search results of the keyword; and an abnormal number of characters means that the keyword is too short or too long.
Correspondingly, the process of obtaining the second keyword covered by each second APP may include: acquiring all keywords covered by each second APP according to the historical search records of the application library platform; and carrying out exception screening on all keywords covered by each second APP to delete the exception keywords therein to obtain the second keywords covered by the second APP. And the process of acquiring the keywords covered by each third APP in the application library platform may include: acquiring all keywords covered by each third APP according to the historical search records of the application library platform; and carrying out exception screening on all the keywords covered by each third APP to delete the abnormal keywords therein to obtain the keywords covered by the third APP.
The purpose of keyword filtering is to screen out abnormal keywords. Keywords with too few search results, too low a search index, too low a search ranking, or too few or too many characters are abnormal; removing them prevents abnormal data from interfering with the subsequent expansion and improves the accuracy of keyword expansion.
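A minimal Python sketch of the abnormality screening follows. The record layout (`KeywordRecord`) and every threshold value are hypothetical; the source only refers to a "set value" and a "set number" and does not state concrete limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordRecord:
    keyword: str
    search_index: float  # search index of the keyword
    result_count: int    # number of APPs the keyword retrieves
    app_rank: int        # rank of the APP in the keyword's results


# Hypothetical thresholds, chosen only for illustration.
MIN_SEARCH_INDEX = 1000
MIN_RESULT_COUNT = 10
MAX_APP_RANK = 200
MIN_CHARS, MAX_CHARS = 2, 12


def is_abnormal(rec):
    """True if the keyword shows at least one abnormal characteristic."""
    return (
        rec.search_index < MIN_SEARCH_INDEX        # search index too low
        or rec.result_count < MIN_RESULT_COUNT     # too few search results
        or rec.app_rank > MAX_APP_RANK             # APP ranks too low
        or not MIN_CHARS <= len(rec.keyword) <= MAX_CHARS  # bad length
    )


def filter_keywords(records):
    """Drop abnormal keywords, keeping the rest in their original order."""
    return [rec for rec in records if not is_abnormal(rec)]
```

The same filter would be applied three times in the method: to the keywords covered by the APP to be expanded, by each second APP, and by each third APP.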
In an embodiment, before the step S11, a step of determining importance of the keyword to the corresponding APP in advance is further included, where the step specifically includes: according to ranking information of the APP in the keyword search results, the importance of the keyword to the APP is assigned:
$$V_2(w) = (15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5)$$
$$V_3(r) = (0, 1, 3, 6, 10, 16, 22, 30, 40, 50, 65, 80, 100, 120, 150, 200, \infty)$$
$$w_i = V_2(w)_t, \quad V_3(r)_t < rank \le V_3(r)_{t+1}$$
where $t \in [1, 16]$; $V_2(w)$ is the importance weight vector; $V_3(r)$ is the ranking interval vector; $\infty$ denotes a positively infinite ranking; $rank$ is the ranking of the APP in the search results; and $w_i$ is the importance of keyword $k_i$ to the APP. For example, if the APP ranks 2nd in the search results of keyword $k_i$, then $V_3(r)_2 < rank \le V_3(r)_3$, so the importance of keyword $k_i$ to the APP is $w_i = V_2(w)_2 = 14$. $V_2(w)$ and $V_3(r)$ can be preset according to the application library platform.
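The interval lookup defined by the two vectors can be sketched in Python with a binary search. The vector values are those given above; the function name `importance` and the rejection of ranks below 1 are assumptions added for the sketch.

```python
from bisect import bisect_left
import math

# Importance weight vector V_2(w): 16 weights.
WEIGHTS = (15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5)
# Ranking interval vector V_3(r): 17 boundaries, so 16 intervals.
BOUNDS = (0, 1, 3, 6, 10, 16, 22, 30, 40, 50, 65, 80, 100, 120, 150, 200, math.inf)


def importance(rank):
    """Importance of a keyword to an APP ranked `rank` in its results.

    Finds the interval BOUNDS[t-1] < rank <= BOUNDS[t] (0-based) and
    returns the weight attached to that interval.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    # bisect_left returns the first index whose boundary is >= rank,
    # i.e. the upper end of the half-open interval containing rank.
    t = bisect_left(BOUNDS, rank)
    return WEIGHTS[t - 1]
```

With these vectors, ranks 2 and 3 share the weight 14, and any rank beyond 200 falls into the last interval with weight 0.5.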
In an embodiment, in the step S12, a specific process of screening out a key word from first key words covered by an APP to be expanded includes: obtaining the importance of each first keyword to the APP to be expanded, and selecting the first keywords with the importance greater than or equal to a first set importance threshold as key keywords covered by the APP to be expanded; the importance of the keywords to the APP to be expanded represents ranking information of the APP to be expanded in the search results of the keywords.
In an optional embodiment, before the keywords covered by the APP to be expanded are obtained from the historical search record information of the application library platform, the method further includes a step of preprocessing that historical search record information. The historical search record information is, for example, the search records of the application library platform over the last week, and includes the keywords used for searching and the search results corresponding to each keyword: for example, the keyword search results of the last week, APP information (which may include dimensions such as APP ID, APP name, and the list the APP belongs to), and keyword information (which may include dimensions such as keyword ID, keyword, search index, and search results).
In an alternative embodiment, the step of preprocessing the historical search record information of the application library platform may include: acquiring the historical search record information of the application library platform within a set time period, and determining a first mapping relation corresponding to each keyword from that information, where the first mapping relation includes the APP information corresponding to the keyword and the ranking information of each APP across the multiple search results of the keyword; then determining, from the first mapping relations of the keywords in the historical search record information, a second mapping relation corresponding to each APP, where the second mapping relation includes the keywords corresponding to the APP and the importance of each keyword to the APP. The importance characterizes the ranking of the APP in the search results of the keyword: the higher the APP ranks in the keyword's search results, the greater the importance of the keyword to the APP. Further, a data mapping library corresponding to the application library platform is established from the first and second mapping relations.
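The two mapping relations can be sketched in Python as nested dictionaries. This is a simplified illustration: the log format of `(keyword, app_id, rank)` tuples, the function name `build_mapping_library`, and the choice to keep only the best rank when an APP appears several times for one keyword are all assumptions; the importance function is passed in by the caller.

```python
from collections import defaultdict


def build_mapping_library(search_log, importance):
    """Build the keyword->APP and APP->keyword mappings from a search log.

    search_log -- iterable of (keyword, app_id, rank) tuples
    importance -- callable mapping a rank to an importance value
    Returns (first, second):
      first  : keyword -> {app_id: rank}
      second : app_id  -> {keyword: importance}
    """
    first = defaultdict(dict)
    for keyword, app_id, rank in search_log:
        best = first[keyword].get(app_id)
        # Assumption: across repeated searches, keep the best rank seen.
        if best is None or rank < best:
            first[keyword][app_id] = rank

    # Invert the first mapping and convert ranks to importance values.
    second = defaultdict(dict)
    for keyword, ranks in first.items():
        for app_id, rank in ranks.items():
            second[app_id][keyword] = importance(rank)
    return dict(first), dict(second)
```

Looking up an APP in the second mapping then yields its covered keywords together with their importance, and looking up a keyword in the first mapping yields the APPs it retrieves, which is what the queries against the data mapping library need.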
Based on the data mapping library, obtaining the first keywords corresponding to the APP to be expanded according to the historical search record information of the application library platform may include: querying the data mapping library, acquiring the second mapping relation corresponding to the APP to be expanded, and obtaining from it the first keywords corresponding to the APP to be expanded and the importance of each first keyword. Obtaining the APP information covered by each first keyword on the application library platform may include: querying the data mapping library, acquiring the first mapping relation corresponding to each first keyword, and obtaining from it the APP information covered by that first keyword. Obtaining the APPs searched by each key keyword on the application library platform may include: querying the data mapping library, acquiring the first mapping relation corresponding to each key keyword, and obtaining from it the APP information covered by that key keyword.
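The two mapping relations and the data mapping library described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_mapping_library`, the record layout `(keyword, ranked_appids)`, and the choice of the reciprocal of the mean rank as the importance value are all assumptions made for the example; the text only requires that a higher rank yield a greater importance.

```python
from collections import defaultdict

def build_mapping_library(search_log):
    """Build both mappings from (keyword, ranked_appids) search records.

    ranked_appids lists the APPs returned by one search, best rank first.
    """
    ranks = defaultdict(lambda: defaultdict(list))
    for keyword, ranked_appids in search_log:
        for position, appid in enumerate(ranked_appids, start=1):
            ranks[keyword][appid].append(position)

    first_mapping = {}                   # keyword -> {appid: mean rank}
    second_mapping = defaultdict(dict)   # appid -> {keyword: importance}
    for keyword, apps in ranks.items():
        first_mapping[keyword] = {a: sum(r) / len(r) for a, r in apps.items()}
        for appid, mean_rank in first_mapping[keyword].items():
            # Assumption: importance is the reciprocal of the mean rank,
            # so a better (smaller) rank gives a larger importance.
            second_mapping[appid][keyword] = 1.0 / mean_rank
    return first_mapping, dict(second_mapping)
```

Querying the library for an APP then amounts to reading `second_mapping[appid]`, and querying it for a keyword amounts to reading `first_mapping[keyword]`.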
In an embodiment, in step S11, obtaining the second APP associated with the APP to be expanded according to the APP searched by each first keyword in the application library platform includes: obtaining the APP frequency sequencing information in the multiple search results corresponding to each first keyword according to the multiple search results of each first keyword in the historical search records in a set historical time period; and acquiring a set number of APPs with the frequency sequence arranged at the front as APP information searched by each first keyword. Obtaining an APP matrix according to all the first keywords and APP information searched by each first keyword; and counting the occurrence frequency of each APP in the APP matrix, and selecting the APP with the occurrence frequency greater than or equal to the first set frequency in the APP matrix as a second APP associated with the APP to be expanded.
In an embodiment, in the step S11, obtaining a third APP associated with an APP to be expanded according to the APP searched by each second keyword includes: obtaining the APP frequency sequencing information in the multiple search results corresponding to each second keyword according to the multiple search results of each second keyword in the historical search records in the set historical time period; acquiring a set number of APPs with a frequency sequence arranged in front as APP information searched by each second keyword; obtaining an APP matrix according to all the second keywords and APP information searched by each second keyword; and counting the occurrence frequency of each APP in the APP matrix, and selecting the APP with the occurrence frequency greater than or equal to a second set frequency in the APP matrix as a third APP associated with the second APP.
In an embodiment, in the step S12, the obtaining a fourth APP according to the APP searched by the key word includes: obtaining frequency sequencing information of APP in multiple search results corresponding to key words according to multiple search results of the key words in a historical search record within a set historical time period; acquiring a set number of APPs with a frequency sequence arranged in front as the APPs searched by the key keywords; obtaining an APP matrix according to all key keywords and APPs searched by each key keyword; and counting the occurrence frequency of each APP in the APP matrix, and selecting the APPs with the occurrence frequency greater than or equal to the set frequency in the APP matrix to obtain a fourth APP.
Since the same keyword may be searched multiple times within a set historical period (e.g., within one week), its search results change with the search time. The search results are therefore counted and summarized to obtain, for a keyword $k_0$, the corresponding APP set $A(k_0)$ and frequency ranking vector $V(k_0)$:
$$A(k_0) = (\mathrm{appid}_1, \mathrm{appid}_2, \ldots, \mathrm{appid}_n)$$
$$V(k_0) = (\mathrm{count}_1, \mathrm{count}_2, \ldots, \mathrm{count}_n)$$
where $k_0$ denotes a keyword, and $\mathrm{count}_n$ denotes the number of times the APP identified by $\mathrm{appid}_n$ appeared in searches for the keyword $k_0$ within the set historical period. The frequency ranking information of the APPs in the multiple search results of a keyword refers to the APP frequencies in the frequency ranking vector $V(k_0)$.
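The frequency ranking and APP-matrix selection described above can be sketched as follows. The function names, the input layout (each keyword mapped to the list of result lists observed in the period), and the parameters `top_n` and `min_count` are illustrative assumptions standing in for the "set number" and "set frequency" of the text.

```python
from collections import Counter

def top_apps_for_keyword(result_lists, top_n):
    """Count how often each appid appears across repeated searches of one
    keyword and return the top_n appids by frequency."""
    counts = Counter(appid for results in result_lists for appid in results)
    return [appid for appid, _ in counts.most_common(top_n)]

def associated_apps(keyword_results, top_n, min_count):
    """Build the APP matrix (one row per keyword) and keep the APPs whose
    number of occurrences in the matrix is at least min_count."""
    matrix = {kw: top_apps_for_keyword(lists, top_n)
              for kw, lists in keyword_results.items()}
    occurrences = Counter(appid for row in matrix.values() for appid in row)
    return {appid for appid, c in occurrences.items() if c >= min_count}
```

The same routine applies to first keywords, second keywords, and key keywords; only the keyword set and the threshold differ.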
In an embodiment, after obtaining the second APPs and before obtaining the second keywords covered by each second APP, the method further includes the steps of: and acquiring an application list to which the APP to be expanded belongs in the application library platform, and deleting a second APP which belongs to an application list different from the APP to be expanded. Optionally, after obtaining the third APPs and before obtaining the keywords covered by each third APP, the method further includes: and acquiring an application list to which the APP to be expanded belongs in the application library platform, and deleting a third APP which belongs to a different application list from the APP to be expanded. Optionally, after obtaining the fourth APP, before obtaining the keyword covered by each fourth APP, the method further includes: and acquiring an application list to which the APP to be expanded belongs in the application library platform, and deleting a fourth APP which belongs to a different application list from the APP to be expanded. Therefore, the accuracy of subsequent keyword expansion can be improved.
In an embodiment, the similarity between each third APP and a single second APP is calculated in real time, and the specific calculation process includes: obtaining a feature vector of the second APP from the second keywords covered by the second APP, and obtaining a feature vector of each third APP from the keywords covered by that third APP; applying One-Hot encoding to the feature vector of the second APP and the feature vector of the third APP to obtain a sparse feature vector of the second APP and a sparse feature vector of the third APP; and calculating the similarity between each third APP and the corresponding second APP from the two sparse feature vectors. The sparse feature vector of the second APP and that of the third APP have the same dimension, which satisfies $d_V \le m + n$, where m denotes the dimension of the feature vector of the second APP, n denotes the dimension of the feature vector of the third APP, and $d_V$ denotes the dimension of the sparse feature vectors.
For example, let the APP to be expanded be $APP^{(1)}$, and assume that its corresponding second APPs are $(APP^{(2)}_1, APP^{(2)}_2)$. The keywords covered by the second APP $APP^{(2)}_1$ are $(KW^{(2)}_1, KW^{(2)}_2, KW^{(2)}_3)$, which serve as the feature vector of $APP^{(2)}_1$, of dimension 3; the keywords covered by the second APP $APP^{(2)}_2$ are $(KW^{(2)}_2, KW^{(2)}_3, KW^{(2)}_4, KW^{(2)}_5)$, which serve as the feature vector of $APP^{(2)}_2$, of dimension 4.
Further, the third APPs corresponding to the second APP $APP^{(2)}_1$ are $(APP^{(3)}_1, APP^{(3)}_2, APP^{(3)}_3)$, and the third APPs corresponding to the second APP $APP^{(2)}_2$ are $(APP^{(3)}_3, APP^{(3)}_4, APP^{(3)}_5)$; this yields the third APP set $(APP^{(3)}_1, APP^{(3)}_2, APP^{(3)}_3, APP^{(3)}_4, APP^{(3)}_5)$. In the third APP set, $APP^{(3)}_1$ corresponds only to the second APP $APP^{(2)}_1$, so the similarity of $APP^{(3)}_1$ to the second APPs is its similarity to $APP^{(2)}_1$. $APP^{(3)}_3$ corresponds to both $APP^{(2)}_1$ and $APP^{(2)}_2$, so its similarity to $APP^{(2)}_1$ and its similarity to $APP^{(2)}_2$ are computed separately, and their mean is taken as the similarity of $APP^{(3)}_3$ to the second APPs.
Further, the keywords covered by the third APP $APP^{(3)}_1$ are $(KW^{(3)}_1, KW^{(3)}_2, KW^{(3)}_3)$, which serve as the feature vector of $APP^{(3)}_1$, of dimension 3; the keywords covered by the third APP $APP^{(3)}_2$ are $(KW^{(3)}_4, KW^{(3)}_2, KW^{(3)}_3, KW^{(3)}_5)$, which serve as the feature vector of $APP^{(3)}_2$, of dimension 4. Here $KW^{(3)}_2$ and $KW^{(2)}_2$ are the same keyword.
Thus, the feature vector of the second APP $APP^{(2)}_1$ is $(KW^{(2)}_1, KW^{(2)}_2, KW^{(2)}_3)$ and the feature vector of the third APP $APP^{(3)}_1$ is $(KW^{(3)}_1, KW^{(3)}_2, KW^{(3)}_3)$. Because $KW^{(3)}_2$ and $KW^{(2)}_2$ are the same keyword, the space jointly spanned by the two feature vectors is $(KW^{(2)}_1, KW^{(2)}_2, KW^{(2)}_3, KW^{(3)}_1, KW^{(3)}_3)$, of dimension $5 \le 3 + 3$. The resulting sparse feature vectors are (1,1,1,0,0) for the second APP $APP^{(2)}_1$ and (0,1,0,1,1) for the third APP $APP^{(3)}_1$.
Based on the above embodiment, optionally, the similarity between each third APP and a single second APP is calculated by the following formula:
$$\cos\left(APP^{(2)}_t, S^{(3)}_i\right) = \frac{V(APP^{(2)}_t) \cdot V(S^{(3)}_i)}{\|V(APP^{(2)}_t)\|_2 \, \|V(S^{(3)}_i)\|_2}$$
where $APP^{(2)}_t$ denotes the t-th second APP; $S^{(3)}_i$ denotes the i-th third APP; $V(APP^{(2)}_t) \cdot V(S^{(3)}_i)$ denotes the inner product of the sparse feature vector of $APP^{(2)}_t$ and the sparse feature vector of $S^{(3)}_i$; and $\|V(APP^{(2)}_t)\|_2 \, \|V(S^{(3)}_i)\|_2$ denotes the product of the 2-norms of the two sparse feature vectors.
It is understood that the calculation method of the similarity between two APPs includes, but is not limited to, the above algorithm for calculating the similarity based on the cosine similarity, and other algorithms for calculating the similarity may also be used.
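The One-Hot encoding and cosine similarity steps can be checked against the worked example above with a short sketch. The helper names and the keyword labels (`KW2_1`, `KW_shared`, and so on) are placeholders chosen for this illustration; `KW_shared` stands for the keyword that appears in both feature vectors.

```python
import math

def one_hot(keywords, vocabulary):
    """Encode a keyword list as a 0/1 vector over a shared vocabulary."""
    covered = set(keywords)
    return [1 if kw in covered else 0 for kw in vocabulary]

def cosine(u, v):
    """Inner product divided by the product of the 2-norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

second_app = ["KW2_1", "KW_shared", "KW2_3"]   # keywords of the second APP
third_app = ["KW3_1", "KW_shared", "KW3_3"]    # keywords of the third APP

# Shared vocabulary: union of both keyword lists, first-seen order.
vocabulary = list(dict.fromkeys(second_app + third_app))
v_second = one_hot(second_app, vocabulary)   # [1, 1, 1, 0, 0]
v_third = one_hot(third_app, vocabulary)     # [0, 1, 0, 1, 1]
similarity = cosine(v_second, v_third)       # one shared keyword out of 3 and 3
```

With one shared keyword and three keywords on each side, the similarity is 1 / (√3 · √3) = 1/3. When a third APP corresponds to several second APPs, the same `cosine` call is repeated for each pair and the results are averaged, as described above.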
In an embodiment, obtaining the first candidate keyword set according to the keywords covered by each third APP includes: obtaining a keyword matrix associated with the second APP from the third APPs associated with the second APP and the keywords covered by those third APPs; and merging and counting the keywords in the keyword matrix to obtain the first candidate keyword set $KW^{(3)} = (kw^{(3)}_1, kw^{(3)}_2, \ldots, kw^{(3)}_n)$ and the corresponding keyword frequency vector $C^{(3)} = (c_1, c_2, \ldots, c_n)$.
Further, the proportion of the i-th keyword in the first candidate keyword set $KW^{(3)}$ is:
$$V^{(2)}_i = \frac{c_i}{\sum_{j=1}^{n} c_j}$$
where $i = 1, 2, \ldots, n$, and n denotes the total number of keywords contained in the first candidate keyword set $KW^{(3)}$.
In an embodiment, calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion includes: obtaining the first similarity score of a keyword in the first candidate keyword set from the product of the proportion of the keyword in the first candidate keyword set and the similarity between the third APP corresponding to the keyword and the second APP. For example, the first similarity score of each keyword in the first candidate keyword set may be calculated by the following formula:
$$\mathrm{score}(kw^{(3)}_i) = V^{(1)}_i \, V^{(2)}_i$$
where $kw^{(3)}_i$ denotes the i-th keyword in the first candidate keyword set $KW^{(3)}$, $V^{(1)}_i$ denotes the similarity between the third APP corresponding to $kw^{(3)}_i$ and the second APP, and $V^{(2)}_i$ denotes the proportion of $kw^{(3)}_i$; $i = 1, 2, \ldots, n$, and n denotes the total number of keywords contained in the first candidate keyword set $KW^{(3)}$.
It can be understood that the first similarity score of a keyword in the first candidate keyword set is obtained from the product of the proportion of the keyword in the first candidate keyword set and the similarity between the third APP corresponding to the keyword and the second APP; that is, the first similarity score may be this product directly, or this product multiplied by a scaling coefficient.
Finally, the first candidate keyword set is screened according to the first similarity score to obtain the first expanded keyword set. In this embodiment, the first expansion keyword set is obtained from the competitor APPs of the APP to be expanded, and the expansion is efficient.
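The scoring and screening of the first candidate keyword set can be sketched as follows. The function name and input layout are assumptions for illustration. The text does not say which similarity to use when a keyword is covered by several third APPs; averaging over those APPs is an assumption of this sketch, as is counting each keyword once per third APP.

```python
from collections import Counter

def first_expansion_set(third_app_keywords, third_app_similarity, m):
    """Score each candidate keyword as similarity x proportion and keep the top m.

    third_app_keywords:   {third appid: [covered keywords]}
    third_app_similarity: {third appid: similarity to the second APP(s)}
    """
    # Frequency of each keyword across the keyword matrix (once per APP).
    counts = Counter(kw for kws in third_app_keywords.values() for kw in set(kws))
    total = sum(counts.values())
    scores = {}
    for kw, count in counts.items():
        sims = [third_app_similarity[app]
                for app, kws in third_app_keywords.items() if kw in kws]
        # Assumption: average the similarity over all third APPs covering kw.
        scores[kw] = (sum(sims) / len(sims)) * (count / total)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m], scores
```

A keyword that is both frequent in the matrix and covered by highly similar third APPs rises to the top; a scaling coefficient, if desired, would multiply every score by the same constant and leave the ranking unchanged.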
In an embodiment, the step S12 further includes a step of determining similarity between each keyword in the second candidate keyword set and the corresponding key keyword, and the specific process of the step includes:
obtaining a feature vector of each keyword according to the searched APP of each keyword in the second candidate keyword set, and obtaining a feature vector of each key keyword according to the searched APP of each key keyword; carrying out One-Hot coding processing on the feature vector of each keyword in the second candidate keyword set and the feature vector of each key keyword respectively to obtain a sparse feature vector of the keyword and a sparse feature vector of each key keyword in the second candidate keyword set; and calculating the similarity between the keywords and each key keyword in the second candidate keyword set according to the sparse feature vector of the keywords and the sparse feature vector of each key keyword in the second candidate keyword set.
Further, the average of the similarities between a keyword in the second candidate keyword set and each corresponding key keyword may be calculated and used as the comprehensive similarity of that keyword with respect to the key keywords. The average may be a simple (unweighted) average or a weighted average.
Optionally, the similarity between the i-th keyword in the second candidate keyword set and a corresponding key keyword is determined as follows:
$$\cos\left(KW^{(1)'}_k, KW^{(2)'}_i\right) = \frac{V(KW^{(1)'}_k) \cdot V(KW^{(2)'}_i)}{\|V(KW^{(1)'}_k)\|_2 \, \|V(KW^{(2)'}_i)\|_2}$$
where $KW^{(1)'}$ denotes the set of key keywords and $KW^{(1)'}_k$ denotes the k-th key keyword; $KW^{(2)'}_i$ denotes the i-th keyword in the second candidate keyword set; $V(KW^{(1)'}_k) \cdot V(KW^{(2)'}_i)$ denotes the inner product of the sparse feature vector of $KW^{(1)'}_k$ and the sparse feature vector of $KW^{(2)'}_i$; and $\|V(KW^{(1)'}_k)\|_2 \, \|V(KW^{(2)'}_i)\|_2$ denotes the product of the 2-norms of the two sparse feature vectors.
It is understood that the method for calculating the similarity between two keywords includes, but is not limited to, the above algorithm for calculating the similarity based on cosine similarity, and other algorithms for calculating the similarity may also be used.
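For keywords, the feature of each keyword is the set of APPs it retrieves. Because One-Hot vectors contain only 0 and 1, their cosine similarity reduces to the number of shared APPs divided by the square root of the product of the two set sizes, which the following sketch uses directly. The function names and the optional `weights` argument (for the weighted-average variant) are assumptions made for illustration.

```python
import math

def binary_cosine(apps_a, apps_b):
    """Cosine similarity of two One-Hot vectors, computed on the APP sets."""
    a, b = set(apps_a), set(apps_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def comprehensive_similarity(candidate_apps, key_keyword_apps, weights=None):
    """Average the similarity of one candidate keyword to every key keyword.

    key_keyword_apps: list of APP sets, one per key keyword.
    weights: optional per-key-keyword weights; None gives the simple average.
    """
    sims = [binary_cosine(candidate_apps, apps) for apps in key_keyword_apps]
    if not sims:
        return 0.0
    if weights is None:
        weights = [1.0] * len(sims)
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

For instance, a candidate keyword retrieving APPs {x, y}, compared with key keywords retrieving {x, y} and {y, z}, has similarities 1.0 and 0.5, giving a simple average of 0.75.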
In an embodiment, in the step S12, obtaining the second candidate keyword set according to the keywords covered by the fourth APP includes: obtaining a keyword matrix from the keywords covered by all the fourth APPs; and merging and counting the keywords in the keyword matrix to obtain the second candidate keyword set $KW^{(2)} = (kw^{(2)}_1, kw^{(2)}_2, \ldots, kw^{(2)}_n)$ and the keyword frequency vector $C^{(2)} = (c_1, c_2, \ldots, c_n)$ corresponding to the second candidate keyword set.
The proportion of the i-th keyword in the second candidate keyword set $KW^{(2)}$ is:
$$\mathrm{weight}_i = \frac{c_i}{\sum_{j=1}^{n} c_j}$$
where $i = 1, 2, \ldots, n$, and n denotes the total number of keywords contained in the second candidate keyword set $KW^{(2)}$.
In an embodiment, in the step S12, calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity includes: obtaining the second similarity score of a keyword in the second candidate keyword set from the product of the proportion of the keyword in the second candidate keyword set and the comprehensive similarity of the keyword with respect to the key keywords. For example, the second similarity score of each keyword in the second candidate keyword set may be calculated by the following formula:
$$\mathrm{sim}(KW^{(2)}_i) = \mathrm{weight}_i \cdot \cos'_i$$
where $KW^{(2)}_i$ denotes the i-th keyword in the second candidate keyword set, $\mathrm{weight}_i$ denotes the proportion of the i-th keyword in the second candidate keyword set $KW^{(2)}$, and $\cos'_i$ denotes the comprehensive similarity of the i-th keyword in the second candidate keyword set $KW^{(2)}$.
It can be understood that the second similarity score of a keyword in the second candidate keyword set is obtained from the product of the proportion of the keyword in the second candidate keyword set and the comprehensive similarity of the keyword with respect to the corresponding key keywords; the score may be this product directly, or this product multiplied by a scaling coefficient.
And finally, screening a second candidate keyword set according to the second similarity score to obtain a second expanded keyword set. According to the technical scheme, the second expansion keyword set can be obtained based on the key keywords of the APP to be expanded, the expansion breadth of the second expansion keyword set is guaranteed, and the expansion quality of the keywords is guaranteed.
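As a small worked illustration of the second similarity score, consider a hypothetical second candidate keyword set with two keywords whose frequencies are $C^{(2)} = (3, 1)$ and whose comprehensive similarities are $\cos'_1 = 0.4$ and $\cos'_2 = 0.8$. These numbers are invented for the example, and the proportion is assumed to be the keyword's frequency divided by the total frequency.

$$\mathrm{weight}_1 = \frac{3}{3+1} = 0.75, \qquad \mathrm{weight}_2 = \frac{1}{3+1} = 0.25$$

$$\mathrm{sim}(KW^{(2)}_1) = 0.75 \times 0.4 = 0.30, \qquad \mathrm{sim}(KW^{(2)}_2) = 0.25 \times 0.8 = 0.20$$

Under these assumptions the first keyword ranks higher even though its comprehensive similarity is lower, because it occurs more often among the keywords covered by the fourth APPs.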
The Apple App Store is taken as an example below to further explain the keyword expansion process of the embodiment of the invention; the same principle applies to other application library platforms.
Referring to fig. 2, the expansion process of the first expansion keyword set includes the following steps.
1. Keyword content crawling
Historical search record data of the Apple App Store for the last week is acquired through the Apple developer API; the data includes, but is not limited to, application names, keyword details, keyword search indexes, keyword search results, and application lists.
2. Pre-processing of historical keyword search record data
2.1 The forward mapping from a keyword to APPs, denoted A(k), represents the search results for keyword k; the subscript of each appid gives the actual rank of that APP when searching with keyword k:
$$A(k) = (\mathrm{appid}_1, \mathrm{appid}_2, \ldots, \mathrm{appid}_n) \tag{2-1}$$
where n is a positive integer.
It should be noted that, in the embodiment of the present invention, APPs may be identified by appids, which are uniformly allocated by the application library platform and used to distinguish different APPs.
2.2 The inverse mapping from an APP to keywords, denoted K(a), represents all keywords covered by application a:
$$K(a) = (\mathrm{keyword}_1, \ldots, \mathrm{keyword}_n) \tag{2-2}$$
where n is a positive integer.
3. Obtaining competitor APPs (namely the associated second APPs)
3.1 Denote the APP to be expanded as $APP^{(1)}$;
3.2 Obtain through K(a) the keyword set $K(APP^{(1)})$ covered by $APP^{(1)}$, namely the first keywords covered by the APP to be expanded;
3.3 Perform exception screening on the keyword set $K(APP^{(1)})$. Keywords with too few search results, too low a search index, too low a search ranking, or too few or too many characters are treated as abnormal data and removed;
3.4 Obtain through A(k) the appids corresponding to each keyword in the keyword set $K(APP^{(1)})$, denoted $A(K(APP^{(1)}))$;
3.5 Merge and count $A(K(APP^{(1)}))$, and take the top n appids by frequency as the APP set $S^{(1)'}$;
3.6 Remove from the APP set $S^{(1)'}$ the APPs that do not belong to the same application list as $APP^{(1)}$, and keep only k APPs as competitor APPs, recorded as the competitor APP set $S^{(1)}$, namely the second APPs associated with the APP to be expanded.
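Steps 3.2 to 3.6 can be sketched as a single function over the two mappings. The function name, the dictionary layouts for `K`, `A`, and `app_list`, and the `keep_keyword` predicate (standing in for the exception screening of step 3.3) are hypothetical. Excluding the APP to be expanded from its own competitor set is also an assumption of this sketch; the text does not state it.

```python
from collections import Counter

def competitor_apps(target, K, A, app_list, keep_keyword, n, k):
    """Return up to k competitor APPs for `target`.

    K: {appid: [covered keywords]}        (inverse mapping K(a))
    A: {keyword: [appids, best first]}    (forward mapping A(k))
    app_list: {appid: application list the APP belongs to}
    keep_keyword: predicate implementing the exception screening
    """
    keywords = [kw for kw in K[target] if keep_keyword(kw)]       # 3.2, 3.3
    counts = Counter(appid for kw in keywords for appid in A[kw]
                     if appid != target)                          # 3.4, 3.5
    candidates = [appid for appid, _ in counts.most_common(n)]    # top n
    same_list = [appid for appid in candidates
                 if app_list[appid] == app_list[target]]          # 3.6
    return same_list[:k]
```

Applying the same function to each competitor APP in turn yields the third APPs used in step 4.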
4. Expanding keywords from APPs
Denote $S^{(1)}_i$ as the i-th APP in the competitor APP set $S^{(1)}$, and traverse the competitor APP set $S^{(1)}$ as follows:
4.1 Obtain the associated appids
Using the same procedure as the first five sub-steps of step 3, obtain the APPs associated with the competitor APP $S^{(1)}_i$; these are recorded as third APPs, and the corresponding set is denoted $S^{(3)}$:
$$A(K(S^{(1)}_i)) = (\mathrm{appid}_1, \ldots, \mathrm{appid}_n) \tag{4-1}$$
Further, the matrix of keywords covered by the APPs in (4-1) can be obtained, referred to as (4-2); each row of the matrix holds the keywords covered by one of these APPs.
4.2 Extract the feature vectors.
One-Hot encoding is applied to the feature vector of the competitor APP $S^{(1)}_i$ and to the feature vector of each APP in $S^{(3)}$ (namely the corresponding row of keywords in (4-2)), thereby obtaining the sparse feature vector $V(S^{(1)}_i)$ of the competitor APP $S^{(1)}_i$ and the sparse feature vector $V(S^{(3)}_j)$ of each APP in $S^{(3)}$.
4.3 Calculate the APP similarity.
Based on the results of 4.2, the similarity between each APP in $S^{(3)}$ and $S^{(1)}_i$ is calculated as follows:
$$\cos\left(S^{(3)}_j, S^{(1)}_i\right) = \frac{V(S^{(3)}_j) \cdot V(S^{(1)}_i)}{\|V(S^{(3)}_j)\|_2 \, \|V(S^{(1)}_i)\|_2} \tag{4-3}$$
where $S^{(3)}_j$ denotes the j-th third APP in $S^{(3)}$; $V(S^{(3)}_j) \cdot V(S^{(1)}_i)$ denotes the inner product of the sparse feature vector of $S^{(3)}_j$ and the sparse feature vector of $S^{(1)}_i$; and $\|V(S^{(3)}_j)\|_2 \, \|V(S^{(1)}_i)\|_2$ denotes the product of the 2-norms of the two sparse feature vectors.
The keywords in (4-2) are merged and counted to obtain the first candidate keyword set $KW^{(3)} = (kw^{(3)}_1, kw^{(3)}_2, \ldots, kw^{(3)}_n)$ and the corresponding frequency vector $C^{(3)} = (c_1, c_2, \ldots, c_n)$;
The proportion of the i-th keyword in the first candidate keyword set $KW^{(3)}$ is:
$$V^{(2)}_i = \frac{c_i}{\sum_{j=1}^{n} c_j} \tag{4-4}$$
where $i = 1, 2, \ldots, n$, and n denotes the total number of keywords contained in the first candidate keyword set $KW^{(3)}$.
4.4 Calculate the first similarity score of each keyword in the first candidate keyword set.
From the similarity of (4-3) and the proportion of (4-4), the first similarity score of each keyword in the first candidate keyword set $KW^{(3)}$ can be calculated.
Finally, the keywords in the first candidate keyword set $KW^{(3)}$ are sorted in descending order of first similarity score, and the top m keywords of $KW^{(3)}$ are taken to obtain the first expansion keyword set $W^{(1)}$.
Referring to fig. 2, the expansion process of the second expansion keyword set includes the following steps.
5. Obtaining key keywords
Denote the APP to be expanded as $APP^{(1)}$; obtain the first keywords covered by the APP to be expanded in the same way as in the expansion process of the first expansion keyword set, and denote them $K(APP^{(1)})$.
Perform exception screening on the first keywords $K(APP^{(1)})$: keywords with too few search results, too low a search index, too low a search ranking, or too few or too many characters are treated as abnormal data and removed from the set of first keywords. Then, according to A(k), select the keywords in whose search results $APP^{(1)}$ ranks in the top k as key keywords, recorded as the key keyword set $KW^{(1)'}$;
6. Expanding keywords from key keywords
Denote $KW^{(1)'}_i$ as the i-th keyword in the key keyword set $KW^{(1)'}$, and traverse $KW^{(1)'}$ as follows:
6.1 Obtain through A(k) the appids corresponding to the key keyword $KW^{(1)'}_i$, and take the top k APPs by ranking, denoted $A(KW^{(1)'}_i)$;
6.2 Obtain through K(a) the keywords covered by each APP in $A(KW^{(1)'}_i)$, denoted $K(A(KW^{(1)'}_i))$; merge and count these keywords to obtain their frequencies, and take the top k keywords by frequency to obtain the second candidate keyword set $KW^{(2)} = (kw^{(2)}_1, kw^{(2)}_2, \ldots, kw^{(2)}_n)$ with frequency vector $C^{(2)} = (c_1, c_2, \ldots, c_n)$;
The proportion of the i-th keyword in the second candidate keyword set $KW^{(2)}$ is defined as:
$$\mathrm{weight}_i = \frac{c_i}{\sum_{j=1}^{n} c_j}$$
where $i = 1, 2, \ldots, n$, and $c_i$ is the frequency of the i-th keyword in the second candidate keyword set $KW^{(2)}$.
6.3 Obtain through A(k) the appids corresponding to the key keyword $KW^{(1)'}_k$ and to each keyword in the second candidate keyword set $KW^{(2)}$, take them as the feature vectors of the keywords, and obtain the corresponding sparse feature vectors by One-Hot encoding. From these sparse feature vectors, the cosine similarity between the key keyword $KW^{(1)'}_k$ and the i-th keyword in the second candidate keyword set $KW^{(2)}$ can be calculated, recorded as $\cos_i$:
$$\cos_i = \frac{V(KW^{(1)'}_k) \cdot V(KW^{(2)}_i)}{\|V(KW^{(1)'}_k)\|_2 \, \|V(KW^{(2)}_i)\|_2}$$
where $KW^{(1)'}_k$ denotes the k-th key keyword; $KW^{(2)}_i$ denotes the i-th keyword in the second candidate keyword set; $V(KW^{(1)'}_k) \cdot V(KW^{(2)}_i)$ denotes the inner product of the sparse feature vector of $KW^{(1)'}_k$ and the sparse feature vector of $KW^{(2)}_i$; and $\|V(KW^{(1)'}_k)\|_2 \, \|V(KW^{(2)}_i)\|_2$ denotes the product of the 2-norms of the two sparse feature vectors.
The average of the similarities between a keyword in the second candidate keyword set and the corresponding key keywords is calculated as the comprehensive similarity of that keyword with respect to the key keywords, recorded as $\cos'_i$.
6.4 Calculate the second similarity score of each keyword in the second candidate keyword set as:
$$\mathrm{sim}(KW^{(2)}_i) = \mathrm{weight}_i \cdot \cos'_i$$
where $KW^{(2)}_i$ denotes the i-th keyword in the second candidate keyword set, $\mathrm{weight}_i$ denotes the proportion of the i-th keyword in the second candidate keyword set $KW^{(2)}$, and $\cos'_i$ denotes the comprehensive similarity of the i-th keyword in the second candidate keyword set $KW^{(2)}$.
6.5 Finally, the keywords in the second candidate keyword set $KW^{(2)}$ are sorted in descending order of second similarity score, and the top m keywords of $KW^{(2)}$ are taken to obtain the second expansion keyword set $W^{(2)}$.
7. Normalization
To eliminate differences in scale, the similarity scores of the keywords in the first expansion keyword set $W^{(1)}$ and the second expansion keyword set $W^{(2)}$ are normalized to the interval [0,1], specifically:
The first expansion keyword set $W^{(1)}$ and the second expansion keyword set $W^{(2)}$ are merged to obtain the keyword set $W^{(3)}$, and the first or second similarity score of each keyword in $W^{(3)}$ is normalized by the following formula:
$$s'_i = \frac{s_i - s_{\min}}{s_{\max} - s_{\min}} \tag{7-1}$$
where $s_i$ is the first or second similarity score of the i-th keyword in $W^{(3)}$, $s_{\min}$ and $s_{\max}$ denote respectively the minimum and maximum similarity scores of the keywords in $W^{(3)}$, and $s'_i$ is the normalized similarity score of the i-th keyword in $W^{(3)}$.
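The normalization step can be sketched as a standard min-max rescaling. The function name and the dictionary input are assumptions for illustration; returning 0.0 for every keyword when all scores are equal is also an assumption, since the text does not address that case.

```python
def min_max_normalize(scores):
    """Rescale {keyword: score} to the interval [0, 1]."""
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # Assumption: with no spread, map every keyword to 0.0.
        return {kw: 0.0 for kw in scores}
    return {kw: (s - lowest) / (highest - lowest) for kw, s in scores.items()}
```

The same function can be reused in step 9 to rescale the search index of each keyword.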
8. Exception screening
If a keyword in $W^{(3)}$ has too low a search index, is too short or too long, is written in traditional (complex-form) characters, or does not exist among the covered words of the APP and the competitor APPs, it is removed.
9. Calculating a score
Query the search index of each keyword in $W^{(3)}$ and normalize it on the principle of formula (7-1) to obtain the corrected search index value $p'_i$ of each keyword in $W^{(3)}$. Set a weight $\alpha \in [0,1]$ and calculate the final similarity score of each keyword in $W^{(3)}$ by the following formula:
$$\mathrm{score}^{(1)}_i = \alpha \cdot s'_i + (1-\alpha)\, p'_i \tag{9-1}$$
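As a worked illustration of formula (9-1), take hypothetical values $\alpha = 0.6$, a normalized similarity score $s'_i = 0.5$, and a corrected search index value $p'_i = 1.0$ (none of these numbers come from the source):

$$\mathrm{score}^{(1)}_i = 0.6 \times 0.5 + (1 - 0.6) \times 1.0 = 0.3 + 0.4 = 0.7$$

A larger $\alpha$ favors keywords that are similar to the APP to be expanded, while a smaller $\alpha$ favors keywords with a high search index.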
10. keyword output scheme
10.1 Sort the keywords in $W^{(3)}$ in descending order of final similarity score;
10.2 Choose whether duplicate content needs to be removed to increase the information content of the scheme. If yes, go to step 10.3. If no, output a keyword scheme once the set character count (for example, 100 characters) is reached; the limit applies to the number of characters, not the number of keywords (for example, "shopping" and "social" are two keywords but four characters). Continue in this way until three sets of keyword schemes are output, and determine the score of each set of keyword scheme as:
where i = 1, 2, 3, and m is the number of keywords in each set of keyword scheme.
10.3 If duplicate content needs to be removed to increase the information content of the scheme, first obtain the common string str(i, i+1) of the i-th keyword $W^{(3)}_i$ and the (i+1)-th keyword $W^{(3)}_{i+1}$ in $W^{(3)}$. If the length of str(i, i+1) is greater than the predetermined length, replace str(i, i+1) in $W^{(3)}_i$ with $W^{(3)}_{i+1}$; for example, "Taobao mobile phone" and "Taobao shopping" can be combined into "Taobao mobile phone shopping". Otherwise, separate the two keywords with a comma. Then, as in 10.2, output three sets of keyword schemes and determine the score of each set of keyword scheme.
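The packing of ranked keywords into keyword schemes in step 10.2 can be sketched as below. This covers only the case without duplicate removal; the function name, the greedy fill, and the use of Python string length as the character count are assumptions for illustration, and the scheme score is not computed because its formula is not given in the text above.

```python
def pack_schemes(ranked_keywords, char_budget=100, n_schemes=3):
    """Greedily fill up to n_schemes keyword schemes, each holding at most
    char_budget characters (counted over keywords, separators excluded)."""
    schemes, current, used = [], [], 0
    for kw in ranked_keywords:
        if current and used + len(kw) > char_budget:
            schemes.append(current)          # current scheme is full
            if len(schemes) == n_schemes:
                return schemes
            current, used = [], 0
        current.append(kw)
        used += len(kw)
    if current and len(schemes) < n_schemes:
        schemes.append(current)              # last, partially filled scheme
    return schemes
```

With a budget of 4 characters and two-character keywords, each scheme holds two keywords, which mirrors the remark that the limit counts characters rather than keywords.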
In the above steps, steps 1 and 2 can be computed offline and updated periodically, for example once a week. Steps 3 to 10 are computed online: for each APP name entered by the user, the data mapping library is queried to obtain the corresponding appid, and the keywords corresponding to that APP are then expanded comprehensively along the two dimensions.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present invention. Further, the above embodiments may be arbitrarily combined to obtain other embodiments.
Based on the same idea as the method for expanding the keywords in the foregoing embodiment, the present invention further provides a device for expanding the keywords in multiple dimensions, where the device can be used to execute the method for expanding the keywords in multiple dimensions. For convenience of description, in the structural schematic diagram of the device embodiment of the multi-dimensional expansion keyword, only the part related to the embodiment of the present invention is shown, and those skilled in the art will understand that the illustrated structure does not constitute a limitation to the device, and may include more or less components than those illustrated, or combine some components, or arrange different components.
FIG. 3 is a schematic block diagram of an apparatus for multidimensional expansion of keywords according to an embodiment of the present invention; as shown in fig. 3, the apparatus for expanding keywords in multiple dimensions of this embodiment includes:
the first word expansion module is used for acquiring first keywords covered by the APP to be expanded in the application library platform and obtaining a second APP related to the APP to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP in the application library platform, and obtaining a third APP associated with the APP to be expanded according to the APPs searched by each second keyword in the application library platform; obtaining keywords covered by each third APP in the application library platform, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; screening a first set number of keywords from a first candidate keyword set according to the first similarity score to obtain a first expanded keyword set;
the second word expansion module is used for screening out key keywords from the first keywords and obtaining a fourth APP associated with the APP to be expanded according to the APPs searched by the key keywords in the application library platform; obtaining a second candidate keyword set according to the keywords covered by the fourth APP in the application library platform; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set; and
the screening module is used for selecting a third set number of keywords from the first expansion keyword set and the second expansion keyword set to obtain expansion keywords of the APP to be expanded;
wherein a keyword covered by an APP must satisfy the following condition: the search results for the keyword in the application library platform contain the APP.
It should be noted that, in the implementation of the foregoing exemplary multi-dimensional keyword expansion apparatus, because the content of information interaction, execution process, and the like between the modules is based on the same concept as that of the foregoing method embodiment of the present invention, the technical effect brought by the content is the same as that of the foregoing method embodiment of the present invention, and specific content may refer to the description in the method embodiment of the present invention, and is not described herein again.
In addition, in the above embodiment of the device for multidimensional extension keywords, the logical division of each program module is only an example, and in practical applications, the above function distribution may be completed by different program modules according to needs, for example, due to the configuration requirements of corresponding hardware or the convenience of implementation of software, that is, the internal structure of the device for multidimensional extension keywords is divided into different program modules to complete all or part of the above described functions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium and sold or used as a stand-alone product. When executed, the program may perform all or a portion of the steps of the methods of the various embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Accordingly, in an embodiment, a storage medium is further provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method for multidimensional expansion of keywords as in any of the above embodiments.
In addition, the storage medium may be provided in a computer device, and the computer device further includes a processor, and when the processor executes the program in the storage medium, all or part of the steps of the method in the foregoing embodiments can be implemented.
Accordingly, in an embodiment, a computer device is also provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method for multi-dimensional expansion keywords according to any one of the embodiments described above.
In the above embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. It will be understood that the terms "first," "second," and the like are used herein to distinguish one object from another, and the objects are not limited by these terms. The above examples represent only several embodiments of the present invention and should not be construed as limiting its scope. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (15)
1. A method for expanding keywords in multiple dimensions is characterized by comprising the following steps:
acquiring first keywords covered by the APP to be expanded in the application library platform, and acquiring a second APP related to the APP to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP, and obtaining a third APP associated with the APP to be expanded according to the APPs searched by each second keyword; obtaining keywords covered by each third APP, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; screening a first set number of keywords from a first candidate keyword set according to the first similarity score to obtain a first expanded keyword set;
key keywords are screened out from the first keywords, and a fourth APP associated with the APP to be expanded is obtained according to the APPs searched by the key keywords; obtaining a second candidate keyword set according to the keywords covered by the fourth APP; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set;
selecting a third set number of keywords from the first expansion keyword set and the second expansion keyword set to obtain expansion keywords of the APP to be expanded;
wherein a keyword covered by an APP must satisfy the following condition: the search results for the keyword in the application library platform contain the APP.
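The coverage condition at the end of claim 1 can be illustrated with a minimal sketch. The function name `covered_keywords`, the APP names, and the `search_results` mapping are hypothetical; the sketch assumes the platform's search results are already available as a mapping from keyword to the list of APPs returned.

```python
from typing import Dict, List, Set


def covered_keywords(app: str, search_results: Dict[str, List[str]]) -> Set[str]:
    """Return every keyword whose search results contain `app`."""
    return {kw for kw, apps in search_results.items() if app in apps}


# Hypothetical data: keyword -> APPs returned by the application library platform.
results = {
    "photo editor": ["PicFix", "SnapEdit"],
    "camera": ["SnapEdit", "LensPro"],
    "filters": ["PicFix"],
}

print(covered_keywords("PicFix", results))
```

Under this reading, "PicFix" covers "photo editor" and "filters" but not "camera".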
2. The method for multi-dimensional expansion of keywords according to claim 1,
the acquiring of the first keyword covered by the APP to be expanded in the application library platform comprises the following steps: acquiring all keywords covered by the APP to be expanded according to the historical search records of the application library platform; all keywords covered by the APP to be expanded are subjected to abnormal screening to delete the abnormal keywords in the keywords to be expanded, so that first keywords covered by the APP to be expanded are obtained;
and/or,
the obtaining of the second keyword covered by each second APP includes: acquiring all keywords covered by each second APP according to the historical search records of the application library platform; performing abnormal screening on all keywords covered by each second APP to delete the abnormal keywords therein to obtain second keywords covered by the second APP;
and/or,
the obtaining of the keywords covered by each third APP includes: acquiring all keywords covered by each third APP according to the historical search records of the application library platform; performing abnormal screening on all keywords covered by each third APP to delete the abnormal keywords therein to obtain the keywords covered by the third APP;
wherein an abnormal keyword has at least one of the following characteristics: an abnormal search index, abnormal search result data for the keyword, an abnormal rank of the APP in the search results, and an abnormal number of characters in the keyword.
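The abnormality screening of claim 2 might be sketched as a simple threshold filter. The claim does not define what counts as "abnormal", so every threshold below, the `KeywordStats` record, and the reading of "search result data" as a result count are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeywordStats:
    keyword: str
    search_index: int   # popularity index of the keyword (assumed field)
    result_count: int   # number of APPs returned for the keyword (assumed field)
    app_rank: int       # rank of the APP in the keyword's search results


def is_abnormal(k: KeywordStats, min_index: int = 10, min_results: int = 5,
                max_rank: int = 200, max_chars: int = 20) -> bool:
    """Flag a keyword if any of the four features crosses its assumed threshold."""
    return (
        k.search_index < min_index
        or k.result_count < min_results
        or k.app_rank > max_rank
        or len(k.keyword) > max_chars
    )


def screen_keywords(keywords: List[KeywordStats]) -> List[KeywordStats]:
    """Keep only the keywords that are not flagged as abnormal."""
    return [k for k in keywords if not is_abnormal(k)]
```

The same filter could be applied unchanged to the keywords of the APP to be expanded, of each second APP, and of each third APP.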
3. The method for multi-dimensional expansion of keywords according to claim 2,
the obtaining of the second APP associated with the APP to be expanded according to the APPs searched by the first keywords on the application library platform includes: obtaining frequency sequencing information of APP in multiple search results corresponding to each first keyword according to multiple search results of each first keyword in a historical search record within a set historical time period; acquiring a set number of APPs with a frequency sequence arranged in front as the APPs searched by each first keyword; obtaining an APP matrix according to all the first keywords and APP information searched by each first keyword; counting the occurrence frequency of each APP in the APP matrix, and selecting the APP with the occurrence frequency greater than or equal to a first set frequency in the APP matrix as a second APP associated with the APP to be expanded;
and/or,
the obtaining of the third APP associated with the APP to be expanded according to the APP searched by each second keyword includes: obtaining frequency sequencing information of the APP in the multiple search results corresponding to each second keyword in the historical search records according to the multiple search results of each second keyword in the historical search records within a set historical time period; acquiring a set number of APPs with a frequency sequence arranged in front as the APPs searched by each second keyword; obtaining an APP matrix according to all the second keywords and the APPs searched by the second keywords; counting the occurrence frequency of each APP in the APP matrix, and selecting an APP with the occurrence frequency greater than or equal to a second set frequency in the APP matrix as a third APP associated with the second APP;
and/or,
obtaining a fourth APP according to the APP searched by the key keyword, including: obtaining frequency sequencing information of APP in multiple search results corresponding to key words according to multiple search results of the key words in a historical search record within a set historical time period; acquiring a set number of APPs with a frequency sequence arranged in front as the APPs searched by the key keywords; obtaining an APP matrix according to all key keywords and APPs searched by each key keyword; and counting the occurrence frequency of each APP in the APP matrix, and selecting the APPs with the occurrence frequency greater than or equal to the set frequency in the APP matrix to obtain a fourth APP.
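The frequency-based selection in claim 3 can be sketched as two counting passes: one per keyword to pick its most frequently returned APPs, and one over the resulting APP matrix. The function names, the `history` layout (keyword mapped to repeated search results within the time window), and the `exclude` parameter for leaving out the APP to be expanded are assumptions, not part of the claim.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def build_app_matrix(history: Dict[str, List[List[str]]], top_n: int) -> Dict[str, List[str]]:
    """For each keyword, keep the top_n APPs that appear most often across its searches."""
    matrix = {}
    for kw, runs in history.items():
        freq = Counter(app for run in runs for app in run)
        matrix[kw] = [app for app, _ in freq.most_common(top_n)]
    return matrix


def associated_apps(matrix: Dict[str, List[str]], min_count: int,
                    exclude: Iterable[str] = ()) -> Set[str]:
    """Select APPs that occur at least min_count times in the APP matrix."""
    skip = set(exclude)
    counts = Counter(app for apps in matrix.values() for app in apps)
    return {app for app, c in counts.items() if c >= min_count and app not in skip}


# Hypothetical history: keyword -> three recorded search results.
history = {
    "a": [["X", "Y", "Z"], ["X", "Y"], ["X"]],
    "b": [["Y", "W"], ["Y", "W"], ["Y", "X"]],
}
matrix = build_app_matrix(history, top_n=2)
print(associated_apps(matrix, min_count=2))
```

The same two passes would serve for the second, third, and fourth APPs, with the first keywords, second keywords, or key keywords as input and the corresponding set frequency as `min_count`.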
4. The method for multi-dimensional expansion of keywords according to claim 3,
after obtaining the second APPs, before obtaining the second keywords covered by each second APP, the method further includes: acquiring an application list to which the APP to be expanded belongs in an application library platform, and deleting a second APP which belongs to an application list different from the APP to be expanded;
and/or,
after obtaining the third APP, before obtaining the keyword covered by each third APP, the method further includes: acquiring an application list to which the APP to be expanded belongs in an application library platform, and deleting a third APP belonging to a different application list from the APP to be expanded;
and/or,
after obtaining the fourth APP, before obtaining the keyword covered by each fourth APP, the method further includes: and acquiring an application list to which the APP to be expanded belongs in the application library platform, and deleting a fourth APP which belongs to a different application list from the APP to be expanded.
5. The method for multi-dimensional expansion of keywords according to claim 1,
the determining the similarity of each third APP with respect to the second APP set includes: if one second APP corresponds to the third APP, taking the similarity between the third APP and that second APP as the similarity of the third APP relative to the second APP set; if two or more second APPs correspond to the third APP, obtaining the similarity between the third APP and each corresponding second APP, calculating the mean of these similarities, and taking the mean as the similarity of the third APP relative to the second APP set;
and/or,
determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set includes: if one key keyword corresponds to a keyword in the second candidate keyword set, obtaining the similarity between the keyword and that key keyword and taking it as the comprehensive similarity of the keyword relative to the key keyword set; if two or more key keywords correspond to a keyword in the second candidate keyword set, obtaining the similarity between the keyword and each corresponding key keyword, calculating the mean of these similarities, and taking the mean as the comprehensive similarity of the keyword relative to the key keyword set.
6. The method for multi-dimensional expansion of keywords according to claim 5, further comprising:
a step of determining a similarity between the third APP and the corresponding second APP, the step comprising: obtaining a feature vector of a second APP according to a second keyword covered by the second APP, and obtaining a feature vector of each third APP according to a keyword covered by each third APP; processing the feature vector of the second APP and the feature vector of the third APP through One-Hot coding to obtain a sparse feature vector of the second APP and a sparse feature vector of the third APP; calculating the similarity between each third APP and the corresponding second APP according to the sparse feature vector of the second APP and the sparse feature vector of the third APP;
and/or,
determining the similarity between each keyword in the second candidate keyword set and the corresponding key keyword, including: obtaining a feature vector of each keyword according to the searched APP of each keyword in the second candidate keyword set, and obtaining a feature vector of each key keyword according to the searched APP of each key keyword; carrying out One-Hot coding processing on the feature vector of each keyword in the second candidate keyword set and the feature vector of the corresponding key keyword respectively to obtain a sparse feature vector of the keyword in the second candidate keyword set and a sparse feature vector of the corresponding key keyword; and calculating the similarity between the keywords in the second candidate keyword set and the corresponding key keywords according to the sparse feature vectors of the keywords in the second candidate keyword set and the sparse feature vectors of the corresponding key keywords.
7. The method for multi-dimensional expansion of keywords according to claim 6, wherein the similarity between each third APP and the corresponding second APP is calculated by the following formula:
$$\frac{V(\mathrm{APP}^{(2)}_t)\cdot V(S^{(3)}_i)}{\|V(\mathrm{APP}^{(2)}_t)\|_2\,\|V(S^{(3)}_i)\|_2}$$
in the formula, $\mathrm{APP}^{(2)}_t$ denotes the t-th second APP; $S^{(3)}_i$ denotes the i-th third APP; $V(\mathrm{APP}^{(2)}_t)\cdot V(S^{(3)}_i)$ denotes the inner product of the sparse feature vector of $\mathrm{APP}^{(2)}_t$ and the sparse feature vector of $S^{(3)}_i$; and $\|V(\mathrm{APP}^{(2)}_t)\|_2\,\|V(S^{(3)}_i)\|_2$ denotes the product of the 2-norms of these two sparse feature vectors;
and/or,
the similarity between the i-th keyword in the second candidate keyword set and the corresponding key keyword is calculated by the following formula:
$$\frac{V(\mathrm{KW}^{(1)\prime}_k)\cdot V(\mathrm{KW}^{(2)\prime}_i)}{\|V(\mathrm{KW}^{(1)\prime}_k)\|_2\,\|V(\mathrm{KW}^{(2)\prime}_i)\|_2}$$
in the formula, $\mathrm{KW}^{(1)\prime}_k$ denotes the k-th key keyword; $\mathrm{KW}^{(2)\prime}_i$ denotes the i-th keyword in the second candidate keyword set; $V(\mathrm{KW}^{(1)\prime}_k)\cdot V(\mathrm{KW}^{(2)\prime}_i)$ denotes the inner product of the sparse feature vector of $\mathrm{KW}^{(1)\prime}_k$ and the sparse feature vector of $\mathrm{KW}^{(2)\prime}_i$; and $\|V(\mathrm{KW}^{(1)\prime}_k)\|_2\,\|V(\mathrm{KW}^{(2)\prime}_i)\|_2$ denotes the product of the 2-norms of these two sparse feature vectors.
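Claims 5 to 7 describe One-Hot encoding followed by a ratio of an inner product to a product of 2-norms, which is the cosine similarity. A minimal sketch for the APP case follows, assuming each APP's features are simply the set of keywords it covers and that the shared vocabulary is the union of those sets; the function names are hypothetical, and `similarity_to_set` assumes at least one second APP is given.

```python
import math
from typing import List, Sequence, Set


def one_hot(features: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Encode a feature set as a sparse 0/1 vector over a shared vocabulary."""
    return [1 if term in features else 0 for term in vocabulary]


def cosine(u: Sequence[int], v: Sequence[int]) -> float:
    """Inner product divided by the product of the 2-norms (0.0 if either norm is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_to_set(third_kws: Set[str], second_kw_sets: List[Set[str]]) -> float:
    """Mean cosine similarity of a third APP to its corresponding second APPs."""
    vocab = sorted(set(third_kws).union(*second_kw_sets))
    v3 = one_hot(third_kws, vocab)
    sims = [cosine(one_hot(s, vocab), v3) for s in second_kw_sets]
    return sum(sims) / len(sims)
```

With a single corresponding second APP the mean reduces to that one similarity, which matches the first branch of claim 5. The keyword case would use the APPs returned for each keyword as the feature set instead.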
8. The method for multi-dimensional expansion of keywords according to claim 7,
obtaining a first candidate keyword set according to the keywords covered by each third APP includes: obtaining a keyword matrix according to the third APPs associated with the second APP and the keywords covered by the third APPs; merging and counting the keywords in the keyword matrix to obtain a first candidate keyword set $KW^{(3)} = (kw^{(3)}_1, kw^{(3)}_2, \ldots, kw^{(3)}_n)$ and the corresponding keyword frequency vector $C^{(3)} = (c_1, c_2, \ldots, c_n)$, each element of the keyword frequency vector being the occurrence frequency of the corresponding keyword in the first candidate keyword set;
the proportion of the i-th keyword in the first candidate keyword set $KW^{(3)}$ is:
$$\frac{c_i}{\sum_{j=1}^{n} c_j}$$
where $i = 1, 2, \ldots, n$, and n represents the total number of keywords contained in the first candidate keyword set $KW^{(3)}$;
and/or,
obtaining a second candidate keyword set according to the keywords covered by the fourth APP includes: obtaining a keyword matrix according to the keywords covered by all the fourth APPs; merging and counting the keywords in the keyword matrix to obtain a second candidate keyword set $KW^{(2)} = (kw^{(2)}_1, kw^{(2)}_2, \ldots, kw^{(2)}_n)$ and the corresponding keyword frequency vector $C^{(2)} = (c_1, c_2, \ldots, c_n)$, each element of the keyword frequency vector being the occurrence frequency of the corresponding keyword in the second candidate keyword set;
the proportion of the i-th keyword in the second candidate keyword set $KW^{(2)}$ is:
$$\frac{c_i}{\sum_{j=1}^{n} c_j}$$
where $i = 1, 2, \ldots, n$, and n represents the total number of keywords contained in the second candidate keyword set $KW^{(2)}$.
9. The method for multi-dimensional expansion of keywords according to any of claims 1 to 8,
calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion comprises: obtaining the first similarity score of a keyword in the first candidate keyword set as the product of the proportion of the keyword in the first candidate keyword set and the similarity between the third APP corresponding to the keyword and the corresponding second APP;
and/or,
calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity comprises: obtaining the second similarity score of a keyword in the second candidate keyword set as the product of the proportion of the keyword in the second candidate keyword set and the comprehensive similarity between the keyword and the corresponding key keywords.
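The counting and scoring steps of claims 8 and 9 can be sketched for the first candidate keyword set as follows. Two points are assumptions rather than statements of the claims: the proportion is taken to be a keyword's share of all keyword occurrences in the keyword matrix, and when a keyword is covered by several third APPs the highest resulting score is kept. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def proportions(keyword_matrix: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of each keyword among all keyword occurrences (assumed reading of 'proportion')."""
    counts = Counter(kw for kws in keyword_matrix.values() for kw in kws)
    total = sum(counts.values())
    return {kw: c / total for kw, c in counts.items()}


def first_similarity_scores(keyword_matrix: Dict[str, List[str]],
                            app_similarity: Dict[str, float]) -> Dict[str, float]:
    """Score = proportion of the keyword x similarity of the third APP that covers it."""
    props = proportions(keyword_matrix)
    scores: Dict[str, float] = {}
    for app, kws in keyword_matrix.items():
        for kw in kws:
            score = props[kw] * app_similarity[app]
            # Assumption: keep the best score when several third APPs cover the keyword.
            scores[kw] = max(scores.get(kw, 0.0), score)
    return scores


def top_keywords(scores: Dict[str, float], n: int) -> List[str]:
    """Screen the n highest-scoring keywords."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

The second similarity score would follow the same pattern, with the comprehensive similarity of each keyword in place of the per-APP similarity.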
10. The method for expanding keywords in multiple dimensions according to any one of claims 1 to 8, wherein the step of screening out key keywords from the first keywords comprises:
according to the importance of each first keyword to the APP to be expanded, selecting the first keywords with the importance greater than or equal to a first set importance threshold as key keywords;
the importance of the keywords to the APP to be expanded represents ranking information of the APP to be expanded in the search results of the keywords.
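Claim 10 ties importance to the rank of the APP to be expanded in a keyword's search results but does not give a formula. The sketch below assumes, purely for illustration, that importance is the reciprocal of the rank, so a better rank yields a higher importance; the function name and the threshold value are hypothetical.

```python
from typing import Dict, List


def key_keywords(ranks: Dict[str, int], importance_threshold: float) -> List[str]:
    """Keep first keywords whose assumed importance (1 / rank) meets the threshold."""
    return [kw for kw, rank in ranks.items() if 1.0 / rank >= importance_threshold]


# Hypothetical ranks of the APP to be expanded for three first keywords.
ranks = {"photo editor": 1, "filters": 4, "wallpaper": 20}
print(key_keywords(ranks, importance_threshold=0.25))
```

Any monotone mapping from rank to importance would give the same selection for a suitably chosen threshold.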
11. The method for expanding the keywords in multiple dimensions according to claim 1, wherein a third set number of keywords are selected from the first expanded keyword set and the second expanded keyword set to obtain expanded keywords of the APP to be expanded, and the method comprises the following steps:
obtaining a third expanded keyword set from the first expanded keyword set and the second expanded keyword set, and normalizing the first similarity score or the second similarity score of each keyword in the third expanded keyword set; acquiring a search index of each keyword in the third expanded keyword set, and calculating a final similarity score of each keyword in the third expanded keyword set according to the search index of each keyword and the similarity score after normalization processing; and selecting a third set number of keywords from the third expansion keyword set according to the final similarity score to obtain the expansion keywords of the APP to be expanded.
12. The method for multi-dimensional expansion of keywords according to claim 11,
obtaining a third expanded keyword set from the first expanded keyword set and the second expanded keyword set comprises: merging the first expanded keyword set and the second expanded keyword set into a keyword set, and de-duplicating the keyword set to obtain the third expanded keyword set;
and/or,
the first similarity score or the second similarity score of each keyword in the third expanded keyword set is normalized by the following formula:
$$s'_i = \frac{s_i - s_{\min}}{s_{\max} - s_{\min}}$$
wherein $s_i$ is the first similarity score or the second similarity score of the i-th keyword in the third expanded keyword set, $s_{\min}$ and $s_{\max}$ respectively represent the minimum and maximum of the similarity scores corresponding to the keywords in the third expanded keyword set, and $s'_i$ is the normalized similarity score of the i-th keyword in the third expanded keyword set;
and/or,
the final similarity score of the i-th keyword in the third expanded keyword set is calculated according to the following formula:
$$\mathrm{score}^{(1)}_i = \alpha \cdot s'_i + (1 - \alpha)\,p'$$
wherein $s'_i$ is the normalized similarity score of the i-th keyword in the third expanded keyword set, $p'$ is the search index modification value of the keyword, and the weight $\alpha \in [0, 1]$;
and/or,
selecting a third set number of keywords from the third expanded keyword set according to the final similarity score to obtain the expanded keywords of the APP to be expanded comprises:
arranging the keywords in the third expanded keyword set in descending order of final similarity score, sequentially selecting a set number of keywords according to the arrangement result to obtain one keyword scheme, and repeating these steps to obtain a plurality of keyword schemes; and determining the score of the i-th keyword scheme by a formula in which $i = 1, 2, \ldots, q$, q is the total number of keyword schemes, m is the number of keywords in a keyword scheme, and $\mathrm{score}^{(1)}_i$ is the final similarity score of the i-th keyword in the third expanded keyword set.
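The normalization and weighting steps of claims 11 and 12 can be sketched as below. The sketch assumes that the merged, de-duplicated similarity scores are already in one dictionary, that the search index modification value $p'$ has been scaled to [0, 1] per keyword, and that a set whose scores are all equal normalizes to 0.0; none of these details are specified in the claims, and the function names are hypothetical. The scheme-level score is not sketched because its formula is not given in the text.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize scores to [0, 1]; all-equal input maps to 0.0 (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {kw: 0.0 for kw in scores}
    return {kw: (s - lo) / (hi - lo) for kw, s in scores.items()}


def final_scores(sim_scores: Dict[str, float], index_values: Dict[str, float],
                 alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum: alpha * normalized similarity + (1 - alpha) * search index value."""
    norm = min_max(sim_scores)
    return {kw: alpha * norm[kw] + (1 - alpha) * index_values[kw] for kw in norm}


def select_keywords(final: Dict[str, float], n: int) -> List[str]:
    """Sort by final score in descending order and keep the first n keywords."""
    return sorted(final, key=final.get, reverse=True)[:n]
```

A larger `alpha` favors relevance to the APP, while a smaller one favors keywords with higher search volume.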
13. A device for expanding keywords in multiple dimensions is characterized by comprising:
the first word expansion module is used for acquiring first keywords covered by the APP to be expanded in the application library platform and obtaining a second APP related to the APP to be expanded according to the APPs searched by the first keywords in the application library platform; acquiring second keywords covered by each second APP, and obtaining a third APP associated with the APP to be expanded according to the APPs searched by each second keyword; obtaining keywords covered by each third APP, and obtaining a first candidate keyword set according to the keywords covered by each third APP; determining the similarity of each third APP relative to the second APP set, determining the proportion of each keyword in the first candidate keyword set, and calculating a first similarity score of each keyword in the first candidate keyword set according to the similarity and the proportion; screening a first set number of keywords from a first candidate keyword set according to the first similarity score to obtain a first expanded keyword set;
the second word expansion module is used for screening out key keywords from the first keywords and obtaining a fourth APP associated with the APP to be expanded according to the APPs searched by the key keywords; obtaining a second candidate keyword set according to the keywords covered by the fourth APP; determining the comprehensive similarity of each keyword in the second candidate keyword set relative to the key keyword set, determining the proportion of each keyword in the second candidate keyword set, and calculating a second similarity score of each keyword in the second candidate keyword set according to the proportion and the comprehensive similarity; screening a second set number of keywords from the second candidate keyword set according to the second similarity score to obtain a second expanded keyword set; and
the screening module is used for selecting a third set number of keywords from the first expansion keyword set and the second expansion keyword set to obtain expansion keywords of the APP to be expanded;
wherein a keyword covered by an APP must satisfy the following condition: the search results for the keyword in the application library platform contain the APP.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1 to 12 are performed when the program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229068.7A CN108052554B (en) | 2017-11-29 | 2017-11-29 | The method and apparatus of various dimensions expansion keyword |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229068.7A CN108052554B (en) | 2017-11-29 | 2017-11-29 | The method and apparatus of various dimensions expansion keyword |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108052554A (en) | 2018-05-18 |
CN108052554B (en) | 2019-04-30 |
Family
ID=62121443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711229068.7A Active CN108052554B (en) | 2017-11-29 | 2017-11-29 | The method and apparatus of various dimensions expansion keyword |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052554B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049470A (en) * | 2012-09-12 | 2013-04-17 | 北京航空航天大学 | Opinion retrieval method based on emotional relevancy |
CN103870505A (en) * | 2012-12-17 | 2014-06-18 | 阿里巴巴集团控股有限公司 | Query term recommending method and query term recommending system |
CN103455613A (en) * | 2013-09-06 | 2013-12-18 | 南京大学 | Interest aware service recommendation method based on MapReduce model |
JP2016126567A (en) * | 2015-01-05 | 2016-07-11 | 日本放送協会 | Content recommendation device and program |
CN104915405A (en) * | 2015-06-02 | 2015-09-16 | 华东师范大学 | Microblog query expansion method based on multiple layers |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111435374A (en) * | 2019-01-11 | 2020-07-21 | 百度在线网络技术(北京)有限公司 | Display device and method for searching statistical data |
CN111435374B (en) * | 2019-01-11 | 2023-04-25 | 百度在线网络技术(北京)有限公司 | Display device and method for searching statistical data |
Also Published As
Publication number | Publication date |
---|---|
CN108052554B (en) | 2019-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||