WO2017157090A1 - Similarity mining method and device - Google Patents
Similarity mining method and device
- Publication number
- WO2017157090A1 (application PCT/CN2017/070225)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- brand
- search
- data
- similarity
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90324—Query formulation using system suggestions
- G06F16/90328—Query formulation using system suggestions using search space presentation or visualization, e.g. category or range presentation and selection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/49—Data-driven translation using very large corpora, e.g. the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- the present invention belongs to the field of information processing technology, and more particularly, to a similarity mining method and apparatus.
- the existing methods for mining brand similarity include manual scoring evaluation methods and public opinion hotspot clustering methods.
- the manual evaluation method generally collects brand words manually and asks respondents from all walks of life, such as the general public, educators, politicians and business elites, to score the similarity between brands; the scores collected from these groups are then combined by a formula to calculate brand similarity and produce a ranking.
- this method requires a large number of questionnaires, so the labor cost is high; whether the survey is on paper or online, respondents often answer perfunctorily, making the results inaccurate and subjective; and manual processing has poor real-time performance, so responses are delayed.
- the hotspot clustering method generally crawls comment and opinion data containing brand keywords from social networks, applies a clustering method such as LDA topic clustering, and then uses a formula to calculate each brand's network heat.
- this method crawls users' comments on brands from search engines or social networks such as Weibo, which requires fast, efficient crawling and convenient storage techniques; the unstructured comment data is cleaned to eliminate junk, useless and interfering data, and the purified data is stored again in structured form; the required structured data is then read and clustered by LDA topic clustering to obtain a probability matrix for each brand word.
- however, network heat fluctuates with hot events, so it can only represent network popularity at a given moment and cannot represent a relatively stable brand similarity.
- a similarity mining method includes: acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; obtaining search brand words according to the user search word data and pre-stored brand word data; constructing a corpus important vocabulary library about the search brand words according to the user behavior data; using the corpus important vocabulary library as the input of a word vector tool to train a word vector model and obtain word vectors of the search brand words; and calculating the similarity between the search brand words according to their word vectors.
- the similarity mining method further comprises: supplementing the user comment data under a search brand word when the similarities between that search brand word and the other search brand words are all less than a preset threshold.
- in constructing the corpus important vocabulary library about the search brand words according to the user behavior data, the library is constructed by filtering, merging, segmenting and removing stop words from the user behavior data.
- in training the word vector model to obtain the word vectors of the search brand words, word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
- the similarity mining method further comprises: classifying the search brand words according to the similarity between the search brand words, and displaying a brand relevance map of each category according to the classification result.
- a similarity mining device includes: a data acquisition module configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; a search brand word mining module configured to obtain search brand words according to the user search word data and pre-stored brand word data; a vocabulary library construction module configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data; a training module configured to use the corpus important vocabulary library as the input of a word vector tool to train a word vector model and obtain word vectors of the search brand words; and a similarity calculation module configured to calculate the similarity between the search brand words according to their word vectors.
- the similarity mining device further includes: a data supplementing module, configured to supplement the user comment data under a search brand word when the similarities between that search brand word and the other search brand words are all less than a preset threshold.
- the vocabulary library construction module constructs the corpus important vocabulary library by filtering, merging, segmenting and removing stop words from the user behavior data.
- the training module uses word2vec as the word vector tool and the HS-CBOW model to build the word vectors of the corpus important vocabulary library.
- the similarity mining device further includes: a display module, configured to classify the search brand words according to the similarity between the search brand words, and display a brand relevance map of each category according to the classification result.
- the similarity mining method and device provided by the present invention use a clustering algorithm (such as word2vec) to calculate the similarity of brand words from users' search word data and post-purchase comment data; they can calculate the similarity between brands automatically, reduce labor costs, and increase the brand recall rate, thereby increasing the conversion rate of recommended brands.
- FIG. 1 is a flow chart of the prior-art manual scoring evaluation method;
- FIG. 2 is a flow chart of a prior-art public opinion clustering method;
- FIG. 3 is a flow chart of a similarity mining method according to an embodiment of the present invention;
- FIG. 4 is a block diagram of a similarity mining device according to an embodiment of the present invention;
- FIG. 5 shows brand relevance maps of different categories according to an embodiment of the present invention;
- FIG. 6 is a schematic diagram of the market structure of milk powder among maternal and infant brands according to an embodiment of the present invention.
- the present invention can be embodied in various forms, some of which are described below.
- FIG. 3 shows a flow chart of a similarity mining method according to an embodiment of the present invention. As shown in FIG. 3, the similarity mining method includes the following steps.
- in step S01, user behavior data and brand word data are acquired, wherein the user behavior data includes user search word data and user comment data.
- the user's post-purchase comment text data, user search word data and brand word data are obtained from the data warehouse through Hive query statements; after observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data; the post-purchase comment text is then segmented and part-of-speech tagged, and a proprietary vocabulary is established to improve the segmentation and tagging results.
- in step S02, search brand words are obtained based on the user search word data and the brand word data.
- the user search word data is filtered to remove search words unrelated to any brand, leaving the brand-related search words.
- according to the brand word data, brand words are then extracted from the brand-related search words to obtain the search brand words.
- for example, the user behavior data is filtered to obtain user search word data containing brand words. Taking one record as an example, the user search word data is: Bosideng, down jacket, light, thin. According to the brand word data, the brand word, that is, the search brand word, is extracted from this record, giving the search brand word: Bosideng.
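The brand-word extraction described above can be sketched as a lookup of pre-stored brand words inside each tokenized query. This is a minimal illustration rather than the patent's implementation; the brand lexicon and the second query are made up.

```python
# Sketch of step S02: keep only brand-related queries and pull out the
# pre-stored brand words they contain. Lexicon and queries are illustrative.

def extract_search_brand_words(search_queries, brand_lexicon):
    """Return (tokens, brand words) for each query that mentions a brand."""
    found = []
    for tokens in search_queries:
        brands = [t for t in tokens if t in brand_lexicon]
        if brands:  # queries without any brand word are filtered out
            found.append((tokens, brands))
    return found

brand_lexicon = {"Bosideng", "Uniqlo"}
queries = [
    ["Bosideng", "down jacket", "light", "thin"],  # the patent's example record
    ["cheap", "winter", "gloves"],                 # no brand word -> dropped
]
print(extract_search_brand_words(queries, brand_lexicon))
# [(['Bosideng', 'down jacket', 'light', 'thin'], ['Bosideng'])]
```

A production version would match against a large pre-stored brand lexicon after Chinese word segmentation, but the filtering logic is the same.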
- in step S03, a corpus important vocabulary library about the search brand words is constructed based on the user behavior data.
- the corpus important vocabulary library is constructed by filtering, merging, segmenting and removing stop words from the user behavior data.
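The four operations named above (filter, merge, segment, remove stop words) might be wired together as below. Whitespace splitting stands in for a real Chinese word segmenter, and the stop-word list is a made-up placeholder; the patent names neither.

```python
# Sketch of step S03 preprocessing: filter junk records, merge the search and
# comment data, segment into tokens, and drop stop words. Whitespace splitting
# is a placeholder for a real word segmenter; the stop-word list is illustrative.

STOP_WORDS = {"the", "a", "is", "and", "of"}  # hypothetical stop-word list

def build_corpus_vocabulary(search_data, comment_data):
    merged = search_data + comment_data        # merge the two sources
    corpus = []
    for record in merged:
        if not record or not record.strip():   # filter empty/junk records
            continue
        tokens = record.split()                # "segmentation" placeholder
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
        if tokens:
            corpus.append(tokens)
    return corpus

corpus = build_corpus_vocabulary(
    ["Bosideng down jacket light thin"],
    ["the down jacket of Bosideng is warm", "  "],
)
print(corpus)
# [['Bosideng', 'down', 'jacket', 'light', 'thin'],
#  ['down', 'jacket', 'Bosideng', 'warm']]
```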
- in step S04, the corpus important vocabulary library is used as the input of a word vector tool to train a word vector model and obtain the word vectors of the search brand words.
- the word vector model training is implemented by means of the word2vec tool.
- the training corpus important vocabulary library contains user comment data for the search brand words; each record includes a search brand word and text describing that search brand word.
- the data is first filtered and merged; after cleaning, valid data is obtained.
- the HS-CBOW model, which trains faster and is relatively easy to implement in engineering terms, is used to build the word vectors of the corpus important vocabulary library.
- as for the word vector dimension: generally, the higher the dimension and the larger the text window, the better the word vectors represent features, but the longer the training takes and the more storage the results occupy. Facing a large data set, the dimension is set to 100 and the text window to 5 to maintain fast computation. Finally, word vectors for a certain amount of vocabulary are obtained through training.
- Word2vec is a neural network toolkit released by Google.
- the main models used are CBOW (Continuous Bag-of-Words) and Skip-Gram.
- it transforms the vocabulary of the input text into word vectors, and has been applied in many natural language processing applications.
- a typical word2vec workflow constructs a vocabulary library from training text data and then learns a vector representation for each word.
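To make the training step concrete, here is a toy CBOW trainer with a plain softmax output layer. The patent uses the word2vec tool's HS-CBOW (hierarchical softmax) variant, which scales to large vocabularies; this numpy sketch keeps only the core idea of predicting a word from the average of its context vectors.

```python
import numpy as np

# Toy CBOW word-vector trainer with a plain softmax output layer. The patent
# uses word2vec's HS-CBOW variant; this sketch shows only the core mechanism:
# predict each word from the mean of its context embeddings via gradient descent.

def train_cbow(sentences, dim=16, window=2, lr=0.05, epochs=100, seed=0):
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(0.0, 0.1, (len(vocab), dim))   # input embeddings
    w_out = rng.normal(0.0, 0.1, (dim, len(vocab)))  # output weights
    for _ in range(epochs):
        for s in sentences:
            for pos, target in enumerate(s):
                ctx = [idx[s[j]]
                       for j in range(max(0, pos - window),
                                      min(len(s), pos + window + 1))
                       if j != pos]
                if not ctx:
                    continue
                h = w_in[ctx].mean(axis=0)       # average of context vectors
                scores = h @ w_out
                p = np.exp(scores - scores.max())
                p /= p.sum()
                p[idx[target]] -= 1.0            # softmax cross-entropy gradient
                grad_h = w_out @ p
                w_out -= lr * np.outer(h, p)
                w_in[ctx] -= lr * grad_h / len(ctx)
    return {w: w_in[idx[w]] for w in vocab}

vectors = train_cbow([["bosideng", "down", "jacket"],
                      ["uniqlo", "down", "jacket"]])
print(vectors["bosideng"].shape)  # (16,)
```

In practice one would call the word2vec tool (or an equivalent library) on the full corpus rather than train by hand; this sketch exists only to ground the description.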
- in step S05, the similarity between the search brand words is calculated according to the word vectors of the search brand words.
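The patent does not name the exact similarity measure; cosine similarity between word vectors is the customary choice with word2vec, so the sketch below assumes it. The three-dimensional brand vectors are made up for illustration.

```python
import math

# Sketch of step S05, assuming cosine similarity between brand word vectors
# (the patent does not specify the measure; cosine is the usual word2vec choice).

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional brand-word vectors for illustration only.
vectors = {
    "Bosideng":   [0.9, 0.1, 0.2],
    "North Face": [0.8, 0.2, 0.3],
    "McDonald's": [0.1, 0.9, 0.1],
}
sim_close = cosine_similarity(vectors["Bosideng"], vectors["North Face"])
sim_far = cosine_similarity(vectors["Bosideng"], vectors["McDonald's"])
print(sim_close > sim_far)  # True: the two down-jacket brands are more similar
```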
- the similarity mining method further includes step S06.
- in step S06, when the similarities between a search brand word and all other search brand words are less than a preset threshold, the user comment data under that search brand word is supplemented.
- that is, the relevant brands of each search brand word are found according to the calculated similarities; when the similarities between one search brand word and all other search brand words are less than the preset threshold, no relevant brand has been found for that search brand word, so additional user comment data under it is extracted and its word vector is recalculated starting from step S01.
- this process is iterated until the number of iterations exceeds a set threshold, which greatly increases the recall rate of brand similarity.
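The supplement-and-retrain loop can be sketched as follows. `retrain` and `supplement_comments` are hypothetical stand-ins for the training pipeline and the comment-collection step described above, and the threshold values are arbitrary.

```python
# Sketch of the step S06 iteration: brands whose similarities all stay below
# the threshold get extra comment data, then training repeats until every
# brand has a relevant neighbour or the iteration budget runs out.
# `retrain` and `supplement_comments` are hypothetical stand-ins.

def mine_with_supplements(brands, retrain, supplement_comments,
                          sim_threshold=0.5, max_iters=3):
    corpus_extra = []
    for _ in range(max_iters):
        sims = retrain(brands, corpus_extra)   # {(b1, b2): similarity}
        isolated = [b for b in brands
                    if all(sims.get((b, o), 0.0) < sim_threshold
                           for o in brands if o != b)]
        if not isolated:                       # every brand found a neighbour
            break
        for b in isolated:                     # supplement comment data
            corpus_extra.extend(supplement_comments(b))
    return sims

# Toy demonstration: after one supplement round, the similarity rises.
def fake_retrain(brands, extra):
    if extra:  # pretend supplemented comments improved the vectors
        return {("a", "b"): 0.8, ("b", "a"): 0.8}
    return {("a", "b"): 0.2, ("b", "a"): 0.2}

sims = mine_with_supplements(["a", "b"], fake_retrain,
                             lambda b: [f"comments about {b}"])
print(sims)  # {('a', 'b'): 0.8, ('b', 'a'): 0.8}
```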
- Table 1 exemplifies the similarity of several brands, making the measurement of brand similarity more intuitive.
- the similarity mining method further includes step S07.
- in step S07, the search brand words are classified according to the similarity between them, and the brand relevance maps of the respective categories are displayed according to the classification results.
- specifically, when the similarity between search brand words is greater than a certain threshold, they are classified into one category, and different categories are thus formed.
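Grouping brands whose pairwise similarity exceeds a threshold amounts to finding connected components, for example with a small union-find. The similarity scores below are illustrative, echoing Table 1.

```python
# Sketch of step S07 classification: merge brands whose pairwise similarity
# exceeds a threshold into one category (connected components via union-find).
# The similarity scores are illustrative, echoing Table 1.

def classify_brands(similarities, threshold):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for (b1, b2), sim in similarities.items():
        find(b1); find(b2)                 # register both brands
        if sim >= threshold:
            union(b1, b2)
    groups = {}
    for b in parent:
        groups.setdefault(find(b), set()).add(b)
    return sorted(map(sorted, groups.values()))

sims = {
    ("GXG", "Jack & Jones"): 0.80,
    ("Hengyuanxiang", "Nanjiren"): 0.85,
    ("Hengyuanxiang", "Jack & Jones"): 0.75,
    ("Hengyuanxiang", "McDonald's"): 0.30,
}
print(classify_brands(sims, threshold=0.7))
# [['GXG', 'Hengyuanxiang', 'Jack & Jones', 'Nanjiren'], ["McDonald's"]]
```

With the threshold at 0.7, the clothing brands form one category while the fast-food brand stays alone, matching the intuition behind the brand relevance maps.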
- FIGS. 5a-5b show the market structure of underwear among clothing brands.
- FIGS. 6a-6b show the market structure of milk powder among maternal and infant brands. According to the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be formulated.
- the similarity mining method provided by the present invention uses a clustering algorithm (such as word2vec) to calculate the similarity of brand words from users' search word data and post-purchase comment data; it can calculate the similarity between brands automatically, reduce labor costs, and increase the brand recall rate, thereby increasing the conversion rate of recommended brands.
- the similarity mining device includes a data acquisition module 101, a search brand word mining module 102, a vocabulary library construction module 103, a training module 104, and a similarity calculation module 105.
- the data acquisition module 101 is configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data.
- the data acquisition module 101 obtains the user's post-purchase comment text data, user search word data and brand word data from the data warehouse through Hive query statements; after observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data; the post-purchase comment text is then segmented and part-of-speech tagged, and a proprietary vocabulary is established to improve the segmentation and tagging results.
- the search brand word mining module 102 is configured to obtain a search brand word according to the user search word data and the pre-stored brand word data.
- the search brand word mining module 102 filters the user search word data to remove search words unrelated to brands, obtaining the brand-related search words; according to the brand word data, brand words are then extracted from these search words to obtain the search brand words.
- for example, the search brand word mining module 102 filters the user behavior data to obtain user search word data containing brand words. Taking one record as an example, the user search word data is: Bosideng, down jacket, light, thin; according to the brand word data, the brand word, that is, the search brand word, is extracted from it: Bosideng.
- the vocabulary library construction module 103 is configured to construct a corpus important vocabulary library about the search brand words based on the user behavior data.
- the vocabulary library construction module 103 constructs the corpus important vocabulary library by filtering, merging, segmenting and removing stop words from the user behavior data.
- the training module 104 is configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words.
- the training module 104 is implemented by means of the word2vec tool.
- the training corpus important vocabulary library contains user comment data for the search brand words; each record includes a search brand word and text describing that search brand word.
- the data is first filtered and merged; after cleaning, valid data is obtained.
- the HS-CBOW model, which trains faster and is relatively easy to implement in engineering terms, is used to build the word vectors of the corpus important vocabulary library.
- word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
- the word vector dimension is set to 100 dimensions and the text window is set to 5.
- as for the word vector dimension: generally, the higher the dimension and the larger the text window, the better the word vectors represent features, but the longer the training takes and the more storage the results occupy. Facing a large data set, the dimension is set to 100 and the text window to 5 to maintain fast computation. Finally, word vectors for a certain amount of vocabulary are obtained through training.
- the similarity calculation module 105 is configured to calculate the similarity between the search brand words according to the word vector of the search brand word.
- the similarity mining device further includes a data supplementing module 106, for supplementing the user comment data under a search brand word when the similarities between that search brand word and the other search brand words are all less than a preset threshold.
- that is, the relevant brands of each search brand word are found according to the calculated similarities; when the similarities between one search brand word and all other search brand words are less than the preset threshold, no relevant brand has been found for that search brand word, so additional user comment data under it is extracted and its word vector is recalculated starting from step S01.
- this process is iterated until the number of iterations exceeds a set threshold, which greatly increases the recall rate of brand similarity.
- the similarity mining device further includes a display module 107, configured to classify the search brand words according to the similarity between the search brand words and to display the brand relevance map of each category according to the classification results.
- specifically, when the similarity between search brand words is greater than a certain threshold, they are classified into one category, and different categories are thus formed.
- FIGS. 5a-5b show the market structure of underwear among clothing brands.
- FIGS. 6a-6b show the market structure of milk powder among maternal and infant brands. According to the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be formulated.
- the similarity mining device uses a clustering algorithm (such as word2vec) to calculate the similarity of brand words from users' search word data and post-purchase comment data; it can calculate the similarity between brands automatically, reduce labor costs, and increase the brand recall rate, thereby increasing the conversion rate of recommended brands.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
FIG. 5 shows brand relevance maps of different categories according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the market structure of milk powder among maternal and infant brands according to an embodiment of the present invention.
Table 1
Brand 1 | Brand 2 | Similarity |
---|---|---|
GXG | Jack & Jones | 80% |
Hengyuanxiang | Nanjiren | 85% |
Hengyuanxiang | Jack & Jones | 75% |
Hengyuanxiang | McDonald's | 30% |
Claims (10)
- A similarity mining method, comprising: acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; obtaining search brand words according to the user search word data and pre-stored brand word data; constructing a corpus important vocabulary library about the search brand words according to the user behavior data; using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain word vectors of the search brand words; and calculating the similarity between the search brand words according to the word vectors of the search brand words.
- The method according to claim 1, further comprising: supplementing the user comment data under a search brand word when the similarities between that search brand word and the other search brand words are all less than a preset threshold.
- The method according to claim 1, wherein, in constructing the corpus important vocabulary library about the search brand words according to the user behavior data, the corpus important vocabulary library is constructed by filtering, merging, segmenting and removing stop words from the user behavior data.
- The method according to claim 1, wherein, in using the corpus important vocabulary library as the input of the word vector tool to perform word vector model training and obtain the word vectors of the search brand words, word2vec is used as the word vector tool and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
- The method according to claim 1, further comprising: classifying the search brand words according to the similarity between the search brand words, and displaying a brand relevance map of each category according to the classification result.
- A similarity mining device, comprising: a data acquisition module configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; a search brand word mining module configured to obtain search brand words according to the user search word data and pre-stored brand word data; a vocabulary library construction module configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data; a training module configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain word vectors of the search brand words; and a similarity calculation module configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
- The device according to claim 6, further comprising: a data supplementing module configured to supplement the user comment data under a search brand word when the similarities between that search brand word and the other search brand words are all less than a preset threshold.
- The device according to claim 6, wherein the vocabulary library construction module constructs the corpus important vocabulary library by filtering, merging, segmenting and removing stop words from the user behavior data.
- The device according to claim 6, wherein the training module uses word2vec as the word vector tool and the HS-CBOW model to build the word vectors of the corpus important vocabulary library.
- The device according to claim 6, further comprising: a display module configured to classify the search brand words according to the similarity between the search brand words, and display a brand relevance map of each category according to the classification result.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/085,893 US11017043B2 (en) | 2016-03-15 | 2017-01-05 | Similarity mining method and device |
RU2018135971A RU2700191C1 (ru) | 2016-03-15 | 2017-01-05 | Method and device for similarity detection |
AU2017232659A AU2017232659A1 (en) | 2016-03-15 | 2017-01-05 | Similarity mining method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610146542.9A CN107193832A (zh) | 2016-03-15 | 2016-03-15 | Similarity mining method and device |
CN201610146542.9 | 2016-03-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017157090A1 true WO2017157090A1 (zh) | 2017-09-21 |
Family
ID=59850739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/070225 WO2017157090A1 (zh) | 2016-03-15 | 2017-01-05 | Similarity mining method and device |
Country Status (5)
Country | Link |
---|---|
US (1) | US11017043B2 (zh) |
CN (1) | CN107193832A (zh) |
AU (1) | AU2017232659A1 (zh) |
RU (1) | RU2700191C1 (zh) |
WO (1) | WO2017157090A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN108038133A (zh) * | 2017-11-20 | 2018-05-15 | Qingdao Penghai Software Co., Ltd. | Personalized recommendation method |
- CN108416611A (zh) * | 2018-01-31 | 2018-08-17 | Research Institute of Sun Yat-sen University in Shunde District, Foshan | Supermarket path recommendation system and method |
- CN109033232A (zh) * | 2018-07-01 | 2018-12-18 | Dongguan Huarui Electronic Technology Co., Ltd. | Social user recommendation method combining a cloud platform with shared devices |
- CN112036120A (zh) * | 2020-08-31 | 2020-12-04 | Shanghai Shuo'en Network Technology Co., Ltd. | Skill phrase extraction method |
- CN112667919A (zh) * | 2020-12-28 | 2021-04-16 | Shandong University | Personalized community correction plan recommendation system based on text data and working method thereof |
- CN114201962A (zh) * | 2021-12-03 | 2022-03-18 | Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences | Paper novelty analysis method, apparatus, medium and device |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN108763205B (zh) * | 2018-05-21 | 2022-05-03 | Advanced New Technologies Co., Ltd. | Brand alias recognition method and apparatus, and electronic device |
- CN110874609B (zh) * | 2018-09-04 | 2022-08-16 | Wuhan Douyu Network Technology Co., Ltd. | User clustering method based on user behavior, and storage medium, device and system therefor |
- CN109635383A (zh) * | 2018-11-28 | 2019-04-16 | Youxinpai (Beijing) Information Technology Co., Ltd. | Method and device for determining car-series relevance based on word2vec |
- CN113673216B (zh) * | 2021-10-20 | 2022-02-01 | Alipay (Hangzhou) Information Technology Co., Ltd. | Text infringement detection method and apparatus, and electronic device |
- CN116308683B (zh) * | 2023-05-17 | 2023-08-04 | Wuhan Textile University | Knowledge-graph-based clothing brand positioning recommendation method, device and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN101206674A (zh) * | 2007-12-25 | 2008-06-25 | Beijing Kewen Book Industry Information Technology Co., Ltd. | Enhanced related search system and method using commodities as a medium |
- CN104778161A (zh) * | 2015-04-30 | 2015-07-15 | Chezhi Hulian (Beijing) Technology Co., Ltd. | Keyword extraction method based on Word2Vec and query logs |
- CN105095430A (zh) * | 2015-07-22 | 2015-11-25 | Shenzhen Securities Information Co., Ltd. | Method and device for constructing a word network and extracting keywords |
- CN105279288A (zh) * | 2015-12-04 | 2016-01-27 | Shenzhen University | Online content recommendation method based on a deep neural network |
US20160065534A1 (en) * | 2011-07-06 | 2016-03-03 | Nominum, Inc. | System for correlation of domain names |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6529892B1 (en) * | 1999-08-04 | 2003-03-04 | Illinois, University Of | Apparatus, method and product for multi-attribute drug comparison |
US7925044B2 (en) * | 2006-02-01 | 2011-04-12 | Markmonitor Inc. | Detecting online abuse in images |
US7606810B1 (en) * | 2006-04-27 | 2009-10-20 | Colin Jeavons | Editorial related advertising content delivery system |
US20080270203A1 (en) * | 2007-04-27 | 2008-10-30 | Corporation Service Company | Assessment of Risk to Domain Names, Brand Names and the Like |
US7873635B2 (en) * | 2007-05-31 | 2011-01-18 | Microsoft Corporation | Search ranger system and double-funnel model for search spam analyses and browser protection |
US9438733B2 (en) * | 2008-09-08 | 2016-09-06 | Invoca, Inc. | Methods and systems for data transfer and campaign management |
US9496003B2 (en) * | 2008-09-08 | 2016-11-15 | Apple Inc. | System and method for playlist generation based on similarity data |
- KR101078864B1 (ko) * | 2009-03-26 | 2011-11-02 | Korea Advanced Institute of Science and Technology | System and method for analyzing changes in query/document topic categories, and query-expansion-based information retrieval system and method using the same |
US20130014136A1 (en) * | 2011-07-06 | 2013-01-10 | Manish Bhatia | Audience Atmospherics Monitoring Platform Methods |
US20120144499A1 (en) * | 2010-12-02 | 2012-06-07 | Sky Castle Global Limited | System to inform about trademarks similar to provided input |
WO2013124521A1 (en) * | 2012-02-22 | 2013-08-29 | Nokia Corporation | A system and a method for determining context |
US9406072B2 (en) * | 2012-03-29 | 2016-08-02 | Spotify Ab | Demographic and media preference prediction using media content data analysis |
US20150242525A1 (en) * | 2014-02-26 | 2015-08-27 | Pixured, Inc. | System for referring to and/or embedding posts within other post and posts within any part of another post |
US10102669B2 (en) * | 2014-09-08 | 2018-10-16 | Apple Inc. | Density sampling map labels |
US9767409B1 (en) * | 2015-03-30 | 2017-09-19 | Amazon Technologies, Inc. | Latent feature based tag routing |
US20170075998A1 (en) * | 2015-09-14 | 2017-03-16 | Ebay Inc. | Assessing translation quality |
US11263664B2 (en) * | 2015-12-30 | 2022-03-01 | Yahoo Assets Llc | Computerized system and method for augmenting search terms for increased efficiency and effectiveness in identifying content |
-
2016
- 2016-03-15 CN CN201610146542.9A patent/CN107193832A/zh active Pending
-
2017
- 2017-01-05 WO PCT/CN2017/070225 patent/WO2017157090A1/zh active Application Filing
- 2017-01-05 US US16/085,893 patent/US11017043B2/en active Active
- 2017-01-05 AU AU2017232659A patent/AU2017232659A1/en not_active Abandoned
- 2017-01-05 RU RU2018135971A patent/RU2700191C1/ru active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN101206674A (zh) * | 2007-12-25 | 2008-06-25 | Beijing Kewen Book Industry Information Technology Co., Ltd. | Enhanced related search system and method using commodities as a medium |
US20160065534A1 (en) * | 2011-07-06 | 2016-03-03 | Nominum, Inc. | System for correlation of domain names |
- CN104778161A (zh) * | 2015-04-30 | 2015-07-15 | Chezhi Hulian (Beijing) Technology Co., Ltd. | Keyword extraction method based on Word2Vec and query logs |
- CN105095430A (zh) * | 2015-07-22 | 2015-11-25 | Shenzhen Securities Information Co., Ltd. | Method and device for constructing a word network and extracting keywords |
- CN105279288A (zh) * | 2015-12-04 | 2016-01-27 | Shenzhen University | Online content recommendation method based on a deep neural network |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN108038133A (zh) * | 2017-11-20 | 2018-05-15 | Qingdao Penghai Software Co., Ltd. | Personalized recommendation method |
- CN108416611A (zh) * | 2018-01-31 | 2018-08-17 | Research Institute of Sun Yat-sen University in Shunde District, Foshan | Supermarket path recommendation system and method |
- CN108416611B (zh) * | 2018-01-31 | 2020-12-04 | Research Institute of Sun Yat-sen University in Shunde District, Foshan | Supermarket path recommendation system and method |
- CN109033232A (zh) * | 2018-07-01 | 2018-12-18 | Dongguan Huarui Electronic Technology Co., Ltd. | Social user recommendation method combining a cloud platform with shared devices |
- CN109033232B (zh) * | 2018-07-01 | 2021-12-28 | Dongguan Huarui Electronic Technology Co., Ltd. | Social user recommendation method combining a cloud platform with shared devices |
- CN112036120A (zh) * | 2020-08-31 | 2020-12-04 | Shanghai Shuo'en Network Technology Co., Ltd. | Skill phrase extraction method |
- CN112667919A (zh) * | 2020-12-28 | 2021-04-16 | Shandong University | Personalized community correction plan recommendation system based on text data and working method thereof |
- CN114201962A (zh) * | 2021-12-03 | 2022-03-18 | Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences | Paper novelty analysis method, apparatus, medium and device |
- CN114201962B (zh) * | 2021-12-03 | 2023-07-25 | Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences | Paper novelty analysis method, apparatus, medium and device |
Also Published As
Publication number | Publication date |
---|---|
RU2700191C1 (ru) | 2019-09-13 |
US11017043B2 (en) | 2021-05-25 |
US20200301982A1 (en) | 2020-09-24 |
AU2017232659A1 (en) | 2018-10-11 |
CN107193832A (zh) | 2017-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2017157090A1 (zh) | Similarity mining method and device | |
McKenzie et al. | Weighted multi-attribute matching of user-generated points of interest | |
- KR101423544B1 (ko) | Apparatus and method for extracting semantic topics | |
- WO2019041521A1 (zh) | User keyword extraction device and method, and computer-readable storage medium | |
- JP6381775B2 (ja) | Information processing system and information processing method | |
US20170109633A1 (en) | Comment-comment and comment-document analysis of documents | |
- CN109816482B (zh) | Knowledge graph construction method, apparatus, device and storage medium for an e-commerce platform | |
WO2016196857A1 (en) | Entity classification and/or relationship identification | |
- CN108805598B (zh) | Similarity information determination method, server and computer-readable storage medium | |
- CN106959966A (zh) | Information recommendation method and system | |
- JP2017508214A (ja) | Providing search recommendations | |
- CN103425635A (zh) | Answer recommendation method and device | |
- CN106096609B (zh) | OCR-based automatic generation method for commodity query keywords | |
- CN109783614B (zh) | Differential privacy leakage detection method and system for text to be published on a social network | |
- CN106294425A (zh) | Automatic image-text summarization method and system for commodity-related web articles | |
- CN107015976B (zh) | Service processing method, data processing method and device | |
- CN106294744A (zh) | Interest recognition method and system | |
TW201241773A (en) | Method and apparatus of determining product category information | |
- CN104462327B (zh) | Sentence similarity calculation and search processing method and device | |
- TWI705411B (zh) | Method and device for identifying users with social business characteristics | |
- CN109597990B (zh) | Method for matching social hotspots with commodity categories | |
- CN103593474A (zh) | Deep-learning-based image retrieval ranking method | |
- CN105023178B (zh) | Ontology-based e-commerce recommendation method | |
- CN111191099B (zh) | User activity type recognition method based on social media | |
- CN103577472A (zh) | Method and system for personal information acquisition and estimation, and commodity classification and retrieval | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2017232659 Country of ref document: AU Date of ref document: 20170105 Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17765632 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.01.2019) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17765632 Country of ref document: EP Kind code of ref document: A1 |