WO2017157090A1 - Similarity mining method and device - Google Patents

Similarity mining method and device

Info

Publication number
WO2017157090A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
brand
search
data
similarity
Prior art date
Application number
PCT/CN2017/070225
Other languages
English (en)
French (fr)
Inventor
黄运杜
陈海勇
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司
Priority to US16/085,893 priority Critical patent/US11017043B2/en
Priority to RU2018135971A priority patent/RU2700191C1/ru
Priority to AU2017232659A priority patent/AU2017232659A1/en
Publication of WO2017157090A1 publication Critical patent/WO2017157090A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90324Query formulation using system suggestions
    • G06F16/90328Query formulation using system suggestions using search space presentation or visualization, e.g. category or range presentation and selection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/49Data-driven translation using very large corpora, e.g. the web
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Definitions

  • The present invention belongs to the field of information processing technology and, more particularly, relates to a similarity mining method and apparatus.
  • The existing methods for mining brand similarity include the multi-party manual scoring evaluation method and the public-opinion hotspot clustering method.
  • In the manual scoring evaluation method, brand words are generally collected manually; various parties, such as the public, educators, politicians, ordinary consumers, and enterprise elites, are asked to score the similarity between brands; the scores from all parties are then aggregated, brand similarity is computed with a formula, and a ranking is given.
  • However, this method requires a large number of questionnaires, so the labor cost is high; whether the survey is on paper or online, respondents often answer perfunctorily, which makes the results inaccurate and the computed results subjective; and manual processing has poor real-time performance, so responses are delayed.
  • The hotspot clustering method generally crawls comment and opinion data containing brand keywords from social networks, applies a clustering method such as LDA topic clustering, and then uses a formula to compute brand network popularity.
  • This method crawls users' comments on brands from search engines or social networks such as Weibo, which requires techniques for crawling quickly and efficiently and storing the data in a form convenient to read; the unstructured comment data is cleaned to remove junk, useless, and interfering data, and after purification another copy is stored in a structured form; the required structured data is then read and clustered with LDA topic clustering to obtain a probability matrix for each brand word.
  • However, network popularity is easily perturbed by hot events; it only represents a certain degree of network popularity and cannot well represent a relatively stable brand similarity.
  • According to one aspect, a similarity mining method is provided, including: acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; obtaining search brand words according to the user search word data and pre-stored brand word data; constructing a corpus important vocabulary library about the search brand words according to the user behavior data; using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words; and calculating the similarity between the search brand words according to the word vectors of the search brand words.
  • Preferably, the similarity mining method further comprises: supplementing the user comment data under a search brand word when the similarities between that search brand word and all other search brand words are less than a preset threshold.
  • Preferably, in constructing the corpus important vocabulary library about the search brand words according to the user behavior data, the corpus important vocabulary library is built by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  • Preferably, in training the word vector model to obtain the word vectors of the search brand words, word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
  • Preferably, the similarity mining method further comprises: classifying the search brand words according to the similarity between the search brand words, and displaying a brand relevance map of each category according to the classification result.
  • According to another aspect, a similarity mining device is provided, including: a data acquisition module, configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; a search brand word mining module, configured to obtain search brand words according to the user search word data and the pre-stored brand word data; a vocabulary library construction module, configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data; a training module, configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words; and a similarity calculation module, configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
  • Preferably, the similarity mining device further includes: a data supplementing module, configured to obtain the similarity between the search brand words according to the distance between the search brand words.
  • Preferably, the vocabulary library construction module builds the corpus important vocabulary library by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  • Preferably, the training module uses word2vec as the word vector tool and uses the HS-CBOW model to build the word vectors of the corpus important vocabulary library.
  • Preferably, the similarity mining device further includes: a display module, configured to classify the search brand words according to the similarity between the search brand words, and to display a brand relevance map of each category according to the classification result.
  • The similarity mining method and device provided by the present invention use a clustering algorithm (such as word2vector) to calculate the similarity of brand words from the user's search word data and post-purchase comment data; they can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.
  • FIG. 1 is a flow chart of the prior-art manual scoring evaluation method;
  • FIG. 2 is a flow chart of the prior-art public-opinion hotspot clustering method;
  • FIG. 3 is a flow chart of a similarity mining method according to an embodiment of the present invention;
  • FIG. 4 is a block diagram showing the structure of a similarity mining device according to an embodiment of the present invention;
  • FIG. 5 illustrates brand relevance maps of different categories according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of the market structure of milk powder brands in the maternal and child category according to an embodiment of the present invention.
  • The present invention can be embodied in various forms, some of which are described below.
  • FIG. 3 shows a flow chart of a similarity mining method according to an embodiment of the present invention. As shown in FIG. 3, the similarity mining method includes the following steps.
  • In step S01, user behavior data and brand word data are acquired, wherein the user behavior data includes user search word data and user comment data.
  • In this embodiment, the user's post-purchase comment text data, the user search word data, and the brand word data are obtained from the data warehouse through Hive query statements. After observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data. The post-purchase comment text data is segmented into words and tagged with parts of speech, and a proprietary lexicon is built to improve the word segmentation and part-of-speech tagging results.
  • In step S02, search brand words are obtained based on the user search word data and the brand word data.
  • In this embodiment, the user search word data is filtered to remove search terms unrelated to brands, leaving the brand-related search terms.
  • Brand words are then extracted from the brand-related search terms according to the brand word data, yielding the search brand words.
  • Specifically, the user behavior data is filtered to obtain user search word data that contains brand words. Taking one record as an example, the user search word data is: Bosideng, down jacket, light, thin. According to the brand word data, the brand word in this record, i.e., the search brand word, is extracted, giving the search brand word: Bosideng.
  • In step S03, a corpus important vocabulary library about the search brand words is constructed based on the user behavior data.
  • In this embodiment, the corpus important vocabulary library is built by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  • In step S04, the corpus important vocabulary library is used as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words.
  • In this embodiment, the word vector model training is implemented with the word2vec tool.
  • The corpus important vocabulary library used for training contains user comment data for the search brand words; each record includes a search brand word and text describing it.
  • To reduce the influence of noisy data on the trained word vectors, the data is first filtered and merged; after cleaning, valid data is obtained.
  • The HS-CBOW model, which trains quickly and is relatively easy to implement in engineering, is used to build the word vectors of the corpus important vocabulary library.
  • Regarding the word vector dimensionality: generally, the higher the dimensionality and the larger the text window, the better the feature representation of the word vectors, but the longer the training takes and the more storage the training results occupy. For a large data set, setting the dimensionality to 100 and the text window to 5 maintains fast computation; training finally yields word vectors for a certain amount of vocabulary.
  • word2vec is a neural network toolkit released by Google.
  • The main models it uses are CBOW (Continuous Bag-of-Words) and Skip-Gram.
  • It can transform the input text vocabulary into a series of word vectors and has been applied in many natural language processing applications.
  • A typical word2vec implementation builds a vocabulary library from the training text data and then learns vector representations of the vocabulary.
  • In step S05, the similarity between the search brand words is calculated according to the word vectors of the search brand words.
  • In a preferred embodiment, the similarity mining method further includes step S06.
  • In step S06, when the similarities between a search brand word and all other search brand words are less than a preset threshold, the user comment data under that search brand word is supplemented.
  • Whether a search brand word has found a related brand is judged from the calculated similarities: when the similarities between one search brand word and all other search brand words are less than the preset threshold, the search brand word has not found a related brand; the user comment data under that search brand word is then extracted, and its word vector is recomputed starting again from step S01.
  • This process is iterated until the number of iterations exceeds a set threshold, which greatly increases the recall rate of brand similarity distances.
  • Table 1 gives example similarities for several brands, making the brand similarity measure more intuitive.
  • In a preferred embodiment, the similarity mining method further includes step S07.
  • In step S07, the search brand words are classified according to the similarity between them, and the brand relevance map of each category is displayed according to the classification results.
  • The search brand words are classified according to the similarity between them: when the similarity between search brand words is greater than a certain threshold, they are grouped into one category, forming structures of different categories.
  • FIGS. 5a-5b show the market structure of underwear brands in the clothing category.
  • FIGS. 6a-6b show the market structure of milk powder brands in the maternal and child category. Based on the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be optimized.
  • The similarity mining method provided by the present invention uses a clustering algorithm (such as word2vector) to calculate the similarity of brand words from the user's search word data and post-purchase comment data; it can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.
  • The similarity mining device includes a data acquisition module 101, a search brand word mining module 102, a vocabulary library construction module 103, a training module 104, and a similarity calculation module 105.
  • The data acquisition module 101 is configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data.
  • The data acquisition module 101 obtains the user's post-purchase comment text data, the user search word data, and the brand word data from the data warehouse through Hive query statements. After observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data. The post-purchase comment text data is segmented into words and tagged with parts of speech, and a proprietary lexicon is built to improve the word segmentation and part-of-speech tagging results.
  • The search brand word mining module 102 is configured to obtain search brand words according to the user search word data and the pre-stored brand word data.
  • The search brand word mining module 102 filters the user search word data to remove search terms unrelated to brands, leaving the brand-related search terms; according to the brand word data, brand words are extracted from the brand-related search terms, yielding the search brand words.
  • Specifically, the search brand word mining module 102 filters the user behavior data to obtain user search word data that contains brand words. Taking one record as an example, the user search word data is: Bosideng, down jacket, light, thin; according to the brand word data, the brand word contained in it, i.e., the search brand word, is extracted.
  • The search brand word obtained is: Bosideng.
  • The vocabulary library construction module 103 is configured to construct a corpus important vocabulary library about the search brand words based on the user behavior data.
  • The vocabulary library construction module 103 builds the corpus important vocabulary library by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  • The training module 104 is configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words.
  • The training module 104 is implemented with the word2vec tool.
  • The corpus important vocabulary library used for training contains user comment data for the search brand words; each record includes a search brand word and text describing it.
  • To reduce the influence of noisy data on the trained word vectors, the data is first filtered and merged; after cleaning, valid data is obtained.
  • The HS-CBOW model, which trains quickly and is relatively easy to implement in engineering, is used to build the word vectors of the corpus important vocabulary library.
  • word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
  • The word vector dimensionality is set to 100 and the text window to 5.
  • Regarding the word vector dimensionality: generally, the higher the dimensionality and the larger the text window, the better the feature representation of the word vectors, but the longer the training takes and the more storage the training results occupy. For a large data set, setting the dimensionality to 100 and the text window to 5 maintains fast computation; training finally yields word vectors for a certain amount of vocabulary.
  • The similarity calculation module 105 is configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
  • The similarity mining device further includes a data supplementing module 106, configured to supplement the user comment data under a search brand word when the similarities between that search brand word and all other search brand words are less than a preset threshold.
  • Whether a search brand word has found a related brand is judged from the calculated similarities: when the similarities between one search brand word and all other search brand words are less than the preset threshold, the search brand word has not found a related brand; the user comment data under that search brand word is then extracted, and its word vector is recomputed starting again from step S01.
  • This process is iterated until the number of iterations exceeds a set threshold, which greatly increases the recall rate of brand similarity distances.
  • The similarity mining device further includes a display module 107, configured to classify the search brand words according to the similarity between them and to display the brand relevance map of each category according to the classification results.
  • The search brand words are classified according to the similarity between them: when the similarity between search brand words is greater than a certain threshold, they are grouped into one category, forming structures of different categories.
  • FIGS. 5a-5b show the market structure of underwear brands in the clothing category.
  • FIGS. 6a-6b show the market structure of milk powder brands in the maternal and child category. Based on the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be optimized.
  • The similarity mining device uses a clustering algorithm (such as word2vector) to calculate the similarity of brand words from the user's search word data and post-purchase comment data; it can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A similarity mining method and device. The method includes: acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data (S01); obtaining search brand words according to the user search word data and pre-stored brand word data (S02); constructing a corpus important vocabulary library about the search brand words according to the user behavior data (S03); using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words (S04); and calculating the similarity between the search brand words according to the word vectors of the search brand words (S05). The method can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.

Description

Similarity Mining Method and Device

Technical Field
The present invention belongs to the field of information processing technology and, more particularly, relates to a similarity mining method and device.
Background Art
In this era of rapid economic development, people's demand for material goods has greatly increased. Because the products under a brand have relatively stable styles, functions, effects, flavors, and so on, people tend to use the brands they are familiar with. As a result, recommending other brands to users in a recommendation system meets a certain resistance, and it is also difficult for new brand companies to promote their products. Therefore, inventing an automated, low-cost method for mining brand similarity is of great significance: for a recommendation system, it allows brands with similar styles, functions, effects, flavors, and so on to be recommended so that users accept the recommended brands more readily; for building the brand ecosystem of a market, it allows companies to formulate more targeted strategic plans.
Existing brand similarity mining methods include the multi-party manual scoring evaluation method and the public-opinion hotspot clustering method. As shown in FIG. 1, in the manual scoring evaluation method, brand words are generally collected manually; various parties, such as the public, educators, politicians, ordinary consumers, and enterprise elites, are asked to score the similarity between brands; the scores from all parties are then aggregated, brand similarity is computed with a formula, and a ranking is given. However, this method requires a large number of questionnaires and has a high labor cost; whether the survey is on paper or online, respondents often answer perfunctorily, which makes the results inaccurate and the computed results rather subjective; and manual processing has poor real-time performance, so responses are delayed.
As shown in FIG. 2, the public-opinion hotspot clustering method generally crawls comment and opinion data containing brand keywords from social networks, applies a clustering method such as LDA topic clustering, and then uses a formula to compute brand network popularity. The method crawls users' comments on brands from search engines or social networks such as Weibo, which involves techniques for crawling quickly and efficiently and storing the data in a form convenient to read; the unstructured comment data is cleaned to remove junk, useless, and interfering data, and after purification another copy is stored in a structured form; the required structured data is then read and clustered with LDA topic clustering to obtain a probability matrix for each brand word, and the similarity between brands is computed with a formula. However, network popularity computed from public opinion is easily perturbed by hot events; it only represents a certain degree of network popularity and cannot well represent a relatively stable brand similarity.
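For concreteness, the prior-art pipeline sketched above could be prototyped roughly as follows; gensim, the toy comment data, and the idea of comparing brands through their averaged topic distributions are illustrative assumptions rather than details given in this background section.

```python
# A minimal sketch of the prior-art LDA-based approach, assuming crawled and cleaned
# comments have already been tokenized into `brand_comments` (brand -> list of token lists).
# All names and values here are hypothetical.
from gensim import corpora, models
import numpy as np

brand_comments = {
    "BrandA": [["down", "jacket", "light", "warm"]],
    "BrandB": [["down", "jacket", "thin", "cheap"]],
}

docs = [tokens for comments in brand_comments.values() for tokens in comments]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(tokens) for tokens in docs]
lda = models.LdaModel(bow, id2word=dictionary, num_topics=10, passes=5)

def topic_vector(brand):
    """Average topic distribution over a brand's comments."""
    vecs = []
    for tokens in brand_comments[brand]:
        dist = lda.get_document_topics(dictionary.doc2bow(tokens), minimum_probability=0.0)
        vecs.append([p for _, p in dist])
    return np.mean(vecs, axis=0)

a, b = topic_vector("BrandA"), topic_vector("BrandB")
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # topic-based brand similarity
```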
Summary of the Invention
An object of the present invention is to provide a similarity mining method and device.
According to an aspect of the present invention, a similarity mining method is provided, including: acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; obtaining search brand words according to the user search word data and pre-stored brand word data; constructing a corpus important vocabulary library about the search brand words according to the user behavior data; using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words; and calculating the similarity between the search brand words according to the word vectors of the search brand words.
Preferably, the similarity mining method further includes: supplementing the user comment data under a search brand word when the similarities between the search brand word and all other search brand words are less than a preset threshold.
Preferably, in constructing the corpus important vocabulary library about the search brand words according to the user behavior data, the corpus important vocabulary library is built by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
Preferably, in using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words, word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
Preferably, the similarity mining method further includes: classifying the search brand words according to the similarity between the search brand words, and displaying a brand relevance map of each category according to the classification result.
According to another aspect of the present invention, a similarity mining device is provided, including: a data acquisition module, configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data; a search brand word mining module, configured to obtain search brand words according to the user search word data and pre-stored brand word data; a vocabulary library construction module, configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data; a training module, configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words; and a similarity calculation module, configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
Preferably, the similarity mining device further includes: a data supplementing module, configured to obtain the similarity between the search brand words according to the distance between the search brand words.
Preferably, the vocabulary library construction module builds the corpus important vocabulary library by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
Preferably, the training module uses word2vec as the word vector tool and uses the HS-CBOW model to build the word vectors of the corpus important vocabulary library.
Preferably, the similarity mining device further includes: a display module, configured to classify the search brand words according to the similarity between the search brand words, and to display a brand relevance map of each category according to the classification result.
The similarity mining method and device provided by the present invention use a clustering algorithm (such as word2vector) to calculate the similarity of brand words according to the user's search word data and the user's post-purchase comment data; they can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of the prior-art multi-party manual scoring evaluation method;
FIG. 2 is a flow chart of the prior-art public-opinion hotspot clustering method;
FIG. 3 is a flow chart of a similarity mining method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a similarity mining device according to an embodiment of the present invention;
[Corrected under Rule 26, 10.02.2017]
FIG. 5 shows brand relevance maps of different categories according to an embodiment of the present invention;
[Corrected under Rule 91, 10.02.2017]
FIG. 6 is a schematic diagram of the market structure of milk powder brands in the maternal and child category according to an embodiment of the present invention.
Detailed Description of the Embodiments
Various embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. In the drawings, the same elements are denoted by the same or similar reference numerals. For clarity, the parts in the drawings are not drawn to scale.
The present invention can be embodied in various forms, some examples of which are described below.
FIG. 3 shows a flow chart of a similarity mining method according to an embodiment of the present invention. As shown in FIG. 3, the similarity mining method includes the following steps.
In step S01, user behavior data and brand word data are acquired, wherein the user behavior data includes user search word data and user comment data.
In this embodiment, the user's post-purchase comment text data, the user search word data, and the brand word data are obtained from the data warehouse through Hive query statements. After observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data. The post-purchase comment text data is segmented into words and tagged with parts of speech, and a proprietary lexicon is built to improve the effect of word segmentation and part-of-speech tagging.
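A rough sketch of this data-acquisition step is given below; the pyhive client, the connection parameters, and the table and column names are all hypothetical, since the embodiment only states that Hive query statements are run against the data warehouse.

```python
# Hypothetical sketch of step S01: pulling the three data sources from the warehouse via Hive.
from pyhive import hive

conn = hive.Connection(host="warehouse-host", port=10000, username="analyst")  # assumed settings
cursor = conn.cursor()

queries = {
    "comments": "SELECT user_id, sku_id, comment_text FROM dw.user_comment",   # assumed schema
    "searches": "SELECT user_id, search_query FROM dw.user_search_log",
    "brands":   "SELECT brand_name FROM dw.brand_dict",
}

raw_data = {}
for name, sql in queries.items():
    cursor.execute(sql)
    raw_data[name] = cursor.fetchall()   # filtering rules are applied downstream
```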
In step S02, search brand words are obtained according to the user search word data and the brand word data.
In this embodiment, the user search word data is filtered to remove search terms unrelated to brands, leaving the brand-related search terms. Brand words are then extracted from the brand-related search terms according to the brand word data, yielding the search brand words.
Specifically, the user behavior data is filtered to obtain user search word data that contains brand words. Taking one record of user search word data as an example: Bosideng, down jacket, light, thin. According to the brand word data, the brand word contained in the record, i.e., the search brand word, is extracted; the search brand word obtained is: Bosideng.
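As a toy illustration of this extraction (the brand dictionary below contains only the example brand from this paragraph and is an assumption, not the actual pre-stored data):

```python
# Toy sketch of step S02: keep only the tokens that appear in the pre-stored brand word data.
brand_words = {"波司登"}                         # pre-stored brand word data (example entry only)
search_record = ["波司登", "羽绒服", "轻", "薄"]   # one record of user search word data

search_brand_words = [w for w in search_record if w in brand_words]
print(search_brand_words)   # ['波司登'] -> the search brand word "Bosideng"
```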
In step S03, a corpus important vocabulary library about the search brand words is constructed according to the user behavior data.
In this embodiment, the corpus important vocabulary library is built by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
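A minimal sketch of this preprocessing is shown below, assuming jieba for Chinese word segmentation, a plain-text stop-word list, and a custom brand lexicon file; these tools and file names are assumptions, not details specified by the embodiment.

```python
# Hypothetical preprocessing sketch: filter, merge, segment, and remove stop words.
import jieba

jieba.load_userdict("brand_lexicon.txt")   # proprietary lexicon to improve segmentation (assumed file)
stopwords = set(open("stopwords.txt", encoding="utf-8").read().split())

def build_corpus(records):
    """records: iterable of comment/search strings that have already been filtered and merged."""
    corpus = []
    for text in records:
        tokens = [w for w in jieba.cut(text) if w.strip() and w not in stopwords]
        if tokens:
            corpus.append(tokens)
    return corpus

corpus = build_corpus(["波司登 羽绒服很轻很薄，保暖效果不错"])   # toy example record
print(corpus)
```

The resulting token lists stand in for the corpus important vocabulary library that is fed to the word vector tool in step S04.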
In step S04, the corpus important vocabulary library is used as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words.
In this embodiment, the word vector model training is implemented with the word2vec tool. The corpus important vocabulary library used for training contains user comment data for the search brand words; each record includes a search brand word and text describing it. To reduce the influence of noisy data on the trained word vectors, the data is first filtered and merged; after cleaning, valid data is obtained. In addition, considering the training speed and the complexity of implementing recommendation, the HS-CBOW model, which trains quickly and is relatively easy to implement in engineering, is selected to build the word vectors of the corpus important vocabulary library.
Further, regarding the choice of word vector dimensionality: generally, the higher the dimensionality and the larger the text window, the better the feature representation of the word vectors, but the longer the training takes and the more storage the training results occupy. For a large data set, setting the dimensionality to 100 and the text window to 5 maintains fast computation; training finally yields word vectors for a certain amount of vocabulary.
word2vec is a neural network toolkit released by Google; the main models it uses are CBOW (Continuous Bag-of-Words) and Skip-Gram. It can transform the input text vocabulary into a series of word vectors, and this toolkit has already been applied in many natural language processing applications. A typical word2vec implementation builds a vocabulary library from the training text data and then learns vector representations of the vocabulary through training.
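The HS-CBOW configuration described above (CBOW with hierarchical softmax, dimensionality 100, text window 5) can be approximated with gensim's Word2Vec implementation as sketched below; gensim itself is an assumption, since the embodiment only names Google's word2vec toolkit.

```python
# Sketch of HS-CBOW training with gensim, assuming `corpus` is the tokenized
# corpus important vocabulary library built in step S03.
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # word vector dimensionality (named `size` in gensim < 4.0)
    window=5,          # text window
    sg=0,              # 0 selects CBOW
    hs=1,              # hierarchical softmax, the "HS" in HS-CBOW
    negative=0,        # disable negative sampling so hierarchical softmax is actually used
    min_count=1,
    workers=4,
)

brand_vector = model.wv["波司登"]   # word vector of the search brand word "Bosideng"
```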
In step S05, the similarity between the search brand words is calculated according to the word vectors of the search brand words.
In this embodiment, the distance between brands a and b is computed through the dot product of their word vectors, and the similarity between a and b is then computed according to the formula sim(a, b) = cosine(word2vec(a), word2vec(b)). The larger the distance between a and b, the higher the similarity between a and b.
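Read literally, the formula amounts to the following small sketch (reusing the gensim model from the previous step; the brand words are just examples):

```python
import numpy as np

def brand_similarity(model, a, b):
    """sim(a, b) = cosine(word2vec(a), word2vec(b))."""
    va, vb = model.wv[a], model.wv[b]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(brand_similarity(model, "波司登", "恒源祥"))
# gensim also exposes the same quantity directly: model.wv.similarity("波司登", "恒源祥")
```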
In a preferred embodiment, the similarity mining method further includes step S06.
In step S06, when the similarities between a search brand word and all other search brand words are less than a preset threshold, the user comment data under that search brand word is supplemented.
Because user comments are numerous and varied, all of the comment data cannot be used for training at once, and not every comment contributes to computing the word vectors of the search brand words we need. Insufficient contributing data is likely to cause a particular search brand word to fail to find any related brand. Here, the calculated similarities are used to judge whether a search brand word has found a related brand: when the similarities between one search brand word and all other search brand words are less than the preset threshold, the search brand word has not found a related brand; the user comment data under that search brand word is then extracted according to the search brand word for which no similar brand was found, and the process restarts from step S01 to recompute the word vector of that search brand word. This process is iterated repeatedly and stops when the number of iterations exceeds a set threshold, thereby greatly increasing the recall rate of brand similarity distances. Table 1 below gives example similarities for several brands, making the brand similarity measure more intuitive.
Table 1: Brand similarity

Brand 1         Brand 2         Similarity
GXG             Jack & Jones    80%
Hengyuanxiang   Nanjiren        85%
Hengyuanxiang   Jack & Jones    75%
Hengyuanxiang   McDonald's      30%
In a preferred embodiment, the similarity mining method further includes step S07.
In step S07, the search brand words are classified according to the similarity between the search brand words, and brand relevance maps of the respective categories are displayed according to the classification results.
In this embodiment, the search brand words are classified according to the similarity between them: when the similarity between search brand words is greater than a certain threshold, they are grouped into one category, forming structures of different categories, and the brand relevance map of each category is displayed. FIGS. 5a-5b show the market structure of underwear brands in the clothing category, and FIGS. 6a-6b show the market structure of milk powder brands in the maternal and child category. Based on the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be optimized.
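One plausible way to realize this grouping is to treat brands whose pairwise similarity exceeds the threshold as connected and take connected components as categories; this concrete choice, the networkx dependency, and the reuse of the Table 1 figures are assumptions rather than the embodiment's stated algorithm.

```python
# Hypothetical sketch of threshold-based brand categorization for the relevance map.
import networkx as nx

def categorize(brands, similarity, threshold=0.7):
    g = nx.Graph()
    g.add_nodes_from(brands)
    for i, a in enumerate(brands):
        for b in brands[i + 1:]:
            s = similarity(a, b)
            if s > threshold:
                g.add_edge(a, b, weight=s)   # edges form the brand relevance map
    return [sorted(c) for c in nx.connected_components(g)], g

# Toy similarity lookup built from the Table 1 examples above.
table1 = {("GXG", "Jack & Jones"): 0.80, ("Hengyuanxiang", "Nanjiren"): 0.85,
          ("Hengyuanxiang", "Jack & Jones"): 0.75, ("Hengyuanxiang", "McDonald's"): 0.30}
sim = lambda a, b: table1.get((a, b), table1.get((b, a), 0.0))

categories, relevance_graph = categorize(
    ["GXG", "Jack & Jones", "Hengyuanxiang", "Nanjiren", "McDonald's"], sim)
print(categories)   # McDonald's ends up in its own category at this threshold
```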
The similarity mining method provided by the present invention uses a clustering algorithm (such as word2vector) to calculate the similarity of brand words according to the user's search word data and the user's post-purchase comment data; it can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.
FIG. 4 shows a schematic structural diagram of a similarity mining device according to an embodiment of the present invention. As shown in FIG. 4, the similarity mining device includes a data acquisition module 101, a search brand word mining module 102, a vocabulary library construction module 103, a training module 104, and a similarity calculation module 105.
The data acquisition module 101 is configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data.
In this embodiment, the data acquisition module 101 obtains the user's post-purchase comment text data, the user search word data, and the brand word data from the data warehouse through Hive query statements. After observing and understanding a large amount of data, filtering rules are formulated to filter out invalid junk data. The post-purchase comment text data is segmented into words and tagged with parts of speech, and a proprietary lexicon is built to improve the effect of word segmentation and part-of-speech tagging.
The search brand word mining module 102 is configured to obtain search brand words according to the user search word data and the pre-stored brand word data.
In this embodiment, the search brand word mining module 102 filters the user search word data to remove search terms unrelated to brands, leaving the brand-related search terms; brand words are then extracted from the brand-related search terms according to the brand word data, yielding the search brand words.
Specifically, the search brand word mining module 102 filters the user behavior data to obtain user search word data that contains brand words. Taking one record of user search word data as an example: Bosideng, down jacket, light, thin. According to the brand word data, the brand word contained in the record, i.e., the search brand word, is extracted; the search brand word obtained is: Bosideng.
The vocabulary library construction module 103 is configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data.
In this embodiment, the vocabulary library construction module 103 builds the corpus important vocabulary library by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
The training module 104 is configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words.
In this embodiment, the training module 104 is implemented with the word2vec tool. The corpus important vocabulary library used for training contains user comment data for the search brand words; each record includes a search brand word and text describing it. To reduce the influence of noisy data on the trained word vectors, the data is first filtered and merged; after cleaning, valid data is obtained. In addition, considering the training speed and the complexity of implementing recommendation, the HS-CBOW model, which trains quickly and is relatively easy to implement in engineering, is selected to build the word vectors of the corpus important vocabulary library. word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library. The word vector dimensionality is set to 100 and the text window to 5.
Further, regarding the choice of word vector dimensionality: generally, the higher the dimensionality and the larger the text window, the better the feature representation of the word vectors, but the longer the training takes and the more storage the training results occupy. For a large data set, setting the dimensionality to 100 and the text window to 5 maintains fast computation; training finally yields word vectors for a certain amount of vocabulary.
The similarity calculation module 105 is configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
In this embodiment, the similarity calculation module 105 computes the distance between brands a and b through the dot product of their word vectors, and then computes the similarity between a and b according to the formula sim(a, b) = cosine(word2vec(a), word2vec(b)). The larger the distance between a and b, the higher the similarity between a and b.
In a preferred embodiment, the similarity mining device further includes a data supplementing module 106, configured to supplement the user comment data under a search brand word when the similarities between that search brand word and all other search brand words are less than a preset threshold.
Because user comments are numerous and varied, all of the comment data cannot be used for training at once, and not every comment contributes to computing the word vectors of the search brand words we need. Insufficient contributing data is likely to cause a particular search brand word to fail to find any related brand. Here, the calculated similarities are used to judge whether a search brand word has found a related brand: when the similarities between one search brand word and all other search brand words are less than the preset threshold, the search brand word has not found a related brand; the user comment data under that search brand word is then extracted according to the search brand word for which no similar brand was found, and the process restarts from step S01 to recompute the word vector of that search brand word. This process is iterated repeatedly and stops when the number of iterations exceeds a set threshold, thereby greatly increasing the recall rate of brand similarity distances.
In a preferred embodiment, the similarity mining device further includes a display module 107, configured to classify the search brand words according to the similarity between them and to display the brand relevance map of each category according to the classification results.
In this embodiment, the search brand words are classified according to the similarity between them: when the similarity between search brand words is greater than a certain threshold, they are grouped into one category, forming structures of different categories, and the brand relevance map of each category is displayed. FIGS. 5a-5b show the market structure of underwear brands in the clothing category, and FIGS. 6a-6b show the market structure of milk powder brands in the maternal and child category. Based on the brand relevance map of each category, brands with high similarity can be recommended to users, and brand positioning strategies can be optimized.
The similarity mining device provided by the present invention uses a clustering algorithm (such as word2vector) to calculate the similarity of brand words according to the user's search word data and the user's post-purchase comment data; it can automatically calculate the similarity between brands, reduce personnel cost, increase brand recall, and improve the conversion rate of recommended brands.
The embodiments of the present invention have been described above; these embodiments do not exhaustively describe all details, nor is the invention limited to the specific embodiments described. Obviously, many modifications and variations are possible in light of the above description. These embodiments were selected and specifically described in this specification in order to better explain the principles and practical application of the present invention, so that those skilled in the art can make good use of the present invention and of modifications based on it. The scope of protection of the present invention shall be defined by the claims of the present invention.

Claims (10)

  1. A similarity mining method, comprising:
    acquiring user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data;
    obtaining search brand words according to the user search word data and pre-stored brand word data;
    constructing a corpus important vocabulary library about the search brand words according to the user behavior data;
    using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words;
    calculating the similarity between the search brand words according to the word vectors of the search brand words.
  2. The method according to claim 1, further comprising:
    supplementing the user comment data under a search brand word when the similarities between the search brand word and all other search brand words are less than a preset threshold.
  3. The method according to claim 1, wherein, in constructing the corpus important vocabulary library about the search brand words according to the user behavior data, the corpus important vocabulary library is built by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  4. The method according to claim 1, wherein, in using the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words, word2vec is used as the word vector tool, and the HS-CBOW model is used to build the word vectors of the corpus important vocabulary library.
  5. The method according to claim 1, further comprising:
    classifying the search brand words according to the similarity between the search brand words, and displaying a brand relevance map of each category according to the classification result.
  6. A similarity mining device, comprising:
    a data acquisition module, configured to acquire user behavior data and brand word data, wherein the user behavior data includes user search word data and user comment data;
    a search brand word mining module, configured to obtain search brand words according to the user search word data and pre-stored brand word data;
    a vocabulary library construction module, configured to construct a corpus important vocabulary library about the search brand words according to the user behavior data;
    a training module, configured to use the corpus important vocabulary library as the input of a word vector tool to perform word vector model training and obtain the word vectors of the search brand words;
    a similarity calculation module, configured to calculate the similarity between the search brand words according to the word vectors of the search brand words.
  7. The device according to claim 6, further comprising:
    a data supplementing module, configured to supplement the user comment data under a search brand word when the similarities between the search brand word and all other search brand words are less than a preset threshold.
  8. The device according to claim 6, wherein the vocabulary library construction module builds the corpus important vocabulary library by filtering and merging the user behavior data, performing word segmentation, and removing stop words.
  9. The device according to claim 6, wherein the training module uses word2vec as the word vector tool and uses the HS-CBOW model to build the word vectors of the corpus important vocabulary library.
  10. The device according to claim 6, further comprising:
    a display module, configured to classify the search brand words according to the similarity between the search brand words, and to display a brand relevance map of each category according to the classification result.
PCT/CN2017/070225 2016-03-15 2017-01-05 Similarity mining method and device WO2017157090A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/085,893 US11017043B2 (en) 2016-03-15 2017-01-05 Similarity mining method and device
RU2018135971A RU2700191C1 (ru) 2016-03-15 2017-01-05 Способ и устройство выявления сходства
AU2017232659A AU2017232659A1 (en) 2016-03-15 2017-01-05 Similarity mining method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610146542.9A CN107193832A (zh) 2016-03-15 2016-03-15 相似度挖掘方法及装置
CN201610146542.9 2016-03-15

Publications (1)

Publication Number Publication Date
WO2017157090A1 true WO2017157090A1 (zh) 2017-09-21

Family

ID=59850739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/070225 WO2017157090A1 (zh) 2016-03-15 2017-01-05 相似度挖掘方法及装置

Country Status (5)

Country Link
US (1) US11017043B2 (zh)
CN (1) CN107193832A (zh)
AU (1) AU2017232659A1 (zh)
RU (1) RU2700191C1 (zh)
WO (1) WO2017157090A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038133A (zh) * 2017-11-20 2018-05-15 青岛鹏海软件有限公司 个性化推荐方法
CN108416611A (zh) * 2018-01-31 2018-08-17 佛山市顺德区中山大学研究院 一种超市路径推荐系统及其方法
CN109033232A (zh) * 2018-07-01 2018-12-18 东莞市华睿电子科技有限公司 一种云平台与共享设备相结合的社交用户推荐方法
CN112036120A (zh) * 2020-08-31 2020-12-04 上海硕恩网络科技股份有限公司 一种技能短语抽取方法
CN112667919A (zh) * 2020-12-28 2021-04-16 山东大学 一种基于文本数据的个性化社区矫正方案推荐系统及其工作方法
CN114201962A (zh) * 2021-12-03 2022-03-18 中国中医科学院中医药信息研究所 一种论文新颖性分析方法、装置、介质和设备

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763205B (zh) * 2018-05-21 2022-05-03 创新先进技术有限公司 一种品牌别名识别方法、装置及电子设备
CN110874609B (zh) * 2018-09-04 2022-08-16 武汉斗鱼网络科技有限公司 基于用户行为的用户聚类方法、存储介质、设备及系统
CN109635383A (zh) * 2018-11-28 2019-04-16 优信拍(北京)信息科技有限公司 一种基于word2vec的车系相关度确定的方法及装置
CN113673216B (zh) * 2021-10-20 2022-02-01 支付宝(杭州)信息技术有限公司 文本侵权检测方法、装置和电子设备
CN116308683B (zh) * 2023-05-17 2023-08-04 武汉纺织大学 基于知识图谱的服装品牌定位推荐方法、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206674A (zh) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 以商品为媒介的增强型相关搜索系统及其方法
CN104778161A (zh) * 2015-04-30 2015-07-15 车智互联(北京)科技有限公司 基于Word2Vec和Query log抽取关键词方法
CN105095430A (zh) * 2015-07-22 2015-11-25 深圳证券信息有限公司 构建词语网络及抽取关键词的方法和装置
CN105279288A (zh) * 2015-12-04 2016-01-27 深圳大学 一种基于深度神经网络的在线内容推荐方法
US20160065534A1 (en) * 2011-07-06 2016-03-03 Nominum, Inc. System for correlation of domain names

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6529892B1 (en) * 1999-08-04 2003-03-04 Illinois, University Of Apparatus, method and product for multi-attribute drug comparison
US7925044B2 (en) * 2006-02-01 2011-04-12 Markmonitor Inc. Detecting online abuse in images
US7606810B1 (en) * 2006-04-27 2009-10-20 Colin Jeavons Editorial related advertising content delivery system
US20080270203A1 (en) * 2007-04-27 2008-10-30 Corporation Service Company Assessment of Risk to Domain Names, Brand Names and the Like
US7873635B2 (en) * 2007-05-31 2011-01-18 Microsoft Corporation Search ranger system and double-funnel model for search spam analyses and browser protection
US9438733B2 (en) * 2008-09-08 2016-09-06 Invoca, Inc. Methods and systems for data transfer and campaign management
US9496003B2 (en) * 2008-09-08 2016-11-15 Apple Inc. System and method for playlist generation based on similarity data
KR101078864B1 (ko) * 2009-03-26 2011-11-02 한국과학기술원 질의/문서 주제 범주 변화 분석 시스템 및 그 방법과 이를 이용한 질의 확장 기반 정보 검색 시스템 및 그 방법
US20130014136A1 (en) * 2011-07-06 2013-01-10 Manish Bhatia Audience Atmospherics Monitoring Platform Methods
US20120144499A1 (en) * 2010-12-02 2012-06-07 Sky Castle Global Limited System to inform about trademarks similar to provided input
WO2013124521A1 (en) * 2012-02-22 2013-08-29 Nokia Corporation A system and a method for determining context
US9406072B2 (en) * 2012-03-29 2016-08-02 Spotify Ab Demographic and media preference prediction using media content data analysis
US20150242525A1 (en) * 2014-02-26 2015-08-27 Pixured, Inc. System for referring to and/or embedding posts within other post and posts within any part of another post
US10102669B2 (en) * 2014-09-08 2018-10-16 Apple Inc. Density sampling map labels
US9767409B1 (en) * 2015-03-30 2017-09-19 Amazon Technologies, Inc. Latent feature based tag routing
US20170075998A1 (en) * 2015-09-14 2017-03-16 Ebay Inc. Assessing translation quality
US11263664B2 (en) * 2015-12-30 2022-03-01 Yahoo Assets Llc Computerized system and method for augmenting search terms for increased efficiency and effectiveness in identifying content

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206674A (zh) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 以商品为媒介的增强型相关搜索系统及其方法
US20160065534A1 (en) * 2011-07-06 2016-03-03 Nominum, Inc. System for correlation of domain names
CN104778161A (zh) * 2015-04-30 2015-07-15 车智互联(北京)科技有限公司 基于Word2Vec和Query log抽取关键词方法
CN105095430A (zh) * 2015-07-22 2015-11-25 深圳证券信息有限公司 构建词语网络及抽取关键词的方法和装置
CN105279288A (zh) * 2015-12-04 2016-01-27 深圳大学 一种基于深度神经网络的在线内容推荐方法

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038133A (zh) * 2017-11-20 2018-05-15 青岛鹏海软件有限公司 个性化推荐方法
CN108416611A (zh) * 2018-01-31 2018-08-17 佛山市顺德区中山大学研究院 一种超市路径推荐系统及其方法
CN108416611B (zh) * 2018-01-31 2020-12-04 佛山市顺德区中山大学研究院 一种超市路径推荐系统及其方法
CN109033232A (zh) * 2018-07-01 2018-12-18 东莞市华睿电子科技有限公司 一种云平台与共享设备相结合的社交用户推荐方法
CN109033232B (zh) * 2018-07-01 2021-12-28 东莞市华睿电子科技有限公司 一种云平台与共享设备相结合的社交用户推荐方法
CN112036120A (zh) * 2020-08-31 2020-12-04 上海硕恩网络科技股份有限公司 一种技能短语抽取方法
CN112667919A (zh) * 2020-12-28 2021-04-16 山东大学 一种基于文本数据的个性化社区矫正方案推荐系统及其工作方法
CN114201962A (zh) * 2021-12-03 2022-03-18 中国中医科学院中医药信息研究所 一种论文新颖性分析方法、装置、介质和设备
CN114201962B (zh) * 2021-12-03 2023-07-25 中国中医科学院中医药信息研究所 一种论文新颖性分析方法、装置、介质和设备

Also Published As

Publication number Publication date
RU2700191C1 (ru) 2019-09-13
US11017043B2 (en) 2021-05-25
US20200301982A1 (en) 2020-09-24
AU2017232659A1 (en) 2018-10-11
CN107193832A (zh) 2017-09-22

Similar Documents

Publication Publication Date Title
WO2017157090A1 (zh) 相似度挖掘方法及装置
McKenzie et al. Weighted multi-attribute matching of user-generated points of interest
KR101423544B1 (ko) 시맨틱 토픽 추출 장치 및 방법
WO2019041521A1 (zh) 用户关键词提取装置、方法及计算机可读存储介质
JP6381775B2 (ja) 情報処理システム及び情報処理方法
US20170109633A1 (en) Comment-comment and comment-document analysis of documents
CN109816482B (zh) 电商平台的知识图谱构建方法、装置、设备及存储介质
WO2016196857A1 (en) Entity classification and/or relationship identification
CN108805598B (zh) 相似度信息确定方法、服务器及计算机可读存储介质
CN106959966A (zh) 一种信息推荐方法及系统
JP2017508214A (ja) 検索推奨の提供
CN103425635A (zh) 一种答案推荐方法和装置
CN106096609B (zh) 一种基于ocr的商品查询关键字自动生成方法
CN109783614B (zh) 一种社交网络待发布文本的差分隐私泄露检测方法及系统
CN106294425A (zh) 商品相关网络文章之自动图文摘要方法及系统
CN107015976B (zh) 业务处理方法、数据处理方法及装置
CN106294744A (zh) 兴趣识别方法及系统
TW201241773A (en) Method and apparatus of determining product category information
CN104462327B (zh) 语句相似度的计算、搜索处理方法及装置
TWI705411B (zh) 社交業務特徵用戶的識別方法和裝置
CN109597990B (zh) 一种社会热点与商品品类的匹配方法
CN103593474A (zh) 基于深度学习的图像检索排序方法
CN105023178B (zh) 一种基于本体的电子商务推荐方法
CN111191099B (zh) 一种基于社交媒体的用户活动类型识别方法
CN103577472A (zh) 个人信息获得、推定、商品的分类、检索方法及系统

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017232659

Country of ref document: AU

Date of ref document: 20170105

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17765632

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.01.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17765632

Country of ref document: EP

Kind code of ref document: A1