WO2013185601A1 - Method and device for obtaining product information and computer storage medium


Publication number
WO2013185601A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2013/077110
Inventor
唐沐
陈妍
樊中一
骆玘
孙鹏
牟伟成
郭洪伟
黄利贤
吕虹
胡炜
苏楠
张弘
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP13804547.1A (published as EP2846271A4)
Priority to US14/404,905 (published as US20150149383A1)
Publication of WO2013185601A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0278 Product appraisal

Definitions

  • Embodiment 1
  • FIG. 1 is a flowchart showing an implementation process of a method for acquiring product information according to Embodiment 1 of the present invention. The process of the method is as follows:
  • Step S101: Collect, from a public platform, original information in which users comment on the product.
  • The public platform is a platform distinct from an internal (dedicated) platform, such as common microblogs and/or major forums.
  • Specifically, this step collects, from microblogs and/or forums, original information in which users comment on the product.
  • The original information is collected from microblogs and/or forums through an application programming interface (API) and/or a web crawler, and the collected original information is stored in a database.
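The sketch below gives a rough picture of this collection step. It assumes that each platform API client or crawler is wrapped in a hypothetical "fetcher" callable that returns matching posts for a search term. SQLite stands in for the unspecified database. `RawPost`, `collect_and_store`, and the `raw_posts` table are illustrative names and do not come from the patent.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RawPost:
    source: str     # platform name, e.g. "weibo" or "forum" (illustrative)
    author: str     # user name as shown on the platform
    text: str       # raw comment text
    posted_at: str  # ISO-8601 timestamp


# Hypothetical fetcher type: wraps a platform API client or a web crawler
# and yields the posts that match one search term.
Fetcher = Callable[[str], Iterable[RawPost]]


def collect_and_store(conn: sqlite3.Connection,
                      fetchers: List[Fetcher],
                      product_terms: List[str]) -> int:
    """Query every fetcher for every product term and store the raw posts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_posts "
        "(source TEXT, author TEXT, text TEXT, posted_at TEXT)"
    )
    stored = 0
    for fetch in fetchers:
        for term in product_terms:
            for post in fetch(term):
                conn.execute(
                    "INSERT INTO raw_posts VALUES (?, ?, ?, ?)",
                    (post.source, post.author, post.text, post.posted_at),
                )
                stored += 1
    conn.commit()
    return stored
```

The fetchers are plain callables, so an API client and a crawler can be mixed in one run. A fake fetcher that returns canned posts is enough to exercise the function against `sqlite3.connect(":memory:")`.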
  • Besides microblogs and forums, the original information may also be collected from the support platform, the EXP platform, and similar platforms.
  • A collection interval (for example, once every hour) or a number of consecutive collection runs may be preset.
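A simple loop can honor a preset collection interval. This minimal sketch adds a hypothetical `sleep` parameter so the schedule can be checked without real waiting. A production system would more likely rely on cron or a task scheduler.

```python
import time
from typing import Callable


def run_collection(job: Callable[[], None], interval_s: float, runs: int,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `job` `runs` times, pausing `interval_s` seconds between runs.

    `sleep` is an injectable hook (an assumption added for testability).
    """
    for i in range(runs):
        job()
        if i < runs - 1:  # no pause is needed after the final run
            sleep(interval_s)
```

For an hourly schedule, `run_collection(job, 3600, runs)` would be used.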
  • The embodiment further includes classifying the collected original information according to a preset rule and then storing it. Classifying according to the preset rule means classifying the original information by its content features. These include, but are not limited to, media information, official information, advertising information, and comments from users on the default blacklist, as shown in Table 1.
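Table 1 is not reproduced here, so the rule set below is entirely hypothetical. It is a minimal sketch of labeling posts by content feature (blacklisted user, official, media, advertising) before storage. The account names and ad markers are placeholders.

```python
from typing import Set

# Hypothetical rules standing in for Table 1, which is not reproduced here.
OFFICIAL_ACCOUNTS = {"appx_official"}
MEDIA_ACCOUNTS = {"tech_news_daily"}
AD_MARKERS = ("giveaway", "promo code", "click to win")


def classify_post(author: str, text: str, blacklist: Set[str]) -> str:
    """Label a raw post by content feature before it is stored."""
    if author in blacklist:
        return "blacklisted_user"
    if author in OFFICIAL_ACCOUNTS:
        return "official"
    if author in MEDIA_ACCOUNTS:
        return "media"
    lowered = text.lower()
    if any(marker in lowered for marker in AD_MARKERS):
        return "advertising"
    return "user_comment"
```

Author-based checks run first, so a blacklisted account is never relabeled as an ordinary comment because of its text.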
  • Step S102: Filter the collected original information.
  • Filtering the collected original information includes deduplicating it and removing invalid information.
  • For the support platform, deduplication can be based on text content together with user name. For Tencent Weibo and Sina Weibo, a threshold can be set. When the number of posts with identical or similar text exceeds the threshold, those posts are regarded as advertising or pure-sharing microblogs and are deleted.
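The two deduplication rules can be sketched as follows. The sketch assumes that "same or similar text" is approximated by lower-casing and collapsing whitespace. A real system might use a fuzzier similarity measure. The function names are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Post = Tuple[str, str]  # (author, text)


def normalize(text: str) -> str:
    """Assumed similarity key: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def dedup_by_text_and_user(posts: List[Post]) -> List[Post]:
    """Support-platform rule: keep the first post per (author, text) pair."""
    seen = set()
    kept = []
    for author, text in posts:
        key = (author, normalize(text))
        if key not in seen:
            seen.add(key)
            kept.append((author, text))
    return kept


def drop_mass_reposts(posts: List[Post], threshold: int) -> List[Post]:
    """Weibo rule: drop every post whose text occurs more than `threshold` times."""
    counts = Counter(normalize(text) for _, text in posts)
    return [(a, t) for a, t in posts if counts[normalize(t)] <= threshold]
```

The Weibo rule deletes all copies of a mass-reposted text, not just the extra ones. This matches the description of treating them as advertising or pure sharing.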
  • Removing invalid information means removing the items listed in Table 1, such as official releases, event advertisements, posts by internal paid posters (the so-called "water army"), and irrelevant statements.
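This is a minimal sketch of removing invalid information. It assumes each post already carries a category label, for example from the content-feature classification. A post counts as an "irrelevant statement" when it mentions none of the product terms. Both the labels and the relevance test are assumptions.

```python
from typing import Dict, Iterable, List

# Hypothetical category labels, e.g. produced by the content-feature step.
INVALID_CATEGORIES = {"official", "advertising", "water_army"}


def remove_invalid(posts: Iterable[Dict[str, str]],
                   product_terms: Iterable[str]) -> List[Dict[str, str]]:
    """Drop posts in invalid categories and posts that never mention the product."""
    terms = [t.lower() for t in product_terms]
    kept = []
    for post in posts:
        if post["category"] in INVALID_CATEGORIES:
            continue  # official release, advertisement, or paid poster
        if not any(t in post["text"].lower() for t in terms):
            continue  # assumed test for an irrelevant statement
        kept.append(post)
    return kept
```

Media posts survive this filter on purpose, since the analysis step still examines media and sharing posts.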
  • Step S103: Analyze the filtered information to obtain relevance information related to the product.
  • The relevance information specifically includes hot-topic words and/or word-of-mouth words.
  • A hot-topic word reflects what users focus on about the product; a word-of-mouth word reflects the tendency of users' comments on the product.
  • Specifically, this step analyzes the filtered information to obtain hot-topic words and/or word-of-mouth words related to the product.
  • The analysis mainly covers the information retained after filtering, such as opinion/comment posts, media posts, and sharing posts.
  • Word-of-mouth words are subsequently extracted mainly from the opinion/comment posts.
  • Obtaining the hot-topic words and word-of-mouth words related to the product specifically includes:
  • performing word segmentation on the filtered information to obtain a processing result;
  • for example, processing the filtered information with a Chinese lexical analysis system: the Chinese word segmentation interface provided by ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) is called to run the ICTCLAS segmentation algorithm on the filtered information and obtain the processing result;
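The ICTCLAS segmentation algorithm and its interface are not shown here. As a deliberately simplified stand-in, the sketch below uses forward maximum matching against a dictionary. This is a different and much cruder technique than ICTCLAS. It only shows what the "processing result" (a list of words) looks like.

```python
from typing import List, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching (a stand-in, NOT the ICTCLAS algorithm):
    at each position take the longest dictionary word, else one character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

For example, `fmm_segment("手机很好用", {"手机", "好用"})` ("the phone is easy to use") yields `["手机", "很", "好用"]`. Words missing from the dictionary fall apart into single characters, which is why a correction pass is useful.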
  • selecting the words in the processing result that meet a set occurrence frequency, and filtering the selected words through a pre-stored lexicon to obtain hot-topic words and/or word-of-mouth words related to the product.
  • Further, the processing result may be corrected using pre-stored segmented words to obtain a corrected result. The corrected result is then filtered through a pre-stored word-of-mouth lexicon and/or invalid-word lexicon to obtain hot-topic words and/or word-of-mouth words related to the network product.
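The correction and filtering steps can be sketched as follows. The sketch makes two assumptions. First, "correcting with pre-stored segmented words" is taken to mean re-joining adjacent tokens whose concatenation is a known word, such as a product name the segmenter split apart. Second, the invalid-word lexicon is treated as a plain set.

```python
from typing import List, Set


def correct_segmentation(tokens: List[str], custom_words: Set[str],
                         max_merge: int = 4) -> List[str]:
    """Assumed correction: re-join adjacent tokens whose concatenation
    is a pre-stored word (e.g. a product name split by the segmenter)."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_merge, len(tokens) - i), 0, -1):
            joined = "".join(tokens[i:i + span])
            if span == 1 or joined in custom_words:
                out.append(joined)
                i += span
                break
    return out


def drop_invalid_words(tokens: List[str], invalid_words: Set[str]) -> List[str]:
    """Remove tokens listed in the (assumed set-based) invalid-word lexicon."""
    return [t for t in tokens if t not in invalid_words]
```

Suppose a segmenter splits the hypothetical product name "AppX" into characters before 闪退 ("crashes on launch") and the particle 了. The correction restores `["AppX", "闪退", "了"]`, and the invalid-word filter then drops 了.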
  • Acquiring hot-topic words includes two parts. First, words in the noun column whose occurrence frequency is below a preset value (for example, one percent of the highest frequency among the valid words) are removed. Second, single-character words, such as the characters for "person" and "network", are removed.
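This is a minimal sketch of the hot-topic rule. It assumes the segmenter output is a list of `(word, tag)` pairs with ICTCLAS-style tags, where `"n"` marks nouns. It also assumes that the "highest frequency among valid words" is taken over all tokens passed in.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hot_topic_words(tagged: Iterable[Tuple[str, str]],
                    ratio: float = 0.01) -> Dict[str, int]:
    """Keep multi-character nouns whose frequency is at least `ratio`
    times the highest frequency among all (assumed valid) words."""
    tagged = list(tagged)
    all_counts = Counter(word for word, _ in tagged)
    if not all_counts:
        return {}
    cutoff = ratio * max(all_counts.values())
    noun_counts = Counter(word for word, pos in tagged if pos == "n")  # assumed tag
    return {w: c for w, c in noun_counts.items() if c >= cutoff and len(w) > 1}
```

Consider a corpus where 界面 ("interface") appears 300 times. The cutoff is then 3, so a noun seen only twice is dropped. The single character 网 ("network") is dropped regardless of frequency.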
  • Acquiring word-of-mouth words includes three parts. First, words in the adjective column whose occurrence frequency is below the preset value (for example, one percent of the highest frequency among the valid words) are removed. Second, the verb column is searched for common word-of-mouth words, such as "pit" and "force" (literal translations of Chinese internet slang). Third, the words found are compared against the pre-stored word-of-mouth lexicon (maintained in Excel) to obtain word-of-mouth words related to the network product.
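The word-of-mouth rule can be sketched the same way. Here the tags `"a"` (adjective) and `"v"` (verb) are assumed. The Excel lexicon is assumed to have been loaded into a Python set beforehand, for example from a CSV export. In the test data, 好 means "good", 差 means "poor", and 坑 is the slang rendered above as "pit".

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def word_of_mouth_words(tagged: Iterable[Tuple[str, str]],
                        wom_lexicon: Set[str],
                        common_verb_wom: Set[str],
                        ratio: float = 0.01) -> Dict[str, int]:
    """Collect candidate word-of-mouth words, then keep those in the lexicon."""
    tagged = list(tagged)
    all_counts = Counter(word for word, _ in tagged)
    if not all_counts:
        return {}
    cutoff = ratio * max(all_counts.values())
    # Adjectives (assumed tag "a"): keep those at or above the cutoff.
    adjectives = Counter(w for w, pos in tagged if pos == "a")
    candidates = Counter({w: c for w, c in adjectives.items() if c >= cutoff})
    # Verbs (assumed tag "v"): keep only known word-of-mouth verbs.
    candidates.update(w for w, pos in tagged if pos == "v" and w in common_verb_wom)
    # Final comparison against the pre-stored word-of-mouth lexicon.
    return {w: c for w, c in candidates.items() if w in wom_lexicon}
```

`Counter.update` adds counts, so a word tagged both as an adjective and as a verb accumulates both tallies.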
  • Step S104: Classify the obtained relevance information, then perform statistics and analysis on it to obtain user feedback information related to the product.
  • Specifically, this step classifies the obtained hot-topic words and/or word-of-mouth words, then performs statistics and analysis on the classified words to obtain user feedback information related to the product.
  • The classification puts the hot-topic words into one category and the positive word-of-mouth words (for example, "good", "power", "GOOD") into a second. The negative word-of-mouth words (for example, "poor", "potholes") go into a third.
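The three-way classification and counting can be sketched as below. The sketch assumes the positive and negative word lists are supplied as sets. Words found in neither list are simply left out. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def classify_and_count(hot_words: Iterable[str], wom_words: Iterable[str],
                       positive: Set[str], negative: Set[str]) -> Dict[str, Counter]:
    """Split relevance words into hot / positive / negative buckets and count them."""
    report = {"hot": Counter(hot_words), "positive": Counter(), "negative": Counter()}
    for word in wom_words:
        if word in positive:
            report["positive"][word] += 1
        elif word in negative:
            report["negative"][word] += 1
        # words in neither (assumed) list are left unclassified
    return report
```

Keeping per-word `Counter`s rather than bare totals preserves which words drove each category. A quantitative report needs that detail.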
  • The user feedback information includes a quantitative analysis report and/or a qualitative analysis report.
  • The quantitative analysis report includes information such as the numbers of hot-topic words, positive word-of-mouth words, and negative word-of-mouth words, how those numbers change, and the reasons for the changes.
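One quantitative measure named above is the change in counts between two collection periods. This minimal sketch computes it, assuming per-category word counts shaped like `{"positive": {"good": 12}}`. Explaining why the counts changed is left to the analyst.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # assumed shape: category -> word -> count


def category_totals(counts: Counts) -> Dict[str, int]:
    """Total mentions per category (hot / positive / negative)."""
    return {cat: sum(words.values()) for cat, words in counts.items()}


def period_change(prev: Counts, curr: Counts) -> Dict[str, int]:
    """Change in total mentions per category between two collection periods."""
    before, after = category_totals(prev), category_totals(curr)
    return {cat: after.get(cat, 0) - before.get(cat, 0)
            for cat in sorted(set(before) | set(after))}
```

Suppose positive mentions rise from 10 to 15 while negative ones rise from 4 to 9. Both deltas are +5, but the negative share grew faster, which is the kind of change a report would then try to explain.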
  • The qualitative analysis report includes information such as users' points of focus on the product and their word-of-mouth evaluation of it.
  • From these reports, the product operator can fully understand users' feedback on the product, which facilitates improving the product and raises user satisfaction.
  • The method further includes:
  • collecting, from the public platform (such as Weibo and/or forums), information in which users comment on products of the same type as, or similar to, the product.
  • Information about these same-type and similar products may be pre-stored, including their names, series aliases, and the names of some key functional modules. When the original information in which users comment on the product is collected from Weibo and/or forums, information in which users comment on the same-type and similar products is also collected according to the stored information.
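Matching posts against the pre-stored names and aliases can be sketched with a case-insensitive substring search. The catalog entries (`AppX`, `RivalChat`) are hypothetical. Plain substring matching will over-match very short aliases, so a real system would likely match against segmented words instead.

```python
from typing import Dict, List

# Hypothetical pre-stored catalogue: product -> names and aliases.
CATALOG: Dict[str, List[str]] = {
    "AppX": ["AppX", "App X"],
    "RivalChat": ["RivalChat", "Rival Chat"],
}


def match_products(text: str, catalog: Dict[str, List[str]]) -> List[str]:
    """Return the catalogue entries whose name or alias occurs in `text`."""
    lowered = text.lower()
    return sorted(product for product, aliases in catalog.items()
                  if any(alias.lower() in lowered for alias in aliases))
```

A post that names both products is tagged with both. This lets the same comment count toward the product and its competitor.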
  • In this embodiment, original information in which users comment on the product is collected from microblogs and/or forums. It is filtered and analyzed to obtain users' comment tendency toward the product (word-of-mouth words) and their points of focus (hot-topic words). These words are then classified and counted to obtain quantitative and/or qualitative analysis reports of the product. From these reports, product operators can fully understand users' feedback on the product, which facilitates improving the product and raises user satisfaction.
  • The original information is collected directly from Weibo and/or forums and is provided by users on their own initiative (for example, posting on Weibo or leaving forum messages). There is therefore no need to invite users to a survey, which effectively reduces cost.
  • The automated processing after information collection effectively improves efficiency and accuracy.
  • Because multiple information sources are covered at the same time (such as Tencent Weibo, Sina Weibo, and the support platform), the method effectively avoids three problems: bias caused by platform differences, lower accuracy caused by a lack of quantitative data, and the high cost of questionnaires.
  • Embodiment 2
  • FIG. 2 shows a specific process of the method for obtaining product information provided by the second embodiment of the present invention.
  • This embodiment mainly includes four parts: information collection, information filtering, information analysis, and acquisition of quantitative and qualitative text.
  • Information collection: original information in which users comment on the product is collected through APIs and/or web crawlers from information sources such as Weibo and forums (including platforms such as the support platform and the EXP platform). The collected raw information is stored in the database.
  • Information filtering: deduplication includes deduplication by content text alone and by content text together with user name.
  • Removing invalid information includes removing irrelevant text, officially released information, information posted by the water army (paid posters), and advertising information.
  • Information analysis: the filtered information is mainly classified into a media news class, an activity sharing class, and a suggestion/comment class. Based on the common names of the product and/or its competing products, the word segmentation interface provided by ICTCLAS is called to run the ICTCLAS segmentation algorithm on the filtered information and obtain a processing result. The processing result is corrected using the pre-stored segmented words to obtain a corrected result. The corrected result is then filtered through the pre-stored word-of-mouth lexicon and invalid-word lexicon to obtain hot-topic words and word-of-mouth words related to the product.
  • The analysis also includes screening the suggestion/comment class against a pre-stored suggestion lexicon to pick out suggestion microblogs and obtain suggestion text.
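This is a minimal sketch of the suggestion screen. It assumes the pre-stored suggestion lexicon is a short list of cue phrases. The English cue words below are hypothetical placeholders for the patent's unspecified Chinese list.

```python
from typing import Iterable, List

# Hypothetical suggestion lexicon; the patent's pre-stored list is not given.
SUGGESTION_WORDS = ("suggest", "hope", "please add", "it would be better")


def extract_suggestions(comments: Iterable[str],
                        suggestion_words: Iterable[str] = SUGGESTION_WORDS) -> List[str]:
    """Keep the comments that contain at least one suggestion cue phrase."""
    cues = [w.lower() for w in suggestion_words]
    return [c for c in comments if any(cue in c.lower() for cue in cues)]
```

The screen only runs over the suggestion/comment class, so media and sharing posts never reach it.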
  • Quantitative and qualitative text acquisition: the obtained hot-topic words and word-of-mouth words are classified, summarized, analyzed, and counted to obtain quantitative and qualitative analysis reports of the product.
  • Embodiment 3
  • FIG. 3 shows a component structure of the device for acquiring product information provided in Embodiment 3 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.
  • The device for acquiring product information may be a software unit, a hardware unit, or a combined software and hardware unit running in an application system.
  • The device includes an information collecting module 31, an information filtering module 32, an information analyzing module 33, and a result obtaining module 34.
  • The specific functions of each module are as follows:
  • the information collecting module 31 is configured to collect, from the public platform, original information in which users comment on the product;
  • the public platform includes microblogs and/or forums;
  • the information filtering module 32 is configured to filter the original information collected by the information collecting module 31;
  • the information analyzing module 33 is configured to analyze the information filtered by the information filtering module 32 and obtain relevance information related to the product;
  • the relevance information includes hot-topic words and/or word-of-mouth words;
  • the result obtaining module 34 is configured to classify the obtained relevance information, perform statistics and analysis on it, and obtain user feedback information related to the product.
  • The device further includes:
  • an information storage module 35, configured to classify the collected original information by its content features and then store it.
  • The information analyzing module 33 includes:
  • a processing module 331, configured to perform word segmentation on the filtered information according to the common names of the product and/or of its same-type and similar competing products, and obtain a processing result; and
  • an obtaining module 332, configured to select, from the processing result of the processing module 331, words that meet a set occurrence frequency, and filter the selected words through the pre-stored lexicon to obtain the relevance information.
  • The information collecting module 31 is further configured to collect, from the public platform, information in which users comment on same-type and similar products of the product.
  • The information filtering module 32 is further configured to filter the collected original information, including but not limited to deduplication and removal of invalid information.
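The module structure of Embodiment 3 can be sketched as a small pipeline in which each numbered module is an injected callable. This is an illustrative assumption about how the modules might be wired, not the patent's implementation. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProductInfoDevice:
    """Illustrative wiring of modules 31-34 as injected callables (hypothetical)."""
    collect: Callable[[], List[str]]                       # information collecting module 31
    filter_info: Callable[[List[str]], List[str]]          # information filtering module 32
    analyze: Callable[[List[str]], List[str]]              # information analyzing module 33
    obtain_result: Callable[[List[str]], Dict[str, int]]   # result obtaining module 34

    def run(self) -> Dict[str, int]:
        raw = self.collect()
        filtered = self.filter_info(raw)
        relevance = self.analyze(filtered)
        return self.obtain_result(relevance)
```

Injecting the modules keeps each stage replaceable. For example, a crawler-based collector could be swapped for an API-based one without touching the analysis.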
  • The device provided in this embodiment can be used with the corresponding method for acquiring product information described above. For details, see the related descriptions of Embodiments 1 and 2 of the method, which are not repeated here.
  • The embodiments of the present invention collect, from a public platform such as microblogs and/or forums, original information in which users comment on a product. This information is filtered and analyzed to obtain relevance information related to the product, such as users' comment tendency toward the product (word-of-mouth words) and their points of focus (hot-topic words). The obtained hot-topic words and/or word-of-mouth words are classified, and statistics and analysis are performed on them to obtain quantitative and/or qualitative analysis reports of the product. From these reports, the product operator can fully understand users' feedback on the product, which facilitates improving the product and raises user satisfaction.
  • The original information is collected directly from Weibo and/or forums and is provided by users on their own initiative (for example, posting on Weibo or leaving forum messages). Users therefore do not need to be recruited for surveys, which effectively reduces cost.
  • The automated processing after information collection effectively improves efficiency and accuracy.
  • Because multiple information sources are covered at the same time (such as Tencent Weibo, Sina Weibo, and the support platform), the embodiments effectively avoid three problems: bias caused by platform differences, lower accuracy caused by a lack of quantitative data, and the high cost of questionnaires.
  • If the integrated modules described in the embodiments of the present invention are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention may essentially be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • An embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to execute the method for obtaining product information in the embodiments of the present invention.

Abstract

The present invention is applicable to the field of information processing technologies and provides a method and a device for obtaining product information, and a computer storage medium. The method includes: collecting, from a public platform, original information in which users comment on a product; filtering the collected original information; analyzing the filtered information to obtain relevance information related to the product; and classifying the obtained relevance information, then counting and analyzing it to obtain user feedback information related to the product. When user feedback information related to a product is obtained, the present invention effectively solves the prior-art problems of high cost, low efficiency, platform bias, and the inability to obtain quantitative data to improve accuracy.

Description

Method and device for obtaining product information, and computer storage medium
Technical Field
The present invention belongs to information acquisition technology in the field of information processing, and in particular relates to a method, a device, and a computer storage medium for acquiring product information.
Background
At present, user feedback about network products, such as how the products are used, existing problems, and suggestions, is mainly obtained through online questionnaire surveys or by collecting forum posts.
However, online questionnaires currently do not support voluntary participation by users. Instead, substantial manpower and material resources are needed to actively invite users to participate, and the results are collected manually. Placing questionnaires on non-internal platforms in particular requires substantial funding, so the cost is high. Moreover, it usually takes 3-5 days to place a questionnaire and collect the data, and dedicated staff must manually check the collected results to classify and count them. This is time-consuming and inefficient, and the accuracy cannot be guaranteed. In addition, the questionnaire audience has a certain platform bias. It is selective and targets an internal (dedicated) platform rather than being random and drawn from any public platform, which hinders improving accuracy.
Collecting from forums likewise requires a great deal of time and effort to monitor the major forum websites and gather users' feedback. That feedback can only be classified qualitatively, not analyzed quantitatively.
In summary, when obtaining user feedback about network products, the prior art suffers from high cost, low efficiency, platform bias, and the inability to obtain quantitative data to improve accuracy.
Summary
Embodiments of the present invention provide a method for acquiring product information, to solve the prior-art problems of high cost, low efficiency, platform bias, and the inability to obtain quantitative data to improve accuracy. The embodiments are implemented as follows. A method for obtaining product information includes:
collecting, from a public platform, original information in which users comment on a product;
filtering the collected original information;
analyzing the filtered information to obtain relevance information related to the product; and
classifying the obtained relevance information and then performing statistics and analysis on it, to obtain user feedback information related to the product.
An embodiment of the present invention provides a device for acquiring product information, the device including:
an information collection module, configured to collect, from a public platform, original information in which users comment on a product;
an information filtering module, configured to filter the original information collected by the information collection module;
an information analysis module, configured to analyze the information filtered by the information filtering module and obtain relevance information related to the product; and
a result obtaining module, configured to classify the obtained relevance information, then perform statistics and analysis on it, and obtain user feedback information related to the product.
An embodiment of the present invention provides a computer storage medium storing a computer program, the computer program being used to execute the above method for acquiring product information.
As the above technical solutions show, the embodiments of the present invention collect original information in which users comment on a product from any public platform, rather than from the dedicated platform of the prior art. The original information is filtered and analyzed to obtain relevance information related to the product, and that relevance information is classified, counted, and analyzed to obtain the final user feedback information related to the product. The product operator can then use this feedback to fully understand how users use the product, which facilitates improving the product and raises user satisfaction. Moreover, the original information is collected directly from any public platform where users comment voluntarily, rather than by passively inviting users as in the prior art. All of it is therefore provided by users on their own initiative (for example, posting microblogs or leaving forum messages), and no users need to be invited to a survey, which effectively reduces cost. At the same time, unlike the manual collection of the prior art, the automated processing after collection (including classification, statistics, and analysis) effectively improves the efficiency and accuracy of information acquisition.
In addition, since the data is collected randomly based on any common platform, instead of the prior art, the collected data is selectively selected based on the dedicated platform, that is, the embodiment of the present invention can cover multiple information sources at the same time (such as Tencent Weibo, Sina Weibo, Support platform, etc., can effectively avoid the bias caused by platform differences, the accuracy rate caused by the lack of quantitative data and the high cost of questionnaires. DRAWINGS
FIG. 1 is a flowchart of the implementation of the method for acquiring product information according to Embodiment 1 of the present invention;
FIG. 2 is a detailed flowchart of the method for acquiring product information according to Embodiment 2 of the present invention;
FIG. 3 is a structural diagram of the device for acquiring product information according to Embodiment 3 of the present invention.
DETAILED DESCRIPTION
To make the technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it.
To explain the technical solutions of the present invention, specific embodiments are described below.
Embodiment 1:
FIG. 1 shows the implementation flow of the method for acquiring product information according to Embodiment 1 of the present invention. The method proceeds as follows:
Step S101: Collect product-related raw information from user comments on a public platform.
Note that a public platform here means a platform distinct from an internal platform (also called a dedicated platform), such as common microblogs and/or major forums.
Preferably, this step is specifically: collecting product-related raw information from user comments on microblogs and/or forums.
Specifically, raw information from user comments related to the product (including the product name, series aliases, or the names of some key functional modules) is collected from microblogs and/or forums through an application programming interface (API) and/or a web crawler, and the collected raw information is stored in a database. In this embodiment, collection is not limited to microblogs and/or forums; raw information may also be collected from the support platform, the Exp platform, and the like.
Note that, when collecting the raw information, this embodiment may preset a collection interval (for example, once every hour) or perform multiple consecutive collections.
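The collection step (keyword match, storage in a database, fixed interval) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the platform API and crawler are replaced by hypothetical "fetcher" callables returning (source, author, text) tuples, SQLite stands in for the unspecified database, and the table name `raw_posts` is invented for the example.

```python
import sqlite3
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape (source, author, text); real Weibo/forum APIs return richer data.
Post = Tuple[str, str, str]
Fetcher = Callable[[], Iterable[Post]]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) raw-information table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_posts "
        "(source TEXT, author TEXT, text TEXT, collected_at REAL)"
    )


def collect_once(conn: sqlite3.Connection, fetchers: List[Fetcher], keywords: List[str]) -> int:
    """Run each fetcher once and store the posts that mention any product keyword."""
    now = time.time()
    stored = 0
    for fetch in fetchers:
        for source, author, text in fetch():
            if any(k in text for k in keywords):
                conn.execute(
                    "INSERT INTO raw_posts VALUES (?, ?, ?, ?)", (source, author, text, now)
                )
                stored += 1
    conn.commit()
    return stored


def collect_periodically(
    conn: sqlite3.Connection,
    fetchers: List[Fetcher],
    keywords: List[str],
    interval_s: float = 3600.0,  # "once every hour", as in the example above
    rounds: int = 3,
) -> int:
    """Repeat collection `rounds` times, sleeping `interval_s` seconds between rounds."""
    total = 0
    for i in range(rounds):
        total += collect_once(conn, fetchers, keywords)
        if i < rounds - 1:
            time.sleep(interval_s)
    return total
```

In a real deployment the fetchers would wrap a platform API client or crawler, and a scheduler (for example cron) would usually replace the blocking `time.sleep` loop.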
Preferably, this embodiment further includes: classifying the collected raw information according to preset rules and then storing it. Classifying according to preset rules includes classifying by the content features of the raw information, which include but are not limited to media information, official information, advertising information, and comments from users on a preset blacklist, as shown in Table 1:
Primary category | Secondary category | Features | Handling
Information dissemination | Media | Media, news, etc. | Store
 | Official release | Posts by official accounts, etc. | Delete
 | Sharing | App shares, "##" topic tags, etc. | Store
 | Promotional ads | Advertising, prize campaigns, etc. | Delete
 | Internal paid posters ("water army") | Blacklisted users | Delete
Opinion/comment | User suggestions; comments and remarks | Contains word-of-mouth words, such as 给力 ("awesome") | Store
Irrelevant statement | Caused by fuzzy search | Completely unrelated to the search keywords | Delete
Step S102: Filter the collected raw information.
In this embodiment, filtering the collected raw information includes deduplicating it and removing invalid information.
Deduplication, for example:
For the Exp platform and the support platform, deduplication can be based on the text content and the user name. For Tencent Weibo and Sina Weibo, a threshold can be set: when the number of posts with identical or similar text exceeds the threshold, those posts are deemed advertisements or pure sharing posts and are deleted.
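The two deduplication rules can be sketched in Python as follows, assuming each post is a (source, user, text) record. The source names no similarity measure, so "similar" text is approximated here by lower-casing and collapsing whitespace; the platform labels `exp`/`support` and the default threshold of 5 are illustrative assumptions.

```python
from collections import Counter
from typing import List, NamedTuple


class Post(NamedTuple):
    source: str
    user: str
    text: str


FORUM_LIKE = {"exp", "support"}  # hypothetical platform labels


def normalize(text: str) -> str:
    """Crude stand-in for 'identical or similar': case- and whitespace-insensitive."""
    return " ".join(text.lower().split())


def dedupe_by_text_and_user(posts: List[Post]) -> List[Post]:
    """Exp/support rule: keep the first post for each (text, user name) pair."""
    seen = set()
    kept = []
    for p in posts:
        key = (normalize(p.text), p.user)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def drop_mass_reposts(posts: List[Post], threshold: int) -> List[Post]:
    """Microblog rule: delete every post whose text occurs more than `threshold` times."""
    counts = Counter(normalize(p.text) for p in posts)
    return [p for p in posts if counts[normalize(p.text)] <= threshold]


def filter_duplicates(posts: List[Post], threshold: int = 5) -> List[Post]:
    forum = [p for p in posts if p.source in FORUM_LIKE]
    micro = [p for p in posts if p.source not in FORUM_LIKE]
    return dedupe_by_text_and_user(forum) + drop_mass_reposts(micro, threshold)
```

A production system would more likely use a near-duplicate measure such as SimHash or shingling in place of `normalize`.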
Removing invalid information includes removing official releases, promotional ads, posts by internal paid posters, irrelevant statements, and other invalid information listed in Table 1.
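The store/delete decisions of Table 1 can be approximated with a small rule-based classifier. This Python sketch is illustrative only: the official-account set, blacklist, and marker word lists are hypothetical, and the order in which rules are checked is an assumption, because the table does not define precedence.

```python
from typing import Iterable, Tuple

# Hypothetical rule inputs; Table 1 names the categories but not these lists.
OFFICIAL_ACCOUNTS = {"ProductX_Official"}
BLACKLISTED_USERS = {"shill_001"}
AD_MARKERS = ("giveaway", "win a prize", "promo code")
WORD_OF_MOUTH = ("awesome", "letdown", "good", "poor")
MEDIA_MARKERS = ("news", "reported")


def classify(user: str, text: str, product_keywords: Iterable[str]) -> Tuple[str, str]:
    """Map a post to (secondary category, action) in the spirit of Table 1."""
    t = text.lower()
    if not any(k.lower() in t for k in product_keywords):
        return ("irrelevant statement", "delete")
    if user in OFFICIAL_ACCOUNTS:
        return ("official release", "delete")
    if user in BLACKLISTED_USERS:
        return ("internal paid poster", "delete")
    if any(m in t for m in AD_MARKERS):
        return ("promotional ad", "delete")
    if any(w in t for w in WORD_OF_MOUTH):
        return ("user opinion", "store")
    if "#" in t:  # Weibo topic tags look like #topic#
        return ("sharing", "store")
    if any(m in t for m in MEDIA_MARKERS):
        return ("media", "store")
    return ("unclassified", "review")  # Table 1 gives no rule for this case
```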
Step S103: Analyze the filtered information to obtain relevance information related to the product.
Note that the relevance information specifically includes hot-topic words and/or word-of-mouth words. Hot-topic words indicate what users focus on about the product; word-of-mouth words indicate the trend of users' opinions of the product.
Preferably, this step is specifically: analyzing the filtered information to obtain hot-topic words and/or word-of-mouth words related to the product.
In this embodiment, the analysis is performed mainly on the information retained after filtering, such as opinion/comment, media, and sharing posts. Word-of-mouth words are subsequently extracted mainly from opinion/comment texts.
In this embodiment, obtaining the hot-topic words and word-of-mouth words related to the product specifically includes:
performing word segmentation on the filtered information according to the general nouns of the product and/or of products of the same kind and similar products, to obtain a processing result.
In this embodiment, word segmentation is performed on the filtered information by a Chinese lexical analysis system, according to the general nouns of the product and/or of products of the same kind and similar products, to obtain the processing result. For example, the segmentation interface provided by ICTCLAS (Institute of Computing Technology Chinese Lexical Analysis System) can be called to run the ICTCLAS segmentation algorithm on the filtered information.
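ICTCLAS itself is not reproduced here. To show why supplying the product's general nouns matters, the following Python sketch substitutes a different and much simpler technique, forward maximum matching against a user dictionary: a product name that is in the dictionary is kept as one token instead of being split. The dictionary entries in the usage test are illustrative assumptions.

```python
from typing import List, Optional, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary entry become single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match (or one character).
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, with the product name "QQ音乐" in the dictionary, "QQ音乐很给力" segments into "QQ音乐", "很", "给力". A statistical analyzer such as ICTCLAS additionally resolves ambiguity and assigns part-of-speech tags, which the later noun/adjective/verb steps rely on.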
Next, words in the processing result that reach a set frequency of occurrence (for example, 7 times) are selected, and the selection is screened against a pre-stored lexicon to obtain hot-topic words and/or word-of-mouth words related to the product.
Specifically, the processing result is corrected using a pre-stored segmentation dictionary to obtain a corrected result, and the corrected result is screened using a pre-stored word-of-mouth lexicon and/or invalid-word lexicon to obtain hot-topic words and/or word-of-mouth words related to the network product.
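The correct, count, and screen steps can be sketched in Python as follows. The source does not say how the segmentation dictionary "corrects" the result; this sketch assumes correction means re-joining adjacent tokens whose concatenation is a dictionary entry. The threshold of 7 follows the example above, and all lexicon contents are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def correct_tokens(tokens: List[str], seg_dict: Set[str]) -> List[str]:
    """Assumed correction: merge a token into its predecessor if the pair is a dictionary word."""
    out: List[str] = []
    for tok in tokens:
        if out and out[-1] + tok in seg_dict:
            out[-1] = out[-1] + tok
        else:
            out.append(tok)
    return out


def select_frequent(tokens: List[str], min_count: int = 7) -> Dict[str, int]:
    """Keep words that occur at least `min_count` times."""
    return {w: c for w, c in Counter(tokens).items() if c >= min_count}


def screen_words(
    freq: Dict[str, int], wom_lexicon: Set[str], invalid_lexicon: Set[str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Drop invalid words; split the rest into hot-topic and word-of-mouth candidates."""
    hot: Dict[str, int] = {}
    wom: Dict[str, int] = {}
    for w, c in freq.items():
        if w in invalid_lexicon:
            continue
        (wom if w in wom_lexicon else hot)[w] = c
    return hot, wom
```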
In this embodiment, obtaining hot-topic words includes: in the noun column, removing words whose frequency is below a preset value (for example, 1% of the frequency of the most frequent valid word); and removing single-character words, such as 人 ("person") and 网 ("net").
Obtaining word-of-mouth words includes: in the adjective column, removing words whose frequency is below a preset value (for example, 1% of the frequency of the most frequent valid word); in the verb column, searching for common word-of-mouth words such as 坑爹 ("letdown") and 给力 ("awesome"); and comparing the words found against the pre-stored word-of-mouth lexicon (done in Excel) to obtain word-of-mouth words related to the network product.
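The column-specific rules can be expressed directly in Python, assuming part-of-speech tagging has already produced separate frequency tables for nouns, adjectives, and verbs. The 1% ratio follows the example above; taking the reference maximum within each column is an assumption, since the text says only "the most frequent valid word".

```python
from typing import Dict, Set


def hot_topic_words(noun_counts: Dict[str, int], ratio: float = 0.01) -> Dict[str, int]:
    """Keep nouns with frequency >= ratio * top noun frequency, dropping single characters."""
    top = max(noun_counts.values(), default=0)
    return {w: c for w, c in noun_counts.items() if c >= ratio * top and len(w) > 1}


def word_of_mouth_words(
    adj_counts: Dict[str, int],
    verb_counts: Dict[str, int],
    lexicon: Set[str],
    ratio: float = 0.01,
) -> Set[str]:
    """Frequent adjectives plus lexicon verbs, finally checked against the lexicon."""
    top = max(adj_counts.values(), default=0)
    adjectives = {w for w, c in adj_counts.items() if c >= ratio * top}
    verbs = {w for w in verb_counts if w in lexicon}
    return {w for w in adjectives | verbs if w in lexicon}
```

The final set intersection replaces the manual comparison in Excel described above.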
Step S104: Classify the obtained relevance information, then perform statistics and analysis to obtain user feedback information related to the product.
Preferably, this step is specifically: classifying the obtained hot-topic words and/or word-of-mouth words, and performing statistics and analysis on the classified words to obtain user feedback information related to the product.
Specifically, the obtained hot-topic words form one class, positive word-of-mouth words (for example, 好 "good", 给力 "awesome", GOOD) form a second class, and negative word-of-mouth words (for example, 差 "poor", 坑爹 "letdown") form a third.
The classified hot-topic words, positive word-of-mouth words, and negative word-of-mouth words are counted and analyzed (including counting their numbers and analyzing how the counts change, for example a sudden increase in negative word-of-mouth words) to obtain the user feedback information, which includes a quantitative analysis report and/or a qualitative analysis report. The quantitative analysis report covers the counts of hot-topic, positive, and negative word-of-mouth words, the changes in those counts, and the reasons for the changes. The qualitative analysis report covers what users focus on about the product and their word-of-mouth evaluations of it.
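The counting and the "sudden increase" check can be sketched in Python as follows. The polarity lexicons are hypothetical. The surge rule (at least double the previous period and at least 10 occurrences) is an assumed definition, since the source names no formula.

```python
from collections import Counter
from typing import List, Set


def classify_counts(
    hot_words: List[str], wom_words: List[str], positive: Set[str], negative: Set[str]
) -> Counter:
    """Count hot-topic words and split word-of-mouth words by polarity."""
    counts: Counter = Counter({"hot": len(hot_words)})
    for w in wom_words:
        if w in positive:
            counts["positive"] += 1
        elif w in negative:
            counts["negative"] += 1
        else:
            counts["unrated"] += 1  # word-of-mouth word missing from both lexicons
    return counts


def negative_surges(series: List[int], factor: float = 2.0, min_count: int = 10) -> List[int]:
    """Indexes of periods whose negative count is >= factor x the previous period's."""
    return [
        i
        for i in range(1, len(series))
        if series[i] >= min_count and series[i] >= factor * max(series[i - 1], 1)
    ]
```

Running `classify_counts` once per period (for example, per day) yields the series that `negative_surges` inspects; each flagged period points the analyst to where the quantitative report should explain the reason for the change.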
Based on the product's quantitative and/or qualitative analysis report, the product operator can fully understand users' feedback on using the product, which makes it easier to improve the product and raise user satisfaction.
In another preferred embodiment of the present invention, to monitor the current state of products of the same kind and similar products, keep abreast of industry developments, and provide an important basis for the development of and decisions about the product, the method further includes:
collecting, from the public platform (such as microblogs and/or forums), information from user comments about products of the same kind and similar products related to the product.
In practice, information about products of the same kind and similar products can be stored in advance, including their names, series aliases, and the names of some key functional modules. While product-related raw information is collected from user comments on microblogs and/or forums, information about these products is also collected from user comments on microblogs and/or forums according to the stored information.
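Matching posts against the pre-stored names and aliases can be sketched in Python as follows. The catalogue entries are hypothetical. Plain substring matching is used because Chinese text has no spaces between words.

```python
from typing import Dict, List

# Hypothetical catalogue: product or competitor name -> stored names and series aliases.
CATALOG: Dict[str, List[str]] = {
    "ProductX": ["ProductX", "PX"],
    "CompetitorY": ["CompetitorY", "CY"],
}


def match_products(text: str, catalog: Dict[str, List[str]] = CATALOG) -> List[str]:
    """Return the sorted names of all catalogued products that a post mentions."""
    t = text.lower()
    return sorted(
        name for name, aliases in catalog.items() if any(a.lower() in t for a in aliases)
    )


def group_by_product(posts: List[str], catalog: Dict[str, List[str]] = CATALOG) -> Dict[str, List[str]]:
    """Bucket posts per product; a post mentioning two products lands in both buckets."""
    groups: Dict[str, List[str]] = {name: [] for name in catalog}
    for post in posts:
        for name in match_products(post, catalog):
            groups[name].append(post)
    return groups
```

Very short aliases such as "PX" can produce false matches inside unrelated words, so a real catalogue would need longer aliases or extra context checks.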
In this embodiment, product-related raw information is collected from user comments on microblogs and/or forums and is filtered and analyzed to obtain the trend of users' opinions of the product (word-of-mouth words) and what users focus on about it (hot-topic words). The obtained hot-topic words and/or word-of-mouth words are classified and counted to produce a quantitative and/or qualitative analysis report for the product, from which the product operator can fully understand users' feedback on the product, making it easier to improve the product and raise user satisfaction. Moreover, because the raw information is collected directly from microblogs and/or forums, all of it is provided by users on their own initiative (for example, by posting on a microblog or leaving a message in a forum), and no users need to be invited to take part in surveys, which effectively reduces cost. The automated processing after collection also effectively improves efficiency and accuracy. In addition, because multiple information sources are covered at once (such as Tencent Weibo, Sina Weibo, and the support platform), the bias caused by platform differences, the loss of accuracy caused by a lack of quantitative data, and the high cost of distributing questionnaires can be effectively avoided.
Embodiment 2:
FIG. 2 shows the detailed flow of the method for acquiring product information according to Embodiment 2 of the present invention. This embodiment comprises four parts: information collection, information filtering, information analysis, and acquisition of quantitative and qualitative texts.
As shown in FIG. 2, during information collection, product-related raw information from user comments is collected mainly through APIs and/or web crawlers from information sources such as microblogs and forums (which may also include internal-site platforms such as the support platform and the EXP platform), and the collected raw information is stored in a database.
During information filtering, noise text (text completely unrelated to the product) is removed first, and then deduplication and removal of invalid information are performed per platform. Deduplication includes deduplication by text content and deduplication by text content plus user name. Removing invalid information includes removing irrelevant text, official releases, posts by paid posters, advertisements, and the like.
During information analysis, the filtered information is classified, mainly into media/news, active sharing, and suggestion/comment classes. According to the general nouns of the product and/or its competing products, the segmentation interface provided by ICTCLAS is called to run its segmentation algorithm on the filtered information and obtain a processing result; the processing result is corrected with the pre-stored segmentation dictionary to obtain a corrected result; and the corrected result is screened with the pre-stored word-of-mouth lexicon and invalid-word lexicon to obtain hot-topic words and word-of-mouth words related to the product. Information analysis also includes screening suggestion-type microblog posts using the suggestion/comment class and a pre-stored suggestion lexicon to obtain suggestion texts.
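The final screening of suggestion-type posts can be sketched in Python as a lexicon lookup. The suggestion lexicon contents ("hope", "suggest", "could you", "please add") are hypothetical; the source mentions only that a pre-stored suggestion lexicon exists.

```python
from typing import Iterable, List, Tuple

# Hypothetical suggestion lexicon: "hope", "suggest", "could you", "please add".
SUGGESTION_LEXICON: Tuple[str, ...] = ("希望", "建议", "能不能", "求加")


def suggestion_texts(
    posts: Iterable[str], lexicon: Tuple[str, ...] = SUGGESTION_LEXICON
) -> List[Tuple[str, str]]:
    """Return (post, first matching lexicon entry) for each post that reads as a suggestion."""
    hits: List[Tuple[str, str]] = []
    for post in posts:
        for marker in lexicon:
            if marker in post:
                hits.append((post, marker))
                break
    return hits
```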
During acquisition of the quantitative and qualitative texts, the obtained hot-topic words and word-of-mouth words are classified, deduced, analyzed, and counted to obtain quantitative and qualitative analysis reports for the product.
Embodiment 3:
FIG. 3 shows the structure of the device for acquiring product information according to Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment are shown.
The device for acquiring product information may be a software unit, a hardware unit, or a combined software and hardware unit running in an application system.
The device includes an information collection module 31, an information filtering module 32, an information analysis module 33, and a result acquisition module 34. The functions of the modules are as follows:
the information collection module 31 is configured to collect product-related raw information from user comments on a public platform, the public platform including microblogs and/or forums;
the information filtering module 32 is configured to filter the raw information collected by the information collection module;
the information analysis module 33 is configured to analyze the information filtered by the information filtering module to obtain relevance information related to the product, the relevance information including hot-topic words and/or word-of-mouth words;
the result acquisition module 34 is configured to classify the obtained relevance information, then perform statistics and analysis to obtain user feedback information related to the product.
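The wiring of modules 31 to 34 can be sketched in Python as a chain of pluggable callables. This is an illustrative architecture only: the class and field names are hypothetical, and each field would be backed by logic such as collection, deduplication, segmentation, and counting described in Embodiment 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProductInfoDevice:
    """Hypothetical composition of modules 31-34; each stage is a pluggable callable."""

    collect: Callable[[], List[str]]                # information collection module 31
    filter_: Callable[[List[str]], List[str]]       # information filtering module 32
    analyze: Callable[[List[str]], List[str]]       # information analysis module 33
    summarize: Callable[[List[str]], Dict[str, int]]  # result acquisition module 34

    def run(self) -> Dict[str, int]:
        raw = self.collect()
        kept = self.filter_(raw)
        relevance = self.analyze(kept)
        return self.summarize(relevance)
```

Keeping the stages as separate callables mirrors the module split in FIG. 3 and allows each one (for example, the segmenter) to be swapped out independently.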
Further, the device also includes:
an information storage module 35, configured to classify the collected raw information by its content features and store it before the raw information is filtered.
The information analysis module 33 includes:
a processing module 331, configured to perform word segmentation on the filtered information according to the general nouns of the product and/or of products of the same kind and similar competing products, to obtain a processing result;
an acquisition module 332, configured to select, from the processing result of the processing module, words that reach a set frequency of occurrence, and to screen the selection against a pre-stored lexicon to obtain the relevance information.
Preferably, to monitor the current state of competing products related to the product, keep abreast of industry developments, and provide an important basis for the development of and decisions about the product, the information collection module 31 is further configured to collect, from the public platform, information from user comments about products of the same kind and similar products related to the product.
In this embodiment, the information filtering module is further configured to filter the collected raw information, including but not limited to deduplication and removal of invalid information.
The device provided in this embodiment can be used in the corresponding method for acquiring product information described above; for details, see the descriptions of Embodiments 1 and 2 of the method, which are not repeated here.
In summary, embodiments of the present invention collect product-related raw information from user comments on public platforms such as microblogs and/or forums, and filter and analyze it to obtain relevance information related to the product, such as the trend of users' opinions of the product (word-of-mouth words) and what users focus on about it (hot-topic words). The obtained hot-topic words and/or word-of-mouth words are classified, counted, and analyzed to produce a quantitative and/or qualitative analysis report for the product, from which the product operator can fully understand users' feedback on the product, making it easier to improve the product and raise user satisfaction.
Moreover, because the raw information is collected directly from user comments on microblogs and/or forums, all of it is provided by users on their own initiative (for example, by posting on a microblog or leaving a message in a forum), and no users need to be invited to take part in surveys, which effectively reduces cost. The automated processing after collection also effectively improves efficiency and accuracy. In addition, because multiple information sources are covered at once (such as Tencent Weibo, Sina Weibo, and the support platform), the bias caused by platform differences, the loss of accuracy caused by a lack of quantitative data, and the high cost of distributing questionnaires can be effectively avoided. Furthermore, to monitor the current state of products of the same kind and similar products, keep abreast of industry developments, and provide an important basis for product development and decisions, information about competing products related to the product is collected at the same time as the product-related raw information from user comments on microblogs and/or forums, which makes the solution highly practical.
If the integrated modules described in embodiments of the present invention are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. On this understanding, the technical solutions of embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. Thus, embodiments of the present invention are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to execute the method for acquiring product information according to embodiments of the present invention.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for acquiring product information, characterized in that the method comprises: collecting product-related raw information from user comments on a public platform;
filtering the collected raw information;
analyzing the filtered information to obtain relevance information related to the product; and classifying the obtained relevance information, then performing statistics and analysis to obtain user feedback information related to the product.
2. The method according to claim 1, characterized in that, before the collected raw information is filtered, the method further comprises:
classifying the collected raw information by its content features and storing it.
3. The method according to claim 1 or 2, characterized in that filtering the collected raw information comprises: deduplicating the collected raw information and removing invalid information from it.
4. The method according to claim 1, characterized in that analyzing the filtered information to obtain relevance information related to the product comprises:
performing word segmentation on the filtered information according to the general nouns of the product and/or of products of the same kind and similar products, to obtain a processing result.
5. The method according to claim 4, characterized in that, after the processing result is obtained, obtaining the relevance information related to the product further comprises: selecting words in the processing result that reach a set frequency of occurrence, and screening the selection against a pre-stored lexicon to obtain the relevance information.
6. The method according to claim 1, characterized in that the method further comprises: collecting, from the public platform, information from user comments about products of the same kind and similar products related to the product.
7、 一种获取产品信息的装置, 其特征在于, 所述装置包括: 信息采集模块, 用于从公共平台采集用户评论的与产品相关的原始信 息; 7. A device for obtaining product information, characterized in that the device includes: Information collection module, used to collect original product-related information from user reviews from public platforms;
信息过滤模块, 用于对所述信息采集模块采集的所述原始信息进行过 滤; An information filtering module, used to filter the original information collected by the information collection module;
信息分析模块, 用于对所述信息过滤模块过滤后的信息进行分析, 获 取与所述产品相关的相关度信息; An information analysis module, used to analyze the information filtered by the information filtering module and obtain relevance information related to the product;
结果获取模块, 用于对所获取的所述相关度信息进行归类后进行统计 和分析, 获取与所述产品相关的用户反馈信息。 The result acquisition module is used to classify the acquired correlation information and perform statistics and analysis, and obtain user feedback information related to the product.
8、 如权利要求 7所述的装置, 其特征在于, 所述装置包括: 8. The device according to claim 7, characterized in that, the device includes:
信息存储模块, 用于对采集的所述原始信息进行过滤前, 将采集的所 述原始信息按其内容特征进行分类后存储。 The information storage module is used to classify and store the collected original information according to its content characteristics before filtering the collected original information.
9、 如权利要求 7或 8所述的装置, 其特征在于, 所述信息过滤模块, 进一步用于对采集的所述原始信息进行去重处理以及去除无效信息的处 理。 9. The device according to claim 7 or 8, characterized in that the information filtering module is further used to deduplicate and remove invalid information on the collected original information.
10、 如权利要求 7所述的装置, 其特征在于, 所述信息分析模块包括: 处理模块, 用于根据所述产品、 和 /或产品同类及其相似产品的通用名 词, 对过滤后的信息进行分词处理, 获得处理结果。 10. The device of claim 7, wherein the information analysis module includes: a processing module, configured to filter the filtered information based on the product and/or common nouns of similar products and similar products. Perform word segmentation processing and obtain processing results.
11、 如权利要求 10所述的装置, 其特征在于, 所述信息分析模块还包 括: 11. The device according to claim 10, wherein the information analysis module further includes:
获取模块, 用于从所述处理模块的所述处理结果中选取达到设定出现 频次的词语, 通过预存的词库对选取结果进行筛选, 获取所述相关度信息。 The acquisition module is configured to select words with a set frequency of occurrence from the processing results of the processing module, filter the selection results through a pre-stored vocabulary library, and obtain the correlation information.
12、 如权利要求 7所述的装置, 其特征在于, 所述信息采集模块, 进 一步用于从所述公共平台采集用户评论的与所述产品相关的同类及其相似 产品的信息。 12. The device according to claim 7, characterized in that the information collection module is further configured to collect information about similar and similar products related to the product reviewed by users from the public platform.
13. A computer storage medium, characterized in that computer-executable instructions are stored therein, the computer-executable instructions being used to execute the method for obtaining product information according to any one of claims 1 to 6.
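The analysis chain recited in claims 7 and 9 to 11 (filter the raw posts, segment them into words, keep words that reach a set frequency, screen them against a pre-stored lexicon, then classify and count) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The noun list, lexicon, category names, the `min_chars` rule for "invalid" posts, the `min_freq` threshold and every function name are hypothetical, and forward maximum matching is swapped in as a simple stand-in for whatever segmentation method the claims intend. Segmentation matters here because Chinese text is written without spaces between words.

```python
import re
from collections import Counter

# Hypothetical example data, not taken from the patent:
# generic nouns for the product and similar products (claim 10) and a
# pre-stored lexicon mapping relevant words to feedback categories (claim 11).
# 电池 = battery, 屏幕 = screen, 价格 = price, 续航 = battery life.
GENERIC_NOUNS = {"电池", "屏幕", "价格", "续航"}
LEXICON = {"电池": "hardware", "屏幕": "hardware", "续航": "hardware",
           "价格": "cost"}


def filter_posts(posts, min_chars=3):
    """Claim 9: remove duplicates and 'invalid' posts. Invalid is assumed
    here to mean fewer than min_chars non-whitespace characters."""
    seen, kept = set(), []
    for post in posts:
        text = re.sub(r"\s+", "", post)  # drop all whitespace before comparing
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept


def segment(text, nouns):
    """Claim 10: forward maximum matching with the generic nouns as the
    dictionary; characters not covered by a noun become 1-char tokens."""
    max_len = max(map(len, nouns), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in nouns:
                tokens.append(piece)
                i += size
                break
    return tokens


def correlation_info(posts, nouns, lexicon, min_freq=2):
    """Claim 11: keep words reaching min_freq occurrences, then screen the
    selection against the pre-stored lexicon."""
    counts = Counter(tok for post in posts for tok in segment(post, nouns))
    return {w: n for w, n in counts.items() if n >= min_freq and w in lexicon}


def feedback_summary(correlation, lexicon):
    """Claim 7: classify the correlation information by lexicon category and
    total the occurrence counts per category."""
    summary = Counter()
    for word, n in correlation.items():
        summary[lexicon[word]] += n
    return dict(summary)


if __name__ == "__main__":
    posts = ["电池很耐用", "电池很耐用", "屏幕太暗", "电池续航差",
             "价格偏高", "好", "屏幕清晰"]
    clean = filter_posts(posts)  # drops the duplicate and the 1-char post
    corr = correlation_info(clean, GENERIC_NOUNS, LEXICON)
    print(corr)                             # {'电池': 2, '屏幕': 2}
    print(feedback_summary(corr, LEXICON))  # {'hardware': 4}
```

In the sample, 价格 (price) occurs only once, so the frequency threshold removes it before the lexicon screen. A production system would more likely use a statistical Chinese word segmenter than a pure dictionary matcher.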
PCT/CN2013/077110 2012-06-11 2013-06-09 Method and device for obtaining product information and computer storage medium WO2013185601A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP13804547.1A EP2846271A4 (en) 2012-06-11 2013-06-09 Method and device for obtaining product information and computer storage medium
US14/404,905 US20150149383A1 (en) 2012-06-11 2013-06-09 Method and device for acquiring product information, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210190616.0 2012-06-11
CN201210190616.0A CN103488635A (en) 2012-06-11 2012-06-11 Method and device for acquiring product information

Publications (1)

Publication Number Publication Date
WO2013185601A1 (en) 2013-12-19

Family

ID=49757532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/077110 WO2013185601A1 (en) 2012-06-11 2013-06-09 Method and device for obtaining product information and computer storage medium

Country Status (4)

Country Link
US (1) US20150149383A1 (en)
EP (1) EP2846271A4 (en)
CN (1) CN103488635A (en)
WO (1) WO2013185601A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537080A (en) * 2014-12-31 2015-04-22 北京畅游天下网络技术有限公司 Information recommendation method and system
CN108170841A (en) * 2018-01-16 2018-06-15 深圳市中易科技有限责任公司 Mobile phone public opinion analysis and decision-making method based on information value
CN112200638A (en) * 2020-10-30 2021-01-08 福州大学 Paid-poster ("water army") comment detection system and method based on an attention mechanism and a bidirectional GRU network

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
WO2015188339A1 (en) * 2014-06-12 2015-12-17 Nokia Technologies Oy Method, apparatus, computer program product and system for reputation generation
CN105302844B (en) * 2014-08-01 2019-07-16 腾讯科技(深圳)有限公司 Internet surveillance method, apparatus and system
CN104281665B (en) * 2014-09-25 2018-05-25 北京百度网讯科技有限公司 Method and apparatus for determining the validity of comments
TW201619885A (en) * 2014-11-17 2016-06-01 財團法人資訊工業策進會 E-commerce reputation analysis system, method and computer readable storage medium thereof
CN105046522A (en) * 2015-06-29 2015-11-11 成都亿邻通科技有限公司 Method of improving group buying quality
CN105791091A (en) * 2016-03-02 2016-07-20 四川长虹电器股份有限公司 System and method for evaluating the operation quality of official microblog and WeChat official accounts
CN107229636B (en) * 2016-03-24 2021-08-13 腾讯科技(深圳)有限公司 Word classification method and device
CN106126499A (en) * 2016-06-22 2016-11-16 青岛海信传媒网络技术有限公司 User satisfaction and loyalty analysis method and device
CN106294779B (en) * 2016-08-12 2020-03-17 杭州一来二去广告有限公司 Personal brand label generation method and system
CN107016015A (en) * 2016-10-08 2017-08-04 阿里巴巴集团控股有限公司 Business data summarization method and system
CN106802925A (en) * 2016-12-20 2017-06-06 深圳爱拼信息科技有限公司 Intelligent lawyer matching and recommendation method and server
WO2018205178A1 (en) * 2017-05-10 2018-11-15 曹修源 Text exploration and measurement system and method
GB2572541A (en) * 2018-03-27 2019-10-09 Innoplexus Ag System and method for identifying at least one association of entity
CN108717411B (en) * 2018-05-23 2022-04-08 安徽数据堂科技有限公司 Questionnaire design auxiliary system based on big data
CN108959223A (en) * 2018-06-11 2018-12-07 安徽引航科技有限公司 Method for intelligently writing a résumé overview
CN109377026A (en) * 2018-09-30 2019-02-22 法信公证云(厦门)科技有限公司 Notary service quality control method and device
CN111209465B (en) * 2020-01-03 2023-11-07 北京秒针人工智能科技有限公司 Public opinion alarm method and device, and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102387207A (en) * 2011-10-21 2012-03-21 华为技术有限公司 Push method and system based on user feedback information
CN102446191A (en) * 2010-10-13 2012-05-09 北京创新方舟科技有限公司 Method for generating webpage content abstracts and equipment and system adopting same

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US7516086B2 (en) * 2003-09-24 2009-04-07 Idearc Media Corp. Business rating placement heuristic
US8010480B2 (en) * 2005-09-30 2011-08-30 Google Inc. Selecting high quality text within identified reviews for display in review snippets
US20070156446A1 (en) * 2006-01-05 2007-07-05 Jolly Timothy S Internet-based marketing, productivity enhancement and referral system
US20080154698A1 (en) * 2006-12-20 2008-06-26 Microsoft Corporation Dyanmic product classification for opinion aggregation
US20090319436A1 (en) * 2008-06-18 2009-12-24 Delip Andra Method and system of opinion analysis and recommendations in social platform applications
CN101819573B (en) * 2009-09-15 2012-07-25 电子科技大学 Self-adaptive network public opinion identification method
US8356025B2 (en) * 2009-12-09 2013-01-15 International Business Machines Corporation Systems and methods for detecting sentiment-based topics
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
JP2013517563A (en) * 2010-01-15 2013-05-16 コンパス ラボズ,インク. User communication analysis system and method
US20110178885A1 (en) * 2010-01-18 2011-07-21 Wisper, Inc. System and Method for Universally Managing and Implementing Rating Systems and Methods of Use
US8433620B2 (en) * 2010-11-04 2013-04-30 Microsoft Corporation Application store tastemaker recommendations
US8185448B1 (en) * 2011-06-10 2012-05-22 Myslinski Lucas J Fact checking method and system
CN104239020A (en) * 2013-06-21 2014-12-24 Sap欧洲公司 Decision-making standard driven recommendation

Also Published As

Publication number Publication date
US20150149383A1 (en) 2015-05-28
EP2846271A1 (en) 2015-03-11
CN103488635A (en) 2014-01-01
EP2846271A4 (en) 2015-12-23

Similar Documents

Publication Publication Date Title
WO2013185601A1 (en) Method and device for obtaining product information and computer storage medium
Abel et al. Semantics + filtering + search = Twitcident. Exploring information in social web streams
US8782046B2 (en) System and methods for predicting future trends of term taxonomies usage
US9946775B2 (en) System and methods thereof for detection of user demographic information
CN105718587A (en) Network content resource evaluation method and evaluation system
CN106980692A (en) Influence calculation method based on specific microblog events
CN108776671A (en) Network public opinion monitoring system and method
Paltoglou Sentiment‐based event detection in Twitter
Orlov et al. Using behavior and text analysis to detect propagandists and misinformers on twitter
Di Giovanni et al. VaccinEU: COVID-19 vaccine conversations on Twitter in French, German and Italian
Peng et al. Discovering the influence of sarcasm in social media responses
Boireau Determining political stances from Twitter timelines: The Belgian parliament case
Hwang et al. A nudge to credible information as a countermeasure to misinformation: Evidence from twitter
Agarwal et al. Accelerating automatic hate speech detection using parallelized ensemble learning models
Pierri et al. ITA-ELECTION-2022: A multi-platform dataset of social media conversations around the 2022 Italian general election
Sadman et al. Understanding the pandemic through mining covid news using natural language processing
KR101568800B1 (en) Real-time issue search word sorting method and system
Xue et al. Cross-media topic detection associated with hot search queries
Giachanou et al. Opinion retrieval in Twitter: is proximity effective?
Cherichi et al. Big data analysis for event detection in microblogs
Han et al. A real-time knowledge extracting system from social big data using distributed architecture
Lashari et al. Monitoring public opinion by measuring the sentiment of retweets on Twitter
Ikeda et al. Early detection method of service quality reduction based on linguistic and time series analysis of twitter
Mouronte-López et al. Patterns of human and bots behaviour on Twitter conversations about sustainability
Gangwar et al. Time-Based aggregation for Bi-term Topic Model to Analyze CoVID-19 Twitter Data

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13804547

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 14404905

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 2013804547

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE