WO2013185601A1

WO2013185601A1 - Method and device for obtaining product information and computer storage medium

Info

Publication number: WO2013185601A1
Application number: PCT/CN2013/077110
Authority: WO
Inventors: 唐沐; 陈妍; 樊中一; 骆玘; 孙鹏; 牟伟成; 郭洪伟; 黄利贤; 吕虹; 胡炜; 苏楠; 张弘
Original assignee: 腾讯科技（深圳）有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date: 2012-06-11
Filing date: 2013-06-09
Publication date: 2013-12-19
Also published as: US20150149383A1; EP2846271A1; CN103488635A; EP2846271A4

Abstract

The present invention relates to the field of information processing technology and provides a method and a device for obtaining product information, and a computer storage medium. The method comprises: collecting, from a public platform, original information that relates to a product and has been commented on by users; filtering the collected original information; analyzing the filtered information to obtain relevance information related to the product; and classifying the obtained relevance information and then counting and analyzing it to obtain user feedback information related to the product. The present invention effectively solves the problems that the prior art faces when obtaining user feedback information related to a product: high cost, low efficiency, platform bias, and the inability to obtain quantitative data for improving accuracy.

Description

Method, device and computer storage medium for obtaining product information

The invention belongs to information acquisition technology within the field of information processing, and in particular relates to a method, a device and a computer storage medium for acquiring product information.

Background

At present, user feedback on network products, such as how the products are used, existing problems, and suggestions, is mainly obtained through online questionnaire surveys or by collecting forum posts.

However, online questionnaires currently do not support voluntary participation by users. Instead, considerable human and material resources are needed to actively invite users to take part, and the data is collected manually; on non-internal platforms in particular, this requires substantial funding and is costly. Moreover, it usually takes 3 to 5 days to distribute a questionnaire and collect the data, and staff must then manually review the collected results to sort and classify them, which takes a long time, is inefficient, and cannot guarantee accuracy. In addition, the questionnaire targets are platform-biased: they are selected rather than random, being drawn from an internal (dedicated) platform instead of any public platform, which hinders accuracy.

Collecting feedback from forums likewise requires considerable time and effort to monitor and gather user posts on the major forum websites, and the feedback obtained can only be classified qualitatively; it cannot be analyzed quantitatively.

In summary, when obtaining user feedback on network products, the prior art suffers from high cost, low efficiency, platform bias, and the inability to obtain quantitative data to improve accuracy.

Summary of the invention

Embodiments of the present invention provide a method for acquiring product information, to solve the problems of the prior art: high cost, low efficiency, platform bias, and the inability to obtain quantitative data to improve accuracy. An embodiment of the present invention is implemented as a method for obtaining product information, the method comprising:

Collecting, from a public platform, original product-related information commented on by users;

Filtering the collected original information;

Analyzing the filtered information to obtain relevance information related to the product; and

Classifying the obtained relevance information and performing statistics and analysis on it to obtain user feedback information related to the product.

An embodiment of the present invention provides an apparatus for acquiring product information, the apparatus comprising:

An information collection module, configured to collect, from a public platform, original product-related information commented on by users;

An information filtering module, configured to filter the original information collected by the information collection module;

An information analysis module, configured to analyze the information filtered by the information filtering module and obtain relevance information related to the product;

A result obtaining module, configured to classify the obtained relevance information and perform statistics and analysis on it, to obtain user feedback information related to the product.

Embodiments of the present invention further provide a computer storage medium storing a computer program for executing the above method for acquiring product information.

As can be seen from the above technical solution, the embodiments of the present invention collect original product-related information commented on by users from any public platform, rather than from a dedicated platform as in the prior art; filter and analyze the original information to obtain relevance information related to the product; and classify, count, and analyze the relevance information to obtain user feedback information related to the product. The product operator can thus use the feedback to fully understand how users use the product, which facilitates improving the product and increases users' willingness to use it.

Moreover, because the original information is collected directly from any public platform, it is provided voluntarily by users (for example, by posting microblogs or leaving forum messages) rather than solicited by invitation as in the prior art. No user research invitations are needed, which effectively reduces cost. At the same time, unlike the manual collection of the prior art, information collection and the subsequent processing (classification, statistics and analysis) are automated, which effectively improves the efficiency and accuracy of information acquisition.

In addition, because data is collected randomly from any public platform rather than selectively from a dedicated platform, the embodiments can cover multiple information sources at once (such as Tencent Weibo, Sina Weibo, and the support platform). This effectively avoids the bias caused by platform differences, the low accuracy caused by the lack of quantitative data, and the high cost of questionnaires.

Brief description of the drawings

FIG. 1 is a flowchart of an implementation of the method for acquiring product information according to Embodiment 1 of the present invention;

FIG. 2 is a specific flowchart of the method for obtaining product information according to Embodiment 2 of the present invention;

FIG. 3 is a structural diagram of the device for acquiring product information according to Embodiment 3 of the present invention.

Detailed description

To make the technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

To illustrate the technical solutions of the present invention, specific embodiments are described below.

Embodiment 1:

FIG. 1 is a flowchart showing an implementation process of a method for acquiring product information according to Embodiment 1 of the present invention. The process of the method is as follows:

Step S101: Collect, from a public platform, original information related to the product that users have commented on.

It should be noted that a public platform is distinguished from an internal or dedicated platform; examples include common microblog services (Weibo) and/or major forums.

Preferably, this step specifically comprises: collecting, from microblogs and/or forums, original information related to the product that users have commented on.

Specifically, original information in which users comment on the product (identified by the product name, the product series name, or the names of some key functional blocks) is collected from microblogs and/or forums through an application programming interface (API) and/or a web crawler, and the collected original information is stored in a database. Although this embodiment collects original information from microblogs and/or forums, the information may also be collected from the support platform, the Exp platform, and the like.

It should be noted that, in this embodiment, a collection interval (for example, every hour) or a number of consecutive collections may be preset.
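The collection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the `fetch` callable is a hypothetical stand-in for a platform API or web crawler, the `raw_info` table and the `ProductX` keyword are invented for the example, and SQLite stands in for the unspecified database.

```python
import sqlite3
import time
from typing import Callable, Iterable

# (platform, user, text): a hypothetical record shape for collected posts.
Post = tuple[str, str, str]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) raw-information store and create its table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS raw_info (platform TEXT, user TEXT, text TEXT)")
    return db


def collect_once(fetch: Callable[[], Iterable[Post]], keywords: list[str],
                 db: sqlite3.Connection) -> int:
    """Store every fetched post that mentions a product keyword; return how many were stored."""
    rows = [post for post in fetch() if any(k in post[2] for k in keywords)]
    db.executemany("INSERT INTO raw_info (platform, user, text) VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)


def collect_periodically(fetch: Callable[[], Iterable[Post]], keywords: list[str],
                         db: sqlite3.Connection, interval_s: float, rounds: int) -> int:
    """Run `rounds` consecutive collections, waiting `interval_s` seconds between them."""
    total = 0
    for i in range(rounds):
        total += collect_once(fetch, keywords, db)
        if i < rounds - 1:
            time.sleep(interval_s)
    return total
```

In a deployment following the hourly example, `interval_s` would be 3600, and `keywords` would hold the product name, series name and key function-block names; deduplicating the repeated rounds is left to the filtering step.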

Preferably, this embodiment further includes: classifying the collected original information according to a preset rule and then storing it. Classifying according to the preset rule comprises classifying by the content features of the original information, which include, but are not limited to, media information, official information, advertising information, and comments from users on a preset blacklist, as shown in Table 1:

Table 1

Primary classification | Secondary classification | Characteristics | Treatment
Information dissemination | Media | Media, news, etc. | Store
Information dissemination | Official release | Released by official accounts, etc. | Delete
Activity | Share | Application sharing, ##, etc. | Store
Activity | Advertising | Advertisements, prize giveaways, etc. | Delete
Internal | Water army (paid posters) | Blacklisted users | Delete
Comment | User suggestions/comments | Contains word-of-mouth words | Store
Irrelevant | Irrelevant statement | Fuzzy search returns text that does not contain the search keyword | Delete
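A rule-based reading of Table 1 might look like the sketch below. The keyword predicates, account sets, and first-match-wins ordering are assumptions made for illustration; the source does not specify how each content feature is detected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    primary: str                         # primary classification in Table 1
    secondary: str                       # secondary classification in Table 1
    matches: Callable[[str, str], bool]  # predicate on (user, text); hypothetical
    treatment: str                       # "store" or "delete"


def make_rules(product_keywords, official_accounts, blacklist, wom_words) -> list[Rule]:
    """Build illustrative rules; the marker strings below are assumptions, not from the source."""
    return [
        Rule("Irrelevant", "Irrelevant statement",
             lambda u, t: not any(k in t for k in product_keywords), "delete"),
        Rule("Internal", "Water army", lambda u, t: u in blacklist, "delete"),
        Rule("Information dissemination", "Official release",
             lambda u, t: u in official_accounts, "delete"),
        Rule("Activity", "Advertising",
             lambda u, t: any(m in t.lower() for m in ("advert", "prize")), "delete"),
        Rule("Information dissemination", "Media",
             lambda u, t: "news" in t.lower(), "store"),
        Rule("Activity", "Share", lambda u, t: t.count("#") >= 2, "store"),
        Rule("Comment", "User suggestion/comment",
             lambda u, t: any(w in t for w in wom_words), "store"),
    ]


def classify(user: str, text: str, rules: list[Rule]) -> tuple[str, str, str]:
    """Return (primary, secondary, treatment) for the first matching rule."""
    for rule in rules:
        if rule.matches(user, text):
            return rule.primary, rule.secondary, rule.treatment
    # Assumption: unmatched posts are kept so they can be reviewed later.
    return "Unclassified", "", "store"
```

Ordering the "delete" rules first means a blacklisted account is discarded even when its post also looks like a comment; whether the embodiment resolves overlaps this way is not stated.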

Step S102: Filter the collected original information.

In this embodiment, filtering the collected original information comprises: deduplicating the collected original information and removing invalid information.

For example, deduplication is performed as follows:

For the Exp platform and the support platform, deduplication can be based on text content and user name. For Tencent Weibo and Sina Weibo, a threshold can be set: when the number of posts with identical or similar text exceeds the threshold, those posts are regarded as advertisements or pure-sharing microblogs and are deleted.
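The two deduplication strategies can be sketched as below. The notion of "similar" text is an assumption here (case-folded, whitespace-collapsed equality); the source does not define a similarity measure, and a real system might use something fuzzier.

```python
from collections import Counter


def dedup_by_text_and_user(posts: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Exp/support platforms: keep the first post for each (user, text) pair."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for user, text in posts:
        key = (user, text.strip())
        if key not in seen:
            seen.add(key)
            kept.append((user, text))
    return kept


def normalize(text: str) -> str:
    """Hypothetical similarity key: lower-case text with whitespace collapsed."""
    return " ".join(text.lower().split())


def drop_mass_reposts(posts: list[tuple[str, str]], threshold: int) -> list[tuple[str, str]]:
    """Weibo: drop every post whose normalized text occurs more than `threshold` times."""
    counts = Counter(normalize(text) for _, text in posts)
    return [(user, text) for user, text in posts if counts[normalize(text)] <= threshold]
```

Note that the Weibo rule removes all copies of a mass-reposted text, matching the description that such posts are treated as advertising or pure sharing, whereas the first rule keeps one copy per user.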

Removing invalid information includes removing official releases, activity advertisements, posts by the internal water army (paid posters), irrelevant statements, and other invalid information listed in Table 1.

Step S103: Analyze the filtered information to obtain relevance information related to the product.

It should be noted that the relevance information specifically includes hot-topic attention words and/or word-of-mouth words. A hot-topic attention word reflects an aspect of the product that users pay close attention to; a word-of-mouth word reflects the tendency of users' comments on the product.

Preferably, this step specifically comprises: analyzing the filtered information to obtain hot-topic attention words and/or word-of-mouth words related to the product.

In this embodiment, the analysis mainly targets the information retained after filtering, such as opinion comments, media posts, and sharing posts. Word-of-mouth words are subsequently extracted mainly from the opinion comments.

In this embodiment, obtaining the hot-topic attention words and word-of-mouth words related to the product specifically comprises:

Performing word segmentation on the filtered information according to the common nouns of the product and/or of same-type and similar competing products, to obtain a processing result.

In this embodiment, the filtered information is segmented with a Chinese lexical analysis system according to the common nouns of the product and/or of same-type and similar products. For example, the Chinese word segmentation interface provided by ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) can be used to call the ICTCLAS segmentation algorithm on the filtered information to obtain the processing result.
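The embodiment relies on ICTCLAS for segmentation. As a self-contained stand-in (a different and much simpler technique, not ICTCLAS), the sketch below uses forward maximum matching with a user dictionary. It shows why supplying the product's common nouns matters: words in the dictionary are kept as single tokens. The name `产品X` ("Product X") is a hypothetical placeholder.

```python
def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing in the dictionary matches."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With the dictionary `{"产品X", "好用", "卡顿"}`, the text `产品X很好用不卡顿` ("Product X is easy to use and doesn't lag") segments into `产品X / 很 / 好用 / 不 / 卡顿`, keeping the product name whole.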

Further, words in the processing result that reach a set frequency of occurrence (for example, 7 times) are selected, and the selected words are filtered through a pre-stored lexicon to obtain hot-topic attention words and/or word-of-mouth words related to the product.

Specifically, the processing result is corrected using pre-stored word segments to obtain a corrected result, and the corrected result is filtered through a pre-stored word-of-mouth lexicon and/or an invalid-word lexicon to obtain hot-topic attention words and/or word-of-mouth words related to the network product.
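The correction, frequency selection, and lexicon filtering can be sketched as follows. Two points are assumptions rather than details from the source: "correction" is modelled as merging adjacent tokens whose concatenation is a pre-stored segment, and "reaching the set frequency" is read as occurring at least `min_freq` times.

```python
from collections import Counter


def correct(tokens: list[str], segments: set[str]) -> list[str]:
    """Merge adjacent tokens whose concatenation is a pre-stored segment (hypothetical rule)."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in segments:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def relevance_words(token_lists: list[list[str]], segments: set[str],
                    wom_lexicon: set[str], invalid_words: set[str],
                    min_freq: int = 7) -> tuple[dict[str, int], dict[str, int]]:
    """Return (hot-topic words, word-of-mouth words) with their counts."""
    counts: Counter[str] = Counter()
    for tokens in token_lists:
        counts.update(correct(tokens, segments))
    frequent = {w: c for w, c in counts.items() if c >= min_freq}
    wom = {w: c for w, c in frequent.items() if w in wom_lexicon}
    hot = {w: c for w, c in frequent.items()
           if w not in wom_lexicon and w not in invalid_words}
    return hot, wom
```

Here a word that survives the frequency cut is treated as word-of-mouth if the lexicon lists it and as a hot-topic candidate otherwise, unless the invalid-word lexicon excludes it.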

In this embodiment, obtaining hot-topic attention words includes: removing words in the noun column whose frequency is below a preset value (such as one percent of the highest frequency among the valid words); and removing single-character words, such as those meaning "person" or "network".

Obtaining word-of-mouth words includes: removing words in the adjective column whose frequency is below the preset value (such as one percent of the highest frequency among the valid words); searching the verb column for common word-of-mouth words, such as 坑 ("pit", i.e. a letdown) and 给力 ("awesome"); and comparing the word-of-mouth words found against the pre-stored word-of-mouth lexicon (maintained in Excel) to obtain word-of-mouth words related to the network product.
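The part-of-speech rules above can be sketched as below, assuming the segmenter returns word counts keyed by `(word, tag)` with hypothetical tags `"n"`, `"a"` and `"v"` for nouns, adjectives and verbs. The one-percent cutoff follows the example in the text; the lexicons are placeholders.

```python
def split_by_pos(tagged_counts: dict[tuple[str, str], int], wom_lexicon: set[str],
                 common_wom_verbs: set[str], ratio: float = 0.01) -> tuple[list[str], list[str]]:
    """Return (hot-topic words, word-of-mouth words) from POS-tagged word counts."""
    if not tagged_counts:
        return [], []
    cutoff = ratio * max(tagged_counts.values())  # e.g. 1% of the highest frequency
    # Hot-topic words: sufficiently frequent nouns, excluding single characters.
    hot = sorted(w for (w, pos), c in tagged_counts.items()
                 if pos == "n" and c >= cutoff and len(w) > 1)
    # Word-of-mouth candidates: frequent adjectives plus known word-of-mouth verbs.
    candidates = {w for (w, pos), c in tagged_counts.items()
                  if (pos == "a" and c >= cutoff) or (pos == "v" and w in common_wom_verbs)}
    # Keep only candidates confirmed by the pre-stored word-of-mouth lexicon.
    wom = sorted(candidates & wom_lexicon)
    return hot, wom
```

For instance, with a highest count of 500 the cutoff is 5, so a noun seen 3 times and the single-character noun 人 ("person") are both dropped from the hot-topic list.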

Step S104: Classify the obtained relevance information and perform statistics and analysis on it to obtain user feedback information related to the product.

Preferably, this step specifically comprises: classifying the obtained hot-topic attention words and/or word-of-mouth words, and performing statistics and analysis on the classified words to obtain user feedback information related to the product.

Specifically, the obtained hot-topic attention words form one category; positive word-of-mouth words (for example, "good", 给力 ("awesome"), "GOOD") form a second category; and negative word-of-mouth words (for example, "poor", 坑 ("letdown")) form a third.

Statistics and analysis are then performed on the classified hot-topic attention words and positive and negative word-of-mouth words (including counting their quantities and analyzing changes in those quantities, such as a sudden increase in word-of-mouth words) to obtain the user feedback information, which includes a quantitative analysis report and/or a qualitative analysis report. The quantitative analysis report includes information such as the hot-topic attention words and the positive and negative word-of-mouth words, the changes in their quantities, and the reasons for those changes. The qualitative analysis report includes information such as users' points of attention on the product and their word-of-mouth evaluation.
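The classification and counting, together with one way of flagging a "sudden increase", can be sketched as follows. The doubling `factor` and the `min_count` floor are hypothetical parameters; the source gives no numeric rule for what counts as a sudden increase.

```python
from collections import Counter


def tally(hot_words: list[str], wom_words: list[str],
          positive: set[str], negative: set[str]) -> dict[str, Counter]:
    """Count hot-topic words and split word-of-mouth words into positive/negative categories."""
    stats = {"hot": Counter(hot_words), "positive": Counter(), "negative": Counter()}
    for w in wom_words:
        if w in positive:
            stats["positive"][w] += 1
        elif w in negative:
            stats["negative"][w] += 1
        # Words in neither lexicon are left uncounted in this sketch.
    return stats


def sudden_increases(prev: dict[str, int], curr: dict[str, int],
                     factor: float = 2.0, min_count: int = 10) -> list[str]:
    """Flag words whose count is at least `min_count` and at least `factor` times
    its count in the previous period (hypothetical rule)."""
    return sorted(w for w, c in curr.items()
                  if c >= min_count and c >= factor * prev.get(w, 0))
```

Comparing one period's `tally` output with the previous period's gives the "changes in quantity" that the quantitative report describes; explaining the reasons for a change is left to the analyst.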

Based on the quantitative and/or qualitative analysis report of the product, the product operator can fully understand users' feedback on using the product, which facilitates improving the product and increasing user satisfaction.

As another preferred embodiment of the present invention, to monitor the current state of the product and its competing products, keep track of industry dynamics in time, and provide an important basis for product development and decision-making, the method further includes:

Collecting, from the public platform (such as Weibo and/or forums), information on same-type and similar products related to the product that users have commented on.

In practical applications, information on same-type and similar products related to the product can be pre-stored, including their names, series aliases, and the names of some key functional blocks. While the original information on the product commented on by users is being collected from Weibo and/or forums, information on the related same-type and similar products commented on by users is also collected from Weibo and/or forums according to the stored information.
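Routing collected posts to the product and its competitors by their pre-stored identifiers can be sketched as below. The `ProductProfile` structure and the names `ProductX`/`ProductY` are hypothetical; plain substring matching is an assumed simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ProductProfile:
    """Pre-stored identifiers for one product: name, series aliases, key function-block names."""
    name: str
    aliases: list[str] = field(default_factory=list)
    feature_names: list[str] = field(default_factory=list)

    def keywords(self) -> list[str]:
        return [self.name, *self.aliases, *self.feature_names]


def route_posts(posts: list[str], own: ProductProfile,
                competitors: list[ProductProfile]) -> dict[str, list[str]]:
    """File each post under every profile (own product or competitor) whose keywords it mentions."""
    profiles = [own, *competitors]
    buckets: dict[str, list[str]] = {p.name: [] for p in profiles}
    for text in posts:
        for p in profiles:
            if any(k in text for k in p.keywords()):
                buckets[p.name].append(text)
    return buckets
```

A post that compares two products lands in both buckets, so each product's later statistics include head-to-head comments.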

In this embodiment of the present invention, original information related to the product that users have commented on is collected from microblogs and/or forums, and is filtered and analyzed to obtain users' comment tendencies toward the product (word-of-mouth words) and their points of attention on the product (hot-topic attention words). These words are classified and counted to obtain a quantitative and/or qualitative analysis report of the product, from which the product operator can fully understand users' feedback on using the product, facilitating product improvement and increasing user satisfaction. Moreover, because the original information is collected directly from Weibo and/or forums, it is provided voluntarily by users (for example, by posting on Weibo or in forums), so there is no need to invite users to take part in research, which effectively reduces cost. At the same time, automating the processing after collection effectively improves efficiency and accuracy. In addition, covering multiple information sources at once (such as Tencent Weibo, Sina Weibo, and the support platform) effectively avoids the bias caused by platform differences, the low accuracy caused by the lack of quantitative data, and the high cost of questionnaires.

Embodiment 2:

FIG. 2 shows the specific process of the method for obtaining product information provided by Embodiment 2 of the present invention. This embodiment mainly comprises four parts: information collection, information filtering, information analysis, and acquisition of quantitative and qualitative text.

As shown in FIG. 2, in the information collection part, original information related to the product that users have commented on is collected through APIs and/or web crawlers from information sources such as Weibo and forums (including the support platform, the EXP platform, and the like), and the collected original information is stored in the database.

In the information filtering part, impurity text (that is, text completely unrelated to the product) is removed first, and then deduplication and removal of invalid information are performed separately for each platform. Deduplication includes deduplication by content text alone and by content text together with user name. Removing invalid information includes removing irrelevant text, official releases, posts by the water army, and advertisements.

In the information analysis part, the filtered information is mainly classified into a media news class, an activity sharing class, and a suggestion/comment class. According to the common nouns of the product and/or its competing products, the word segmentation interface provided by ICTCLAS is used to call the ICTCLAS segmentation algorithm on the filtered information to obtain a processing result; the processing result is corrected with the pre-stored word segments to obtain a corrected result; and the corrected result is filtered through the pre-stored word-of-mouth lexicon and the invalid-word lexicon to obtain hot-topic attention words and word-of-mouth words related to the product. The information analysis part also includes screening the suggestion/comment class against a pre-stored suggestion vocabulary to pick out microblogs that contain suggestions, yielding suggestion text.
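The suggestion-screening step can be sketched in a few lines. The class label `"suggestion_comment"` and the vocabulary entries are hypothetical placeholders; matching is assumed to be case-insensitive substring search, with vocabulary entries given in lower case.

```python
def suggestion_texts(classified_posts: list[tuple[str, str]],
                     suggestion_vocab: set[str]) -> list[str]:
    """Keep posts in the suggestion/comment class that contain a pre-stored suggestion word."""
    return [text for label, text in classified_posts
            if label == "suggestion_comment"
            and any(word in text.lower() for word in suggestion_vocab)]
```

Restricting the search to the suggestion/comment class keeps media posts that happen to use a suggestion word (for example, "experts suggest") out of the result.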

In the quantitative and qualitative text acquisition part, the obtained hot-topic attention words and word-of-mouth words are processed by induction, deduction, analysis, and statistics to obtain quantitative and qualitative analysis reports of the product.

Embodiment 3:

FIG. 3 shows the structure of the device for acquiring product information provided in Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment are shown.

The device for acquiring product information may be a software unit, a hardware unit or a combination of hardware and software running in each application system.

The device for acquiring product information includes an information collecting module 31, an information filtering module 32, an information analyzing module 33, and a result obtaining module 34. The specific functions of each module are as follows:

The information collecting module 31 is configured to collect, from a public platform, original information related to the product that users have commented on; the public platform includes microblogs and/or forums;

The information filtering module 32 is configured to filter the original information collected by the information collecting module 31;

The information analyzing module 33 is configured to analyze the information filtered by the information filtering module 32 to obtain relevance information related to the product, the relevance information including hot-topic attention words and/or word-of-mouth words;

The result obtaining module 34 is configured to classify the obtained relevance information, perform statistics and analysis, and obtain user feedback information related to the product.

Further, the device includes:

The information storage module 35 is configured to classify the collected original information according to its content characteristics and then store the collected original information.

The information analysis module 33 includes:

The processing module 331 is configured to perform word segmentation on the filtered information according to the common nouns of the product and/or of same-type and similar competing products, to obtain a processing result;

The obtaining module 332 is configured to select, from the processing result of the processing module 331, words that reach a set frequency of occurrence, and to filter the selected words through the pre-stored lexicon to obtain the relevance information.

Preferably, to monitor the current state of competing products related to the product, keep track of industry dynamics in time, and provide an important basis for product development and decision-making, the information collecting module 31 is further configured to collect, from the public platform, information on same-type and similar products related to the product that users have commented on.

In this embodiment, the information filtering module 32 is further configured to filter the collected original information through processes including, but not limited to, deduplication and removal of invalid information.
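How the modules of FIG. 3 fit together can be sketched as a pipeline of pluggable stages. The `ProductInfoDevice` class and its field names are hypothetical; each stage could be backed by functions like those sketched for the individual steps, and storage (module 35) is placed before filtering, as claim 8 describes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProductInfoDevice:
    """Wires the modules of FIG. 3 together; each stage is a pluggable callable (hypothetical)."""
    collect: Callable[[], list[Any]]            # information collecting module 31
    store: Callable[[list[Any]], list[Any]]     # information storage module 35
    filter_: Callable[[list[Any]], list[Any]]   # information filtering module 32
    analyze: Callable[[list[Any]], list[str]]   # information analyzing module 33
    summarize: Callable[[list[str]], dict]      # result obtaining module 34

    def run(self) -> dict:
        raw = self.store(self.collect())  # classify by content and store before filtering
        return self.summarize(self.analyze(self.filter_(raw)))
```

Keeping each module a separate callable mirrors the statement that the device may be software, hardware, or a combination: any stage can be swapped without touching the others.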

The device for obtaining product information provided in this embodiment can be used with the corresponding method for acquiring product information described above. For details, refer to the descriptions of Embodiments 1 and 2 of the method; they are not repeated here.

In summary, the embodiments of the present invention collect original information related to a product that users have commented on from a public platform such as microblogs and/or forums, and filter and analyze it to obtain relevance information related to the product, such as users' comment tendencies toward the product (word-of-mouth words) and their points of attention on the product (hot-topic attention words). The obtained hot-topic attention words and/or word-of-mouth words are classified, and statistics and analysis are performed on the classified words to obtain a quantitative and/or qualitative analysis report of the product, from which the product operator can fully understand users' feedback on using the product, facilitating product improvement and increasing user satisfaction. Moreover, because the original information is collected directly from Weibo and/or forums, it is provided voluntarily by users (for example, by posting on Weibo or leaving forum messages), and users need not be invited to take part in research, which effectively reduces cost. At the same time, automating the processing after collection effectively improves efficiency and accuracy. In addition, covering multiple information sources at once (such as Tencent Weibo, Sina Weibo, and the support platform) effectively avoids the bias caused by platform differences, the low accuracy caused by the lack of quantitative data, and the high cost of questionnaires. Furthermore, to monitor the state of same-type and similar products, keep track of industry dynamics in time, and provide an important basis for product development and decision-making, information on competing products related to the product is collected from Weibo and/or forums at the same time as user comments on the network product, which is highly practical.

If the integrated modules described in the embodiments of the present invention are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present invention further provides a computer storage medium storing a computer program for executing the method for obtaining product information in the embodiments of the present invention.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims


1. A method of obtaining product information, characterized in that the method includes: collecting original product-related information reviewed by users from a public platform;

Filter the collected original information;

Analyze the filtered information to obtain correlation information related to the product; classify and perform statistics and analysis on the obtained correlation information to obtain user feedback information related to the product.

2. The method of claim 1, wherein before filtering the collected original information, the method further includes:

The collected original information is classified according to its content characteristics and then stored.

3. The method according to claim 1 or 2, wherein filtering the collected original information includes: deduplicating and removing invalid information on the collected original information.

4. The method of claim 1, wherein analyzing the filtered information and obtaining relevance information related to the product includes:

Performing word segmentation on the filtered information according to the common nouns of the product and/or of same-type and similar products, to obtain a processing result.

5. The method of claim 4, wherein, after the processing result is obtained, obtaining the relevance information related to the product further includes: selecting words in the processing result that reach a set frequency of occurrence, and filtering the selected words through a pre-stored lexicon to obtain the relevance information.

6. The method of claim 1, further comprising: collecting, from the public platform, information on same-type and similar products related to the product that users have commented on.

7. A device for obtaining product information, characterized in that the device includes: Information collection module, used to collect original product-related information from user reviews from public platforms;

An information filtering module, used to filter the original information collected by the information collection module;

An information analysis module, used to analyze the information filtered by the information filtering module and obtain relevance information related to the product;

The result acquisition module is used to classify the acquired correlation information and perform statistics and analysis, and obtain user feedback information related to the product.

8. The device according to claim 7, characterized in that the device further includes:

The information storage module is used to classify and store the collected original information according to its content characteristics before filtering the collected original information.

9. The device according to claim 7 or 8, characterized in that the information filtering module is further used to deduplicate and remove invalid information on the collected original information.

10. The device of claim 7, wherein the information analysis module includes: a processing module, configured to perform word segmentation on the filtered information according to the common nouns of the product and/or of same-type and similar products, to obtain a processing result.

11. The device according to claim 10, wherein the information analysis module further includes:

The acquisition module is configured to select words with a set frequency of occurrence from the processing results of the processing module, filter the selection results through a pre-stored vocabulary library, and obtain the correlation information.

12. The device according to claim 7, characterized in that the information collection module is further configured to collect, from the public platform, information on same-type and similar products related to the product that users have commented on.

13. A computer storage medium, characterized in that computer-executable instructions are stored therein, the computer-executable instructions being used to execute the method for obtaining product information according to any one of claims 1 to 6.