CN109145115B - Product public opinion discovery method, device, computer equipment and storage medium - Google Patents

Product public opinion discovery method, device, computer equipment and storage medium

Info

Publication number
CN109145115B
Authority
CN
China
Prior art keywords
public opinion
public
data
time period
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811005075.3A
Other languages
Chinese (zh)
Other versions
CN109145115A (en)
Inventor
雷航
洪楷
刘伟
张学亮
王月瑶
陈乃华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Chengdu Co Ltd
Original Assignee
Tencent Technology Chengdu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Chengdu Co Ltd
Priority to CN201811005075.3A
Publication of CN109145115A
Application granted
Publication of CN109145115B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a product public opinion discovery method, a device, computer equipment and a storage medium. The method comprises: extracting text data of each information record of a preset information source; converting the text data into data vectors; classifying the information records according to the data vectors to obtain the public opinion categories to which the information records belong; and, when the information records in a public opinion category within a preset time period meet a quantity condition, determining the discovery result corresponding to that public opinion category. Because the information records are classified by the data vector of the whole text data rather than by keywords of the text data, loss of semantic information is avoided and classification accuracy is improved, which in turn improves the accuracy of product public opinion discovery.

Description

Product public opinion discovery method, device, computer equipment and storage medium
Technical Field
The application relates to the technical field of data mining, in particular to a product public opinion discovery method, a product public opinion discovery device, computer equipment and a storage medium.
Background
With the continuous development of the internet, daily life is increasingly shaped by it; reading news, shopping and communicating online are more and more common. When a product in use fails, its users tend to report and discuss the failure online right away. Monitoring public opinion about a specific product is therefore increasingly important: through product public opinion monitoring, the product provider can discover an emergency as early as possible, take reasonable action, and keep the public opinion from spreading further. For example, when the product is a game and a player encounters a system failure, the player may log in to the corresponding forum or official website and post related comments.
The traditional product public opinion discovery method classifies public opinion based on keywords, but classification by keywords alone has low accuracy. The traditional method therefore suffers from low accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide a product public opinion discovery method, apparatus, computer device and storage medium capable of improving accuracy.
A product public opinion discovery method, the method comprising:
extracting text data of each information record of a preset information source;
converting the text data into a data vector;
classifying the information records according to the data vectors to obtain public sentiment categories to which the information records belong;
and when the information records in the public opinion categories meet the quantity condition within a preset time period, determining the discovery results corresponding to the public opinion categories.
A product public opinion discovery apparatus, the apparatus comprising:
the text extraction module is used for extracting text data of each information record of a preset information source;
the vector conversion module is used for converting the text data into a data vector;
the public opinion classification module is used for classifying the information records according to the data vector to obtain public opinion categories to which the information records belong;
and the public opinion discovery module is used for determining a discovery result corresponding to the public opinion category when the information records in the public opinion category meet the quantity condition within a preset time period.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
extracting text data of each information record of a preset information source;
converting the text data into a data vector;
classifying the information records according to the data vectors to obtain public sentiment categories to which the information records belong;
and when the information records in the public opinion categories meet the quantity condition within a preset time period, determining the discovery results corresponding to the public opinion categories.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
extracting text data of each information record of a preset information source;
converting the text data into a data vector;
classifying the information records according to the data vectors to obtain public sentiment categories to which the information records belong;
and when the information records in the public opinion categories meet the quantity condition within a preset time period, determining the discovery results corresponding to the public opinion categories.
According to the above product public opinion discovery method, device, computer equipment and storage medium, text data of each information record of a preset information source is extracted; the text data is converted into data vectors; the information records are classified according to the data vectors to obtain the public opinion categories to which the information records belong; and, when the information records in a public opinion category within a preset time period meet a quantity condition, the discovery result corresponding to that public opinion category is determined. Because the information records are classified by the data vector of the whole text data rather than by keywords of the text data, loss of semantic information is avoided and classification accuracy is improved, which in turn improves the accuracy of product public opinion discovery.
Drawings
FIG. 1 is a diagram of an application environment of a product public opinion discovery method in an embodiment;
FIG. 2 is a flow chart illustrating a method for product public opinion discovery in one embodiment;
FIG. 3 is a flowchart illustrating a product public opinion discovery method according to an embodiment;
FIG. 4 is an exemplary diagram of an alert notification for a product public opinion discovery method in one embodiment;
FIG. 5 is a trend chart of public sentiment quantity for the product public sentiment discovery method in an embodiment;
FIG. 6 is a diagram of a public opinion description of a product public opinion discovery method in an embodiment;
FIG. 7 is a diagram of another example of an alert notification for a product public opinion discovery method in an embodiment;
FIG. 8 is a detailed page of the public opinion description of the product public opinion discovery method in an embodiment;
FIG. 9 is a diagram illustrating a process of training a word embedding model, a process of training a neural network classification model, and a process of classifying product opinions according to an embodiment of the method for finding product opinions;
FIG. 10 is a block diagram illustrating a product public opinion discovery apparatus according to an embodiment;
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The product public opinion discovery method can be used for product public opinion monitoring, such as monitoring login, recharge, lag and system failure public opinion for a game product, and can support decision making by the product provider. The method can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The method can run on the terminal 102: the server 104 corresponding to the preset information source sends information records to the terminal 102 through the network; the terminal 102 receives each information record of the preset information source and extracts its text data; converts the text data into a data vector; classifies the information records according to the data vectors to obtain the public opinion categories to which they belong; and, when the information records in a public opinion category within a preset time period meet a quantity condition, determines the discovery result corresponding to that category. The terminal 102 may be, but is not limited to, a server, personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a method for product public opinion discovery is provided, which may be executed in the terminal 102 in fig. 1. The product public opinion discovery method comprises the following steps:
S202, extracting text data of each information record of the preset information source.
The preset information source can be an official forum, a WeChat circle, Baidu Tieba (post bar) or another information source corresponding to the product. The information record may take the form of an article, a post, a comment, a reply to a comment, and the like. The text data may comprise the text content that the information record presents in text form; it may also include text content converted from data presented in other forms (e.g., emoticons, pictures, etc.).
S204, converting the text data into a data vector.
The data vector is a vector with semantics. The data vector expresses the same semantics as the text data.
S206, classifying the information records according to the data vectors to obtain the public opinion categories to which the information records belong.
After the text data is converted into a data vector carrying semantics, the data vector can be classified according to its semantics, which classifies the corresponding information record and yields the public opinion category to which the record belongs. The public opinion categories can include product price public opinion, product quality public opinion, etc.; for a game product, for example, they can include login public opinion, recharge public opinion, lag public opinion, system failure public opinion, etc.
After the public opinion category of the information record is obtained, the public opinion category of the information record can be stored in a database.
S208, when the information records in a public opinion category within a preset time period meet the quantity condition, determining the discovery result corresponding to that public opinion category.
The preset time period may be the 10 minutes, 20 minutes, 30 minutes, 1 hour, 2 hours, 1 day, 1 week, etc. before the current time point. The current time point may be the current time acquired in real time, with a precision of 1 minute, 1 second or 1 hour. The information records in a public opinion category within the preset time period may be the records newly added in that time period that belong to the category, or may be the records belonging to the category in that time period.
The quantity condition may be a condition of the quantity of information records in the public opinion category within a preset time period; the quantity condition may also be a condition of a newly added quantity of information records in a public opinion category within a preset time period. The discovery result may be the public sentiment corresponding to the public sentiment category when the information records in the public sentiment category in the preset time period satisfy the quantity condition.
Based on the product public opinion discovery method of the embodiment, text data of each information record of a preset information source is extracted; converting text data into data vectors; classifying the information records according to the data vectors to obtain public sentiment categories to which the information records belong; and when the information records in the public sentiment categories in the preset time period meet the quantity condition, determining the discovery result corresponding to the public sentiment categories. Because the information records are not classified through the keywords of the text data, but are classified through the data vectors of the whole text data, the loss of semantic information can be avoided, and the classification accuracy is improved, so that the product public opinion discovery accuracy is improved.
In one embodiment, converting the text data into a data vector includes: preprocessing the text data to obtain a text to be segmented; performing word segmentation on the text to be segmented to obtain the semantic words of the text data; and converting each semantic word into a word vector to obtain the data vector of the text data.
Preprocessing may include deleting punctuation, deleting web addresses, and deleting text portions that carry no actual semantics, such as numbers. After preprocessing, the text to be segmented consists of the text portions with actual semantics. Word segmentation is then performed on this text to obtain its semantic words, that is, the semantic words of the text data. It is understood that the text data of one information record yields at least one semantic word. Each semantic word is converted into a word vector carrying semantics to obtain the data vector of the text data, so that the text data, and hence the information record, can conveniently be classified in data vector form.
Based on the product public opinion discovery method of this embodiment, the text data is preprocessed to obtain a text to be segmented; word segmentation is performed on the text to be segmented to obtain the semantic words of the text data; and each semantic word is converted into a word vector to obtain the data vector of the text data. The information records are thus classified by the data vector of the whole text data rather than by keywords of the text data, which avoids loss of semantic information, improves classification accuracy and thereby improves the accuracy of product public opinion discovery.
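The preprocessing and segmentation steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the regular expressions, the function names `preprocess` and `segment`, and the whitespace-based segmenter are all assumptions (Chinese text would need a dedicated word segmenter in place of `segment`).

```python
import re
from typing import List

# Assumed patterns; the source only names the kinds of content to delete.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
# Anything that is neither a word character nor whitespace counts as
# punctuation here; the underscore is treated as punctuation too.
PUNCT_RE = re.compile(r"[^\w\s]|_")

def preprocess(text: str) -> str:
    """Delete web addresses, then numbers, then punctuation."""
    text = URL_RE.sub(" ", text)    # URLs first, while their punctuation is intact
    text = DIGIT_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return " ".join(text.split())   # collapse the whitespace left behind

def segment(text: str) -> List[str]:
    """Stand-in word segmenter (assumption): lower-case and split on whitespace."""
    return text.lower().split()

words = segment(preprocess("Login failed 3 times!!! see http://example.com/help"))
print(words)
```

Web addresses are removed before punctuation so that the URL pattern still sees the `://` and dots it relies on.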
In one embodiment, converting each semantic word into a word vector to obtain a data vector of text data includes: and converting each semantic word into a word vector based on the word embedding model to obtain a data vector of the text data.
The word embedding model is used to convert semantic words in text form into word vectors in vector form. The word embedding model may be a fastText model (a fast text classification model), a doc2vec model (a document vector model), a GloVe model (Global Vectors for Word Representation), etc.
Converting each semantic word into a word vector by means of a word embedding model yields a more accurate data vector of the text data, which further improves the accuracy of product public opinion discovery.
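As a rough illustration of turning semantic words into a data vector, the sketch below looks each word up in a small embedding table and averages the resulting word vectors. The table `EMBEDDINGS`, its 4 dimensions and the averaging step are assumptions made for the example; the source does not state how the word vectors are combined, and a real table would come from a trained word embedding model.

```python
from typing import Dict, List

# Hypothetical 4-dimensional embedding table, for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "login": [1.0, 0.0, 0.0, 0.0],
    "failed": [0.0, 1.0, 0.0, 0.0],
    "lag": [0.0, 0.0, 1.0, 0.0],
}
DIM = 4

def to_data_vector(words: List[str]) -> List[float]:
    """Average the vectors of the known words (assumed combination rule)."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known word: fall back to a zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]

vector = to_data_vector(["login", "failed", "again"])
print(vector)
```

Words missing from the table ("again" here) are simply skipped, which is one of several possible ways to handle out-of-vocabulary words.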
In one embodiment, the pre-processing includes at least one of deleting punctuation, deleting web addresses, and deleting numbers. Therefore, unnecessary text content without actual semantics can be deleted, the accuracy of word segmentation processing is improved, and resources are saved, so that the accuracy of product public opinion discovery is improved, and system resources are saved.
Further, preprocessing may also include deleting stop words. This saves further resources at the cost of only a small reduction in the accuracy of product public opinion discovery. It will be appreciated that in embodiments where the preprocessing does not include deleting stop words, the accuracy of product public opinion discovery is higher.
In one embodiment, the data vector may be a vector having no less than a predetermined number of dimensions, such as 300. Therefore, the data vector comprises more semantic information, the classification result is more accurate, and the product public opinion discovery accuracy is further improved.
In one embodiment, classifying the information records according to the data vector to obtain a public opinion category to which the information records belong includes:
and classifying the information records according to the data vector based on a neural network classification model to obtain the public sentiment category to which the information records belong.
And classifying the input data vectors through a neural network classification model, namely classifying the information records corresponding to the data vectors to obtain the public opinion categories to which the information records belong. Because the input data vectors are classified through the neural network classification model, the obtained classification result is more accurate, and therefore, the accuracy of product public opinion discovery can be further improved.
It will be appreciated that in other embodiments, rather than using neural network classification models, other classifiers may be used to classify information records according to data vectors. Therefore, the information records are classified not by the keywords of the text data but by the data vectors of the whole text data, so that the loss of semantic information can be avoided, the classification accuracy is improved, and the product public opinion discovery accuracy is improved.
Further, based on the neural network classification model, classifying the information records according to the data vector to obtain public opinion categories to which the information records belong, including: inputting the data vector into the neural network classification model through an input layer of the neural network classification model; weighting the data vectors through a hidden layer of a neural network classification model to obtain a classification result; and outputting a classification result through an output layer of the neural network classification model, wherein the classification result corresponds to the public opinion category.
The neural network classification model comprises an input layer, an output layer and a hidden layer between the input layer and the output layer. Inputting the data vector into a neural network classification model through the input layer; weighting the data vectors through the hidden layer to obtain a classification result; and outputting a classification result through the output layer, wherein the classification result can be a serial number corresponding to the public sentiment category, so that the public sentiment category to which the information record belongs is obtained. Because the input data vectors are classified through the neural network classification model, the obtained classification result is more accurate, and therefore, the accuracy of product public opinion discovery can be further improved.
In one embodiment, the weight of the hidden layer in the neural network classification model is determined by a target classification result and a training classification result obtained by training a training sample; the training sample comprises a target classification result and a training data vector, the data structure of the training data vector is the same as that of the data vector, and the training data vector corresponds to the target classification result.
The target classification result is a classification result expected after the training data vector in the training sample is input into the neural network classification model. The training classification result is a classification result actually obtained after the training data vector is input into the neural network classification model in training. The training data vectors correspond to the target classification results, one training data vector corresponding to one target classification result in one training sample. Meanwhile, the training data vectors are also input into the neural network classification model, so that the data structures of the training data vectors and the data vectors are the same.
Further, determining the weight of the hidden layer in the neural network classification model according to the target classification result and a training classification result obtained by training the training sample may include: and when the loss function value obtained according to the target classification result and the training classification result reaches a preset optimization condition, determining the weight of a hidden layer in the neural network classification model. The preset optimization condition may be that the loss function value reaches a preset value, or that the change of the loss function value is smaller than a preset value within a preset time period. Thus, an optimal neural network classification model is obtained, and the neural network classification model is determined.
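The preset optimization condition can be sketched as a small check over the history of loss values. The function name `optimization_reached`, its parameters, and the use of a number of training steps in place of a wall-clock time period are assumptions for illustration.

```python
from typing import List

def optimization_reached(losses: List[float], target_loss: float,
                         min_delta: float, window: int) -> bool:
    """Return True when training may stop.

    Stops when the latest loss has reached `target_loss`, or when the loss
    has varied by less than `min_delta` over the last `window` steps.
    """
    if not losses:
        return False
    if losses[-1] <= target_loss:
        return True
    if len(losses) > window:
        recent = losses[-(window + 1):]  # the last `window` changes
        return max(recent) - min(recent) < min_delta
    return False

print(optimization_reached([1.0, 0.5, 0.04], target_loss=0.05, min_delta=1e-3, window=3))
```

Once the check returns True, the hidden-layer weights at that point would be kept as the weights of the model.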
In one embodiment, the neural network classification model is a multi-layer neural network classification model. The multi-layer neural network classification model may be a classification model whose network structure is composed of a plurality of perceptrons. For example, the multi-layer neural network classification model is a three-layer neural network classification model. As another example, the number of nodes in each layer may be 300, 100, and 7, respectively. Therefore, the classification accuracy can be further improved, and the accuracy of product public opinion classification is further improved.
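A forward pass through such a three-layer classifier might look like the sketch below, reading the 300, 100 and 7 node counts as input, hidden and output layer sizes (an interpretation, not stated outright in the text). The ReLU activation, the softmax output and the random placeholder weights are assumptions; a real model would use weights determined by training, so the category printed here is meaningless.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)

def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random placeholder weights; a trained model would load learned values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x: List[float], layer: Layer) -> List[float]:
    """Weighted sum of the inputs plus a bias, for every node of the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x: List[float], hidden: Layer, output: Layer) -> Tuple[int, List[float]]:
    """Input layer -> hidden layer (ReLU, assumed) -> output layer (softmax, assumed)."""
    h = [max(0.0, v) for v in dense(x, hidden)]
    probs = softmax(dense(h, output))
    category = max(range(len(probs)), key=probs.__getitem__)
    return category, probs

rng = random.Random(0)
hidden = init_layer(300, 100, rng)
output = init_layer(100, 7, rng)
data_vector = [rng.uniform(-1.0, 1.0) for _ in range(300)]  # stand-in data vector
category, probs = classify(data_vector, hidden, output)
print(category)
```

The returned index plays the role of the serial number corresponding to a public opinion category.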
In one embodiment, the public opinion categories include login public opinion, recharge public opinion, lag public opinion and system failure public opinion. Login public opinion refers to information records about account login; recharge public opinion refers to information records about account recharging; lag public opinion refers to information records about how smoothly the system runs (stuttering) while the product is in use; system failure public opinion refers to information records about system failures while the product is in use. The product public opinion discovery method is therefore particularly suitable for public opinion monitoring of network products, such as game products. It can surface service problems for the product project team so that the team can handle them early, which reduces the impact of failures on users and improves user stickiness.
In one embodiment, determining the discovery result corresponding to a public opinion category when the information records in the public opinion category within the preset time period satisfy the quantity condition comprises: determining the discovery result corresponding to the public opinion category when the number of information records in the public opinion category within the preset time period is greater than or equal to a preset threshold.
In this embodiment, if the information records in the public opinion category within the preset time period are taken to be the records belonging to the category in that time period, the quantity condition is that the number of newly added records in the category within the preset time period is greater than or equal to the preset threshold. If they are taken to be the records newly added in that time period that belong to the category, the quantity condition is that the number of those records is greater than or equal to the preset threshold. The discovery result may be the public opinion corresponding to the public opinion category for which the number of newly added information records within the preset time period is greater than or equal to the preset threshold.
Based on the product public opinion discovery method of the embodiment, when the number of newly added information records in the public opinion category in the preset time period is greater than or equal to the preset threshold, the discovery result corresponding to the public opinion category is determined. Therefore, the interference of historical information records can be avoided, and the accuracy of product public opinion discovery is further improved.
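The quantity condition can be sketched as a count of newly added records per category inside the preset time period, compared against a per-category threshold. The record layout `(posted time, category)`, the function name `discover` and the sample thresholds are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Record = Tuple[datetime, str]  # (time the record was posted, public opinion category)

def discover(records: Iterable[Record], now: datetime, window: timedelta,
             thresholds: Dict[str, int]) -> List[str]:
    """Return the categories whose record count in (now - window, now] meets the threshold."""
    start = now - window
    counts = Counter(category for posted, category in records if start < posted <= now)
    return sorted(c for c, n in counts.items() if c in thresholds and n >= thresholds[c])

now = datetime(2018, 8, 30, 12, 0)
records = [(now - timedelta(minutes=i), "login") for i in range(1, 9)]  # 8 recent records
records += [(now - timedelta(minutes=5), "recharge")] * 2               # 2 recent records
records += [(now - timedelta(hours=1), "login")]                        # outside the window
found = discover(records, now, timedelta(minutes=20), {"login": 7, "recharge": 5})
print(found)
```

Records older than the window are ignored, which matches the stated aim of avoiding interference from historical information records.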
In one embodiment, the process of determining the preset threshold includes: for each time point in a first historical time period, determining the number of information records of the public opinion category in the second historical time period of that time point to obtain a first historical public opinion quantity; obtaining a second historical public opinion quantity for the time point as the maximum of the first historical public opinion quantity within a third historical time period; and determining the preset threshold according to the average and standard deviation of the second historical public opinion quantity over the time points in the first historical time period.
The time length of the first historical time period is greater than that of the second historical time period and is also greater than that of the third historical time period. The time length of the third historical time period is greater than the time length of the second historical time period.
In one embodiment, as shown in FIG. 3, the first historical time period may be the 7 days before the current time point. The second historical time period may be the k minutes before the time point in question, where k may be 10, 20 or 30. Optionally, the second historical time period may include at least one such time period. Thus, a discovery result can be determined in as little as about 10 minutes. The third historical time period may span the two hours before and the two hours after the time point in question. Let X_i denote the first historical public opinion quantity at time point i, and let M_i denote the maximum of the first historical public opinion quantity within the third historical time period, that is, the second historical public opinion quantity, with time points spaced 1 minute apart. The second historical public opinion quantity is then
M_i = max(X_{i-120}, X_{i-119}, ..., X_{i+119}, X_{i+120})
where X_{i-120} is the first historical public opinion quantity 120 minutes before time point i, X_{i-119} is that quantity 119 minutes before time point i, X_{i+119} is that quantity 119 minutes after time point i, and X_{i+120} is that quantity 120 minutes after time point i. The average of the second historical public opinion quantity over the time points in the first historical time period can be written avg(M_i), and its standard deviation std(M_i).
Referring to FIG. 3, in one embodiment, determining the preset threshold according to the average and standard deviation of the second historical public opinion quantity over the time points in the first historical time period comprises: determining the preset threshold according to a preset scaling ratio, a preset fixed scaling value, and the average and standard deviation of the second historical public opinion quantity over the time points in the first historical time period. For example, the preset threshold may be denoted B_i and determined by:
B_i = m * avg(M_i) + 3 * std(M_i) + n
where m is the preset scaling ratio and n is the preset fixed scaling value, which enters the formula as an additive offset. Both may be determined empirically; for example, they may be adjusted according to the second historical public opinion quantity.
Based on the product public opinion discovery method of this embodiment, for each time point in the first historical time period, the number of information records of the public opinion category in the second historical time period of that time point is determined to obtain the first historical public opinion quantity; the second historical public opinion quantity of the time point is then obtained from the maximum of the first historical public opinion quantity in the third historical time period; finally, the preset threshold is determined according to the average and standard deviation of the second historical public opinion quantity at each time point in the first historical time period. Determining the preset threshold in this way yields a more reasonable threshold and thereby improves the accuracy of product public opinion discovery.
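The threshold determination described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function names, the per-minute input list, the use of the population standard deviation, and the clipping of windows at the edges of the available data are all assumptions made for the example.

```python
from statistics import mean, pstdev


def first_counts(per_minute, k):
    """X_i: number of records in the k minutes ending at minute i (sliding sum)."""
    counts = []
    running = 0
    for i, value in enumerate(per_minute):
        running += value
        if i >= k:
            running -= per_minute[i - k]  # drop the minute that left the window
        counts.append(running)
    return counts


def second_counts(x, half_window=120):
    """M_i: maximum of X over [i - half_window, i + half_window], clipped to the data."""
    n = len(x)
    return [
        max(x[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def preset_threshold(per_minute, k=10, half_window=120, m=1.0, n=0.0):
    """B = m * avg(M_i) + 3 * std(M_i) + n over the first historical time period."""
    x = first_counts(per_minute, k)
    big_m = second_counts(x, half_window)
    # Population standard deviation is an assumption; the text only says "standard deviation".
    return m * mean(big_m) + 3 * pstdev(big_m) + n
```

Here per_minute would hold the per-minute record counts of one public opinion category over the first historical time period (for example 7 days, or 10080 entries), and m and n play the roles of the preset scaling ratio and the preset fixed scaling value.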
In one embodiment, after determining the discovery result corresponding to the public opinion category when the number of information records in the public opinion category within the preset time period is greater than or equal to the preset threshold, the method further includes: sending an alarm notification according to the discovery result.
The alarm notification may be presented as a pop-up window or as information displayed in a different font. It may also be sent as an instant message or a mobile phone short message, or through a WeChat official account. The alarm notification may also take the form of a sound or a light. For example, the alarm notification in one example may be: "At 12:00, product XX had 8 abnormal line-cutting public opinions in the last 20 minutes, exceeding the threshold of 7". In another example, the alarm notification may be as shown in FIG. 4, where it is sent as a WeChat message and the alarm content includes: time of occurrence, affected business (i.e., the product), impact situation, possible cause, and so on. This makes it easier for the product provider to find a problem with the product and handle it as soon as possible, reducing the impact of the problem on users.
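As a rough illustration of the discovery condition and the alarm text, the following Python sketch compares a count with the preset threshold and formats a message similar to the example above. The Discovery class, the function names, and the message wording are hypothetical; the delivery channel (pop-up, short message, WeChat) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discovery:
    """Hypothetical container for a discovery result."""
    product: str
    category: str
    count: int
    threshold: float
    window_minutes: int


def discover(product, category, count, threshold, window_minutes) -> Optional[Discovery]:
    """Return a discovery result only when count >= threshold."""
    if count >= threshold:
        return Discovery(product, category, count, threshold, window_minutes)
    return None


def format_alert(discovery: Discovery, at: str) -> str:
    """Build the alarm text; sending it is outside this sketch."""
    return (
        f"{at} product {discovery.product}: {discovery.count} abnormal "
        f"'{discovery.category}' public opinions in the last "
        f"{discovery.window_minutes} minutes, threshold {discovery.threshold:g}"
    )
```

A caller would run discover once per public opinion category and per time window, and pass any non-empty result to format_alert before handing the text to whichever notification channel is in use.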
In one embodiment, after determining the discovery result corresponding to the public opinion category when the information records in the public opinion category within the preset time period meet the quantity condition, the method further includes: displaying the statistical result of the public opinion category.
The statistical result comprises a public opinion quantity trend graph based on each public opinion category, and the public opinion quantity trend graph counts the quantity or the increasing quantity of information records belonging to the public opinion category at different time points. The statistical results may also include detailed information of the information records in each public opinion category. Therefore, a statistical result is provided for a product provider, the provider can conveniently check the statistical result, and the statistical result can be used as a decision basis.
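The data behind such a trend graph could be prepared along the following lines. This is a minimal sketch under assumptions: records are taken to be (timestamp, category) pairs, the bucket size is a hypothetical parameter that should divide 60, and plotting is omitted.

```python
from collections import Counter, defaultdict
from datetime import datetime


def trend(records, bucket_minutes=10):
    """Count records per category and per time bucket.

    records: iterable of (datetime, category) pairs.
    Returns {category: {bucket_start: count}} with buckets in time order.
    """
    series = defaultdict(Counter)
    for ts, category in records:
        # Round the timestamp down to the start of its bucket.
        bucket = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        series[category][bucket] += 1
    return {c: dict(sorted(b.items())) for c, b in series.items()}
```

Each inner dictionary maps a bucket start time to the number of information records of that category, which corresponds to one line of the trend graph.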
In one embodiment, the trend graph of public sentiment quantity may be as shown in fig. 5, wherein the abscissa represents time and the ordinate represents public sentiment quantity. The public opinion quantity trend graph represents the statistical result of the sum of the abnormal public opinions of all the public opinion categories on a time axis.
In one embodiment, the detailed information of the information record of the statistical result of the public opinion category of a game product may be as shown in fig. 6, and includes: product name, problem type, game account number, mobile phone model, login mode, mobile phone system, system type, abnormal time, problem description, related screenshot and other information.
In one embodiment, the alarm notification may be sent through a WeChat official account. The alarm notification page may be as shown in FIG. 7 and includes positioning content and field data. The positioning content includes the time of occurrence, the product name (the affected business), and the discovery result (the impact situation). The field data includes statistical data of the public opinion category, such as a public opinion analysis trend graph. Further, the public opinion description detail page shown in FIG. 8 can be entered from the alarm notification page, and the statistical result of each information record in the discovery result of the public opinion category can be viewed there; the statistical result may include the public opinion recording time and the public opinion description.
In one embodiment, the product public opinion discovery method includes: extracting text data of each information record of a preset information source; preprocessing the text data to obtain a text to be word segmented, wherein the preprocessing comprises punctuation deletion, website deletion and digit deletion; performing word segmentation processing on a text to be segmented to obtain semantic words of text data; converting each semantic word into a word vector based on the word embedding model to obtain a data vector of the text data; classifying the information records according to the data vector based on a neural network classification model to obtain public opinion categories to which the information records belong; when the information record in the public opinion category in the preset time period is greater than or equal to a preset threshold value, determining a discovery result corresponding to the public opinion category; and sending out an alarm notice according to the discovery result.
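The preprocessing step (deleting punctuation, web addresses, and digits) can be illustrated with a short Python sketch. The regular expressions and the set of punctuation characters are assumptions chosen for the example, and word segmentation itself is not shown because it normally depends on a dedicated segmentation tool.

```python
import re
import string

# Assumed patterns: a simple URL matcher and a run of digits.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")

# ASCII punctuation plus some common full-width punctuation marks.
PUNCT = string.punctuation + "，。！？；：、（）《》【】“”‘’…"
PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCT})


def preprocess(text):
    """Return the text to be segmented, with URLs, digits and punctuation removed."""
    text = URL_RE.sub(" ", text)      # remove URLs first, while they are still intact
    text = DIGIT_RE.sub(" ", text)    # remove digits
    text = text.translate(PUNCT_TABLE)  # replace punctuation with spaces
    return " ".join(text.split())     # collapse repeated whitespace
```

URLs are removed before punctuation so that a web address is deleted as a whole rather than being broken into word-like fragments.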
The word embedding model may be a fastText model. The training process of the word embedding model, the training process of the neural network classification model, and the product public opinion classification process may be as shown in FIG. 9.
In the training process of the word embedding model, the processed object may be original text, and the original text may be text content in a text form. Preprocessing the original text to obtain a text to be word segmented; and performing word segmentation processing on the text to be segmented to obtain semantic words. And taking a sample pair consisting of the semantic words and word vectors corresponding to the semantic words as a training sample, and training the word embedding model.
In the training process of the neural network classification model, the processed object may be training data, and the data structure of the training data may correspond to the data structure of the text data of the information record. After the training data are preprocessed and word segmentation processed, vector conversion is carried out through a trained word embedding model, and a training data vector is obtained. And taking a sample pair consisting of the training data vector and the target classification result as a training sample to train the neural network classification model.
In the process of product public opinion classification, text data of a new information record is used as a processing object, and after the text data is subjected to preprocessing, word segmentation processing and vector conversion, a data vector is obtained, the obtained data vector is input into a trained neural network classification model for classification and prediction, and a classification result, namely a public opinion category to which the information record belongs, is obtained.
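The vector conversion step can be sketched as below. The text only states that each semantic word is converted into a word vector to obtain the data vector; averaging the word vectors, as done here, is one common way to combine them and is an assumption of this example, as are the function name and the dictionary-based embedding lookup.

```python
def data_vector(words, embeddings, dim):
    """Combine the word vectors of the semantic words into one data vector.

    words: list of semantic words produced by word segmentation.
    embeddings: mapping from word to a list of floats of length dim
                (a stand-in for a trained word embedding model).
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim  # no known word: fall back to a zero vector
    # Element-wise mean of the word vectors (assumed combination rule).
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Words missing from the embedding table are skipped in this sketch; a subword-based embedding model could instead produce vectors for unseen words.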
It should be understood that, although the steps in the flowchart of FIG. 2 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a product public opinion discovery apparatus operating in a terminal 102 in fig. 1, including:
the text extraction module 1002 is configured to extract text data of each information record of a preset information source;
a vector conversion module 1004, configured to convert the text data into a data vector;
a public opinion classification module 1006, configured to classify the information record according to the data vector to obtain a public opinion category to which the information record belongs;
a public opinion discovery module 1008, configured to determine a discovery result corresponding to the public opinion category when the information records in the public opinion category meet a quantity condition within a preset time period.
Based on the product public opinion discovery device of the embodiment, extracting text data of each information record of a preset information source; converting text data into data vectors; classifying the information records according to the data vectors to obtain public sentiment categories to which the information records belong; and when the information records in the public sentiment categories in the preset time period meet the quantity condition, determining the discovery result corresponding to the public sentiment categories. Because the information records are not classified through the keywords of the text data, but are classified through the data vectors of the whole text data, the loss of semantic information can be avoided, and the classification accuracy is improved, so that the product public opinion discovery accuracy is improved.
In one embodiment, the apparatus further includes a preprocessing module and a word segmentation module.
The preprocessing module is used for preprocessing the text data to obtain a text to be segmented;
the word segmentation module is used for carrying out word segmentation processing on the text to be word segmented to obtain semantic words of the text data;
the vector conversion module 1004 is configured to convert each semantic word into a word vector, so as to obtain a data vector of the text data.
In one embodiment, the vector conversion module 1004 is configured to convert each semantic word into a word vector based on a word embedding model, so as to obtain a data vector of the text data.
In one embodiment, the pre-processing includes at least one of deleting punctuation, deleting web addresses, and deleting numbers.
In one embodiment, the public opinion classification module 1006 is configured to classify the information record according to the data vector based on a neural network classification model to obtain a public opinion category to which the information record belongs.
In one embodiment, the public opinion classification module 1006 is configured to input the data vector into the neural network classification model through an input layer of the neural network classification model; weighting the data vectors through a hidden layer of the neural network classification model to obtain a classification result; and outputting the classification result through an output layer of the neural network classification model, wherein the classification result corresponds to the public sentiment category.
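The input layer, hidden layer, and output layer described above can be illustrated with a small forward pass in plain Python. This is a sketch under assumptions, not the model of the embodiment: a single hidden layer, the ReLU activation, the softmax output, and the English category labels are all choices made for the example, and the weights would come from training.

```python
import math

# Assumed labels for the four public opinion categories.
CATEGORIES = ["login", "recharge", "lag", "system failure"]


def dense(vector, weights, bias):
    """Weighted sum of the inputs for each unit of a layer."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(scores):
    """Turn output scores into probabilities (shifted by the maximum for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(vector, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> weighted hidden layer -> output layer."""
    hidden = [max(0.0, h) for h in dense(vector, hidden_w, hidden_b)]  # ReLU (assumed)
    probs = softmax(dense(hidden, out_w, out_b))
    return CATEGORIES[probs.index(max(probs))], probs
```

The returned label is the category with the highest probability, which corresponds to the classification result output by the output layer.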
In one embodiment, the weight of the hidden layer in the neural network classification model is determined by a target classification result and a training classification result obtained by training a training sample; the training sample comprises the target classification result and a training data vector, the training data vector has the same data structure as the data vector, and the training data vector corresponds to the target classification result.
In one embodiment, the public opinion categories include login public opinion, recharge public opinion, lag (stuttering) public opinion, and system failure public opinion.
In one embodiment, the public opinion discovery module 1008 is configured to determine a discovery result corresponding to the public opinion category when the information record in the public opinion category within a preset time period is greater than or equal to a preset threshold.
In one embodiment, the apparatus further includes a threshold determination module. The threshold determination module includes:
the first quantity determining unit is used for determining the quantity of the information records of the public opinion categories in a second historical time period of each time point for each time point in a first historical time period to obtain a first historical public opinion quantity;
the second quantity determining unit is used for obtaining a second historical public opinion quantity of the time point according to the maximum value of the first historical public opinion quantity in a third historical time period;
and the preset threshold value determining unit is used for determining the preset threshold value according to the average value and the standard deviation of the second historical public opinion quantity of each time point in the first historical time period.
In one embodiment, the apparatus further includes:
and the alarm sending module is used for sending an alarm notice according to the discovery result.
In one embodiment, the apparatus further includes:
and the result display module is used for displaying the statistical result of the public opinion categories.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a product public opinion discovery method.
Those skilled in the art will appreciate that the architecture shown in FIG. 11 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the product public opinion discovery method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the product public opinion discovery method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (15)

1. A product public opinion discovery method, the method comprising:
extracting text data of each information record of the game product corresponding to a preset information source;
converting semantic words of the text data into data vectors; the semantics of the data vector are the same as the semantics of the corresponding text data;
classifying the information records according to the data vectors based on a neural network classification model to obtain public opinion categories to which the information records belong; the public opinion categories comprise login public opinion, recharge public opinion, lag (stuttering) public opinion and system failure public opinion;
when the newly increased number of the information records in the public opinion categories in a preset time period is greater than or equal to a preset threshold value, determining product public opinion discovery results corresponding to the public opinion categories;
the process for determining the preset threshold value comprises the following steps:
for each time point in a first historical time period, determining the number of the information records of the public opinion categories in a second historical time period of each time point to obtain a first historical public opinion number;
taking the maximum value of the first historical public sentiment quantity in the third historical time period of each time point as the second historical public sentiment quantity of the time point; the time length of the third historical time period is greater than the time length of the second historical time period;
and determining the preset threshold value according to a preset scaling ratio, a preset fixed scaling value, and the average value and standard deviation of the second historical public opinion quantity of each time point in the first historical time period.
2. The method of claim 1, wherein converting semantic words of the text data into data vectors comprises:
preprocessing the text data to obtain a text to be word segmented;
performing word segmentation processing on the text to be word segmented to obtain semantic words of the text data;
and converting each semantic word into a word vector to obtain a data vector of the text data.
3. The method of claim 2, wherein converting each semantic word into a word vector to obtain a data vector of the text data comprises:
and converting each semantic word into a word vector based on a word embedding model to obtain a data vector of the text data.
4. The method of claim 2, wherein preprocessing comprises at least one of deleting punctuation, deleting web addresses, and deleting numbers.
5. The method of claim 1, wherein the classifying the information record according to the data vector based on the neural network classification model to obtain a public opinion category to which the information record belongs comprises:
inputting the data vector into the neural network classification model through an input layer of the neural network classification model;
weighting the data vectors through a hidden layer of the neural network classification model to obtain a classification result;
and outputting the classification result through an output layer of the neural network classification model, wherein the classification result corresponds to the public sentiment category.
6. The method according to claim 1, wherein the weight of the hidden layer in the neural network classification model is determined by a target classification result and a training classification result obtained by training a training sample; the training sample comprises the target classification result and a training data vector, the training data vector has the same data structure as the data vector, and the training data vector corresponds to the target classification result.
7. The method of claim 6, wherein the step of determining weights of hidden layers in the neural network classification model according to the target classification result and the training classification result obtained by training the training samples comprises:
determining the weight of a hidden layer in the neural network classification model when a loss function value obtained according to a target classification result and a training classification result reaches a preset optimization condition; the training classification result is obtained by training a training sample.
8. The method of claim 1, wherein the neural network classification model is a multi-layer neural network classification model.
9. The method of claim 1, wherein the login public opinion refers to information records about account login; the recharge public opinion refers to information records about account recharging; the lag (stuttering) public opinion refers to information records about system fluency during use of the product; and the system failure public opinion refers to information records about system failures during use of the product.
10. The method as claimed in claim 1, wherein when the number of new information records in the public opinion category in a preset time period is greater than or equal to a preset threshold, determining a product public opinion discovery result corresponding to the public opinion category, then further comprising:
and sending an alarm notice according to the product public opinion discovery result.
11. The method of claim 10, wherein the form in which the alarm notification is sent comprises: a pop-up window, information displayed in a different font, an instant message, and a mobile phone short message.
12. The method as claimed in any one of claims 1 to 11, wherein when the number of new information records in the public opinion category within a preset time period is greater than or equal to a preset threshold, determining a product public opinion discovery result corresponding to the public opinion category, then further comprising:
and displaying the statistical result of the public opinion category.
13. A product public opinion discovery apparatus, the apparatus comprising:
the text extraction module is used for extracting text data of each information record of the game product corresponding to the preset information source;
the vector conversion module is used for converting semantic words of the text data into data vectors; the semantics of the data vector are the same as the semantics of the corresponding text data;
the public opinion classification module is used for classifying the information records according to the data vector based on a neural network classification model to obtain public opinion categories to which the information records belong; the public opinion categories comprise login public opinion, recharge public opinion, lag (stuttering) public opinion and system failure public opinion;
the public opinion discovery module is used for determining a product public opinion discovery result corresponding to the public opinion category when the newly increased number of the information records in the public opinion category in a preset time period is greater than or equal to a preset threshold value;
a threshold determination module comprising:
the first quantity determining unit is used for determining the quantity of the information records of the public opinion categories in a second historical time period of each time point for each time point in a first historical time period to obtain a first historical public opinion quantity;
a second quantity determination unit, configured to use a maximum value of the first historical public sentiments quantity in a third historical time period of each time point as a second historical public sentiment quantity of the time point; the time length of the third historical time period is greater than the time length of the second historical time period;
and the preset threshold value determining unit is used for determining the preset threshold value according to a preset scaling, a preset fixed scaling value and the average value and standard deviation of the second historical public opinion quantity of each time point in the first historical time period.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 12.
CN201811005075.3A 2018-08-30 2018-08-30 Product public opinion discovery method, device, computer equipment and storage medium Active CN109145115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811005075.3A CN109145115B (en) 2018-08-30 2018-08-30 Product public opinion discovery method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811005075.3A CN109145115B (en) 2018-08-30 2018-08-30 Product public opinion discovery method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109145115A CN109145115A (en) 2019-01-04
CN109145115B true CN109145115B (en) 2020-11-24

Family

ID=64829497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811005075.3A Active CN109145115B (en) 2018-08-30 2018-08-30 Product public opinion discovery method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109145115B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488666A (en) * 2020-12-15 2021-03-12 北京易兴元石化科技有限公司 Network-based petroleum comprehensive data processing method and device and storage medium
CN114048317A (en) * 2021-11-19 2022-02-15 盐城金堤科技有限公司 Public opinion text classification method and device, electronic equipment and computer storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550269A (en) * 2015-12-10 2016-05-04 复旦大学 Product comment analyzing method and system with learning supervising function
CN107977397A (en) * 2017-09-08 2018-05-01 华瑞新智科技(北京)有限公司 Internet user's notice index calculation method and system based on deep learning
CN108334605A (en) * 2018-02-01 2018-07-27 腾讯科技(深圳)有限公司 File classification method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160170966A1 (en) * 2014-12-10 2016-06-16 Brian Kolo Methods and systems for automated language identification

Also Published As

Publication number Publication date
CN109145115A (en) 2019-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant