CN112650906A - Internet user comment analysis method and system based on big data text analysis - Google Patents

Internet user comment analysis method and system based on big data text analysis

Info

Publication number
CN112650906A
Authority
CN
China
Prior art keywords
data
information
user
text
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011535936.6A
Other languages
Chinese (zh)
Inventor
张才俊
江帆
邢巍
张波
郭园园
王佳佳
张晓慧
吴涛
周云
于盟
赵舰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Co ltd Customer Service Center
Original Assignee
State Grid Co ltd Customer Service Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Co ltd Customer Service Center
Priority to CN202011535936.6A
Publication of CN112650906A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses an internet user comment analysis method based on big data text analysis, which comprises the following steps: acquiring and sending user evaluation information; carrying out data cleaning processing on the user evaluation information to obtain and send target plain text information; extracting user emotion data from the target plain text information, classifying the user emotion data according to a set rule, and generating and sending a classification report; performing a word segmentation operation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word segmentation text data; performing a word vectorization operation on the word segmentation text data and removing interference information to obtain vectorized words; and applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result. The invention also discloses an internet user comment analysis system based on big data text analysis. The invention can effectively improve the efficiency and accuracy of user comment analysis.

Description

Internet user comment analysis method and system based on big data text analysis
Technical Field
The invention relates to the technical field of data analysis, in particular to an internet user comment analysis method and system based on big data text analysis.
Background
User experience is crucial to an App operating unit. However, an internet operating unit has limited ways to collect users' opinions and suggestions about its products: in most cases they can only be obtained through search engines, WeChat official account questionnaires, and the App's built-in "opinions and suggestions" function module. Analysis of user comments is limited to spreadsheets combined with manual sorting, an approach with the following defects:
First, the number of samples analyzed is seriously insufficient, so the conclusions are inaccurate.
Second, the sample sources are limited, so the output can only focus on one aspect.
Third, efficiency is low and the analysis results lack rigor.
Fourth, a large amount of human resources is occupied.
Disclosure of Invention
In order to overcome the above problems, or at least partially solve them, embodiments of the invention provide an internet user comment analysis method and system based on big data text analysis. They help an App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve its operating methods and optimize its operating strategy, and they can effectively improve the efficiency and accuracy of the analysis.
The embodiment of the invention is realized by the following steps:
in a first aspect, an embodiment of the present invention provides an internet user comment analysis method based on big data text analysis, including the following steps:
acquiring and sending user evaluation information;
carrying out data cleaning processing on the user evaluation information to obtain and send target pure text information;
extracting user emotion data in the target pure text information, classifying the user emotion data according to a set rule, generating and sending a classification report;
performing a word segmentation operation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word segmentation text data;
performing a word vectorization operation on the word segmentation text data and removing interference information to obtain vectorized words;
and applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result.
This helps an App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, and thereby improve its operating methods and optimize its operating strategy. First, user evaluation information is acquired and sent; it comprises both online and offline data, which ensures that the data is comprehensive. Next, data cleaning is performed on the user evaluation information to obtain and send target plain text information: useless, low-value and repeated information is removed, and the reliable target plain text information is transferred to a database for analysis by the text big data system. User emotion data is then extracted from the target plain text information and classified according to a set rule, and a classification report is generated and sent, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. A Chinese word segmentation framework then segments the text information in the classification report to obtain and send word segmentation text data, which prepares the text for the subsequent stop-word removal and word vector computation. The word segmentation text data is then vectorized, and stop words and other interference information are removed to obtain vectorized words. Finally, a naive Bayes algorithm is applied to the vectorized words to generate and output an operation result, which is displayed through a visualization tool; the result includes keyword clouds, word frequency ratio charts, user emotional tendency analysis and the like.
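The segmentation and vectorization steps can be illustrated with a minimal sketch. The patent does not name a specific Chinese word segmentation framework, so forward maximum matching against a tiny dictionary is used here as a stand-in; the dictionary, the stop-word list and the function names `segment` and `vectorize` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mini-dictionary and stop-word list, for illustration only.
DICTIONARY = {"充值", "失败", "界面", "好用", "电费", "查询"}
STOP_WORDS = {"很", "的", "了"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text: str) -> list[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def vectorize(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Drop stop words, then count each vocabulary word (bag of words)."""
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return [counts[word] for word in vocabulary]


tokens = segment("界面很好用")
vector = vectorize(tokens, ["充值", "失败", "好用"])
```

A production system would more likely rely on an established segmentation library and a much larger vocabulary; the sketch only shows how segmented text becomes a count vector once stop words are removed.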
The method acquires user comment information from all channels through automatic information collection combined with manual sorting. After the data is cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs and data extraction. The analysis results are output in the form specified by the App operator, analysis conclusions are summarized, and analysis charts are produced. This helps the App operator grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve operating methods and optimize operating strategies, meet users' needs as far as possible, and obtain sustainable economic and social benefits. Data collected from every channel can be acquired at the same time, so the sample is large and the analysis results are more accurate and more useful as a reference. The analysis can draw on multiple sources, combining information from several channels or examining a single channel; the output covers multiple directions and can be targeted to actual requirements. The analysis is relatively accurate and efficient, providing results and optimization conclusions for each dimension in a very short time, and it greatly reduces human resource costs.
Based on the first aspect, in some embodiments of the present invention, the method for acquiring and sending user rating information includes the following steps:
acquiring and sending online user evaluation information in one or more of the following ways: exporting the corresponding information from a third-party mobile promotion data analysis platform, using a third-party public opinion collection tool, using a data capture tool, and reading information content through a third-party SDK.
Based on the first aspect, in some embodiments of the present invention, the method for acquiring and sending user rating information further includes the following steps:
acquiring and sending offline user evaluation information;
and digitizing the offline user evaluation information to obtain target supplementary user evaluation information.
Based on the first aspect, in some embodiments of the present invention, the method for performing data cleansing processing on user evaluation information to obtain and send target plain text information includes the following steps:
removing useless data, low-value data and repeated data from the user evaluation information to obtain and send target plain text information.
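A minimal sketch of such a cleaning step is given below. The patent does not define what counts as useless or low-value data, so the rules used here (stripping HTML tags, dropping empty entries, a hypothetical minimum length `MIN_LENGTH`, and exact-duplicate removal) are assumptions chosen only to make the idea concrete.

```python
import re

MIN_LENGTH = 4  # hypothetical threshold below which a comment is "low-value"


def clean_comments(raw_comments: list[str]) -> list[str]:
    """Return plain-text comments with useless, low-value and repeated entries removed."""
    seen, cleaned = set(), []
    for raw in raw_comments:
        text = re.sub(r"<[^>]+>", "", raw)        # strip HTML tags
        text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
        if not text:                  # useless: nothing left after stripping
            continue
        if len(text) < MIN_LENGTH:    # low-value: too short to analyze
            continue
        if text in seen:              # repeated: exact duplicate of an earlier comment
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


plain_text = clean_comments(
    ["<p>充值总是失败</p>", "充值总是失败", "好", "   ", "界面  很好用"]
)
```

Exact-match deduplication is the simplest choice; near-duplicate detection would need a similarity measure, which the source does not describe.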
Based on the first aspect, in some embodiments of the present invention, the method for extracting user emotion data in target plain text information, classifying the user emotion data according to a predetermined rule, and generating and sending a classification report includes the following steps:
extracting user emotion data in the target pure text information;
analyzing the user emotion data using an emotion dictionary and machine learning, classifying the user emotion data according to a set rule, and generating and sending a classification report.
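The dictionary half of this step might look like the sketch below; the machine learning half is not covered. The word lists, the scoring rule (positive hits minus negative hits) and the report layout are assumptions, since the patent leaves the "set rule" unspecified.

```python
# Hypothetical emotion dictionary, for illustration only.
POSITIVE_WORDS = {"好用", "方便", "满意"}
NEGATIVE_WORDS = {"失败", "卡顿", "闪退"}


def classify_emotion(text: str) -> str:
    """Assumed rule: the sign of (positive hits - negative hits) decides the class."""
    positive_hits = sum(text.count(word) for word in POSITIVE_WORDS)
    negative_hits = sum(text.count(word) for word in NEGATIVE_WORDS)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_report(comments: list[str]) -> dict[str, list[str]]:
    """Group comments by emotion class to form a simple classification report."""
    report = {"positive": [], "negative": [], "neutral": []}
    for comment in comments:
        report[classify_emotion(comment)].append(comment)
    return report


report = build_report(["界面很好用", "充值失败还闪退", "今天查电费"])
```

Substring counting ignores negation ("不好用" would still count as positive), which is one reason a dictionary is usually combined with a learned classifier.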
In a second aspect, an internet user comment analysis system based on big data text analysis in an embodiment of the present invention includes an evaluation acquisition module, a data cleansing module, an emotion classification module, a word segmentation processing module, a vectorization module, and a result output module, where:
the evaluation acquisition module is used for acquiring and sending user evaluation information;
the data cleaning module is used for carrying out data cleaning processing on the user evaluation information to obtain and send target pure text information;
the emotion classification module is used for extracting user emotion data in the target pure text information, classifying the user emotion data according to a set rule, generating and sending a classification report;
the word segmentation processing module is used for performing a word segmentation operation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word segmentation text data;
the vectorization module is used for performing a word vectorization operation on the word segmentation text data and removing interference information to obtain vectorized words;
and the result output module is used for applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result.
This helps an App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, and thereby improve its operating methods and optimize its operating strategy. First, the evaluation acquisition module acquires and sends user evaluation information; it comprises both online and offline data, which ensures that the data is comprehensive. Next, the data cleaning module performs data cleaning on the user evaluation information to obtain and send target plain text information: useless, low-value and repeated information is removed, and the reliable target plain text information is transferred to a database for analysis by the text big data system. The emotion classification module then extracts user emotion data from the target plain text information, classifies it according to a set rule, and generates and sends a classification report, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. The word segmentation processing module then uses a Chinese word segmentation framework to segment the text information in the classification report to obtain and send word segmentation text data, which prepares the text for the subsequent stop-word removal and word vector computation. The vectorization module then vectorizes the word segmentation text data and removes stop words and other interference information to obtain vectorized words. Finally, the result output module applies a naive Bayes algorithm to the vectorized words to generate and output an operation result, which is displayed through a visualization tool; the result includes keyword clouds, word frequency ratio charts, user emotional tendency analysis and the like.
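One way to picture how the six modules hand data to each other is a simple chain of callables, sketched below. The class name `CommentAnalysisSystem`, the constructor parameters and the placeholder stages are all hypothetical; the patent describes the modules functionally and does not prescribe this structure.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


class CommentAnalysisSystem:
    """Chains the six modules so that each one sends its output to the next."""

    def __init__(self, acquire: Stage, clean: Stage, classify: Stage,
                 segment: Stage, vectorize: Stage, output: Stage) -> None:
        self.stages: list[tuple[str, Stage]] = [
            ("evaluation acquisition module", acquire),
            ("data cleaning module", clean),
            ("emotion classification module", classify),
            ("word segmentation processing module", segment),
            ("vectorization module", vectorize),
            ("result output module", output),
        ]

    def run(self, source: Any) -> Any:
        data = source
        for _name, stage in self.stages:
            data = stage(data)  # the output of one module is the input of the next
        return data


# Placeholder stages that only demonstrate the data flow, not real analysis.
system = CommentAnalysisSystem(
    acquire=lambda source: list(source),
    clean=lambda comments: [c.strip() for c in comments if c.strip()],
    classify=lambda comments: {"all": comments},
    segment=lambda report: [c.split() for c in report["all"]],
    vectorize=lambda docs: [len(tokens) for tokens in docs],
    output=lambda vectors: {"documents": len(vectors), "tokens": sum(vectors)},
)
result = system.run(["good app ", "  ", "login fails often"])
```

Keeping each module behind a plain callable makes it easy to swap one stage, for example a different segmentation framework, without touching the others.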
The system acquires user comment information from all channels through automatic information collection combined with manual sorting. After the data is cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs and data extraction. The analysis results are output in the form specified by the App operator, analysis conclusions are summarized, and analysis charts are produced. This helps the App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve operating methods and optimize operating strategies, meet users' needs as far as possible, and obtain sustainable economic and social benefits. Data collected from every channel can be acquired at the same time, so the sample is large and the analysis results are more accurate and more useful as a reference. The analysis can draw on multiple sources, combining information from several channels or examining a single channel; the output covers multiple directions and can be targeted to actual requirements. The analysis is relatively accurate and efficient, providing results and optimization conclusions for each dimension in a very short time, and it greatly reduces human resource costs.
Based on the second aspect, in some embodiments of the present invention, the evaluation obtaining module includes an online sub-module, configured to obtain and send online user evaluation information through one or more of exporting corresponding information from the third-party mobile promotion data analysis platform, using the third-party public opinion collecting tool, using the data capturing tool, and reading information content by using the third-party SDK.
Based on the second aspect, in some embodiments of the present invention, the evaluation obtaining module further includes an offline information sub-module and a digital processing sub-module, wherein:
the offline information submodule is used for acquiring and sending offline user evaluation information;
and the digital processing sub-module is used for digitizing the offline user evaluation information to obtain the target supplementary user evaluation information.
Based on the second aspect, in some embodiments of the present invention, the data cleansing module includes a redundancy removing sub-module, configured to remove useless data, low-value data, and duplicate data in the user evaluation information to obtain and send target plain text information.
Based on the second aspect, in some embodiments of the present invention, the emotion classification module includes an extraction sub-module and a classification sub-module, where:
the extraction submodule is used for extracting user emotion data in the target pure text information;
and the classification submodule is used for analyzing the user emotion data using an emotion dictionary and machine learning, classifying the user emotion data according to a set rule, and generating and sending a classification report.
The embodiment of the invention at least has the following advantages or beneficial effects:
An embodiment of the invention provides an internet user comment analysis method based on big data text analysis. First, user evaluation information is acquired and sent; it comprises both online and offline data, which ensures that the data is comprehensive. Next, data cleaning is performed on the user evaluation information to obtain and send target plain text information. User emotion data is then extracted from the target plain text information and classified according to a set rule, and a classification report is generated and sent, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. A Chinese word segmentation framework then segments the text information in the classification report to obtain and send word segmentation text data, which prepares the text for the subsequent stop-word removal and word vector computation. The word segmentation text data is then vectorized, and stop words and other interference information are removed to obtain vectorized words. Finally, a naive Bayes algorithm is applied to the vectorized words to generate and output an operation result, which is displayed through a visualization tool.
The method acquires user comment information from all channels through automatic information collection combined with manual sorting. After the data is cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs and data extraction. The analysis results are output in the form specified by the App operator, analysis conclusions are summarized, and analysis charts are produced. This helps the App operator grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve operating methods and optimize operating strategies, meet users' needs as far as possible, and obtain sustainable economic and social benefits.
An embodiment of the invention also provides an internet user comment analysis system based on big data text analysis. First, the evaluation acquisition module acquires and sends user evaluation information; it comprises both online and offline data, which ensures that the data is comprehensive. Next, the data cleaning module performs data cleaning on the user evaluation information to obtain and send target plain text information. The emotion classification module then extracts user emotion data from the target plain text information, classifies it according to a set rule, and generates and sends a classification report, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. The word segmentation processing module then uses a Chinese word segmentation framework to segment the text information in the classification report to obtain and send word segmentation text data, which prepares the text for the subsequent stop-word removal and word vector computation. The vectorization module then vectorizes the word segmentation text data and removes stop words and other interference information to obtain vectorized words. Finally, the result output module applies a naive Bayes algorithm to the vectorized words to generate and output an operation result, which is displayed through a visualization tool.
The system acquires user comment information from all channels through automatic information collection combined with manual sorting. After the data is cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs and data extraction. The analysis results are output in the form specified by the App operator, analysis conclusions are summarized, and analysis charts are produced. This helps the App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve operating methods and optimize operating strategies, meet users' needs as far as possible, and obtain sustainable economic and social benefits.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of an Internet user comment analysis method based on big data text analysis according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an internet user comment analysis system based on big data text analysis according to an embodiment of the present invention.
Reference numerals: 100, evaluation acquisition module; 110, online sub-module; 120, offline information sub-module; 130, digital processing sub-module; 200, data cleaning module; 210, redundancy removal sub-module; 300, emotion classification module; 310, extraction sub-module; 320, classification sub-module; 400, word segmentation processing module; 500, vectorization module; 600, result output module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.
Examples
As shown in fig. 1, in a first aspect, an embodiment of the present invention provides an internet user comment analysis method based on big data text analysis, including the following steps:
S1, acquiring and sending user evaluation information;
S2, carrying out data cleaning processing on the user evaluation information to obtain and send target plain text information;
S3, extracting user emotion data from the target plain text information, classifying the user emotion data according to a set rule, and generating and sending a classification report;
S4, performing a word segmentation operation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word segmentation text data;
S5, performing a word vectorization operation on the word segmentation text data and removing interference information to obtain vectorized words;
S6, applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result.
This helps an App operating unit grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, and thereby improve its operating methods and optimize its operating strategy. First, user evaluation information is acquired and sent; it comprises both online and offline data, which ensures that the data is comprehensive. Next, data cleaning is performed on the user evaluation information to obtain and send target plain text information: useless, low-value and repeated information is removed, and the reliable target plain text information is transferred to a database for analysis by the text big data system. User emotion data is then extracted from the target plain text information and classified according to a set rule, and a classification report is generated and sent, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. A Chinese word segmentation framework then segments the text information in the classification report to obtain and send word segmentation text data, which prepares the text for the subsequent stop-word removal and word vector computation. The word segmentation text data is then vectorized, and stop words and other interference information are removed to obtain vectorized words. Finally, a naive Bayes algorithm is applied to the vectorized words to generate and output an operation result, which is displayed through a visualization tool; the result includes keyword clouds, word frequency ratio charts, user emotional tendency analysis and the like.
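Step S6 can be sketched as a small multinomial naive Bayes classifier with Laplace (add-one) smoothing over segmented comments. The patent names only "a naive Bayes algorithm", so the multinomial variant, the smoothing, the class name `NaiveBayes` and the toy training data are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, documents: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                if token in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set of already segmented comments (hypothetical data).
model = NaiveBayes().fit(
    [["充值", "失败"], ["闪退", "失败"], ["界面", "好用"], ["查询", "方便"]],
    ["negative", "negative", "positive", "positive"],
)
prediction = model.predict(["充值", "闪退"])
```

Working in log space avoids numerical underflow when comments are long, and the add-one term keeps a word seen in only one class from driving the other class's probability to zero.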
The method acquires user comment information from all channels through automatic information collection combined with manual sorting. After the data is cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs and data extraction. The analysis results are output in the form specified by the App operator, analysis conclusions are summarized, and analysis charts are produced. This helps the App operator grasp operating results flexibly and quickly, learn of user opinions and suggestions in time, improve operating methods and optimize operating strategies, meet users' needs as far as possible, and obtain sustainable economic and social benefits. Data collected from every channel can be acquired at the same time, so the sample is large and the analysis results are more accurate and more useful as a reference. The analysis can draw on multiple sources, combining information from several channels or examining a single channel; the output covers multiple directions and can be targeted to actual requirements. The analysis is relatively accurate and efficient, providing results and optimization conclusions for each dimension in a very short time, and it greatly reduces human resource costs.
Based on the first aspect, in some embodiments of the present invention, the method for acquiring and sending user rating information includes the following steps:
acquiring and sending online user evaluation information in one or more of the following ways: exporting the corresponding information from a third-party mobile promotion data analysis platform, using a third-party public opinion collection tool, using a data capture tool, and reading information content through a third-party SDK.
The user evaluation information comprises online user evaluation information and offline user evaluation information. The online user evaluation information is obtained in one or more of the above ways, which improves the efficiency and accuracy of data acquisition.
Based on the first aspect, in some embodiments of the present invention, the method for acquiring and sending user rating information further includes the following steps:
acquiring and sending offline user evaluation information;
and digitizing the offline user evaluation information to obtain target supplementary user evaluation information.
The user evaluation information comprises both online and offline user evaluation information, which ensures that the data are comprehensive. After the offline user evaluation information is acquired, it is digitized, that is, converted into computer-readable digital data, to obtain target supplementary user evaluation information. This information can then enter the subsequent operations as a useful supplement to the online user evaluation information.
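One way to picture the offline supplement is a shared record type into which digitized offline feedback is placed alongside online records. The patent does not say how digitization is performed; the sketch below assumes the offline text has already been transcribed (for example by manual entry), and the `Evaluation` type, field names, and channel names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    source: str   # hypothetical channel name, e.g. "app_store" or "paper_survey"
    channel: str  # "online" or "offline"
    text: str


def digitize_offline(source: str, transcribed_lines: list[str]) -> list[Evaluation]:
    """Wrap already-transcribed offline feedback in the shared record type."""
    return [
        Evaluation(source=source, channel="offline", text=line.strip())
        for line in transcribed_lines
        if line.strip()  # skip blank lines left over from transcription
    ]


online = [Evaluation("app_store", "online", "缴费功能不好用")]
offline = digitize_offline("paper_survey", ["  电费查询很方便 ", ""])
all_evaluations = online + offline
```

Keeping the channel on every record is what would later allow either combined multi-channel analysis or analysis of a single channel.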
Based on the first aspect, in some embodiments of the present invention, the method for performing data cleansing processing on user evaluation information to obtain and send target plain text information includes the following steps:
removing useless data, low-value data, and repeated data from the user evaluation information to obtain and send target plain text information.
To keep subsequent data processing efficient and reduce the processing load, the user evaluation information is cleaned once it has been obtained: interference information such as useless data, low-value data, and repeated data is removed, and the resulting target plain text information is sent to the text big data system for analysis.
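The cleaning step can be sketched as three filters applied in sequence. The patent does not define "useless" or "low-value" data, so the criteria below (no letters or digits at all, and fewer than `MIN_LENGTH` characters) are assumptions chosen only for illustration, as are the function names.

```python
import re

MIN_LENGTH = 4  # hypothetical threshold below which a comment counts as low-value


def normalize(text: str) -> str:
    """Strip HTML-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"\s+", " ", text).strip()


def has_content(text: str) -> bool:
    """True when the text holds at least one letter, digit, or CJK character."""
    return any(ch.isalnum() for ch in text)


def clean(comments: list[str]) -> list[str]:
    seen, kept = set(), []
    for raw in comments:
        text = normalize(raw)
        if not has_content(text):    # useless: empty or punctuation only
            continue
        if len(text) < MIN_LENGTH:   # low-value: too short to analyze
            continue
        if text in seen:             # repeated: exact duplicate after normalization
            continue
        seen.add(text)
        kept.append(text)
    return kept


raw_comments = ["<p>缴费功能不好用</p>", "缴费功能不好用", "!!!", "好", "电费查询  很方便"]
cleaned = clean(raw_comments)
```

Exact-match deduplication is the simplest choice; near-duplicate detection would need a similarity measure that the text does not describe.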
Based on the first aspect, in some embodiments of the present invention, the method for extracting user emotion data in target plain text information, classifying the user emotion data according to a predetermined rule, and generating and sending a classification report includes the following steps:
extracting user emotion data from the target plain text information;
analyzing the user emotion data by means of an emotion dictionary and machine learning, classifying the user emotion data according to a set rule, and generating and sending a classification report.
To learn promptly and effectively how users feel about the App, user emotion is analyzed so that the App operator can keep up with user feedback and adjust its operating strategy accordingly. User emotion data are first extracted from the target plain text information, and user emotion is then analyzed from these data by means of an emotion dictionary and machine learning.
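The dictionary half of this analysis can be sketched as a polarity count over already-segmented tokens. The word lists, the one-token negation rule, and the three-way "set rule" (positive, negative, neutral) are all assumptions; the patent specifies neither the dictionary nor the rule, and the machine learning half is not shown here.

```python
from collections import Counter

# Hypothetical emotion dictionary; a real one would be far larger.
POSITIVE = {"方便", "好用", "满意"}
NEGATIVE = {"卡顿", "失败", "麻烦"}
NEGATORS = {"不", "没"}


def emotion_score(tokens: list[str]) -> int:
    """Sum word polarities, flipping the word that directly follows a negator."""
    score, negate = 0, False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        polarity = int(token in POSITIVE) - int(token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False
    return score


def classify(tokens: list[str]) -> str:
    """Assumed rule: the sign of the score decides the class."""
    score = emotion_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


samples = [
    ["电费", "查询", "方便"],
    ["缴费", "功能", "不", "好用"],
    ["登录", "失败"],
    ["今天", "缴费"],
]
labels = [classify(tokens) for tokens in samples]
report = Counter(labels)  # a minimal stand-in for the classification report
```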
As shown in fig. 2, in a second aspect, an embodiment of the present invention provides an Internet user comment analysis system based on big data text analysis, which includes an evaluation acquisition module 100, a data cleaning module 200, an emotion classification module 300, a word segmentation processing module 400, a vectorization module 500, and a result output module 600, where:
the evaluation acquisition module 100 is configured to acquire and send user evaluation information;
the data cleaning module 200 is configured to clean the user evaluation information to obtain and send target plain text information;
the emotion classification module 300 is configured to extract user emotion data from the target plain text information, classify the user emotion data according to a set rule, and generate and send a classification report;
the word segmentation processing module 400 is configured to segment the text information in the classification report by using a Chinese word segmentation framework to obtain and send word-segmented text data;
the vectorization module 500 is configured to vectorize the word-segmented text data and remove interference information to obtain vectorized words;
and the result output module 600 is configured to apply a naive Bayes algorithm to the vectorized words, and to output and display the operation result.
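The six modules form a linear chain in which each module consumes the output of the one before it. The sketch below shows only that wiring; every stage body is a deliberately trivial placeholder, and all function names are hypothetical rather than taken from the patent.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def acquire(_: Any) -> list[str]:                # module 100 (placeholder data)
    return [" 电费查询很方便 ", "电费查询很方便", ""]


def clean(comments: list[str]) -> list[str]:     # module 200 (strip, drop empty, dedupe)
    return list(dict.fromkeys(c.strip() for c in comments if c.strip()))


def classify_emotion(comments: list[str]) -> list[dict]:   # module 300 (no real classifier)
    return [{"text": c, "emotion": "unlabeled"} for c in comments]


def segment(records: list[dict]) -> list[dict]:  # module 400 (character split placeholder)
    return [{**r, "tokens": list(r["text"])} for r in records]


def vectorize(records: list[dict]) -> list[dict]:  # module 500 (token count placeholder)
    return [{**r, "vector": [len(r["tokens"])]} for r in records]


def output(records: list[dict]) -> dict:         # module 600 (summary placeholder)
    return {"count": len(records), "records": records}


def run_pipeline(stages: list[tuple[str, Stage]]) -> tuple[Any, list[str]]:
    """Feed each stage the previous stage's result and record the order of execution."""
    data, trace = None, []
    for name, stage in stages:
        data = stage(data)
        trace.append(name)
    return data, trace


STAGES = [
    ("acquire", acquire), ("clean", clean), ("classify_emotion", classify_emotion),
    ("segment", segment), ("vectorize", vectorize), ("output", output),
]
result, trace = run_pipeline(STAGES)
```

Passing the stages as a list keeps the order explicit and makes it straightforward to swap one placeholder for a real implementation.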
The system is intended to help an App operator grasp operating results quickly and flexibly, learn of user opinions and suggestions in time, and thereby improve its operating methods and optimize its operating strategy. First, the evaluation acquisition module 100 acquires and sends user evaluation information; it comprises both online data and offline data, which ensures that the data are comprehensive. Next, the data cleaning module 200 cleans the user evaluation information to obtain and send target plain text information. Data cleaning here means removing useless, low-value, and repeated information from the data and transferring the reliable target plain text information to a database, where the text big data system can analyze it. The emotion classification module 300 then extracts user emotion data from the target plain text information, classifies the data according to a set rule, and generates and sends a classification report, so that user emotion, that is, user satisfaction, can be understood promptly and effectively. The word segmentation processing module 400 then segments the text information in the classification report with a Chinese word segmentation framework to obtain and send word-segmented text data, which prepares for the subsequent stop-word handling and word-vector computation. After segmentation, the vectorization module 500 vectorizes the word-segmented text data and removes stop words and other interference information to obtain vectorized words. Finally, the result output module 600 applies a naive Bayes algorithm to the vectorized words to generate and output operation results, which are displayed through a visualization tool; the results include keyword clouds, word frequency ratio charts, user emotional tendency analysis, and the like.
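The naive Bayes step can be illustrated with a small multinomial classifier over token counts. The patent does not state what the classifier predicts or how it is trained; the sketch assumes it predicts emotional tendency from labeled examples and uses add-one (Laplace) smoothing. The class name and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token counts."""

    def fit(self, documents: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posterior(self, tokens: list[str], label: str) -> float:
        total_docs = sum(self.class_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / total_docs)  # log prior
        for token in tokens:
            if token not in self.vocabulary:
                continue  # ignore words never seen in training
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(self.vocabulary)))
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(tokens, label))


train_docs = [["查询", "方便"], ["缴费", "好用"], ["登录", "失败"], ["缴费", "卡顿"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)

prediction_a = model.predict(["登录", "卡顿"])
prediction_b = model.predict(["查询", "好用"])
```

Working in log space avoids numerical underflow when comments contain many tokens.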
The system acquires user comment information from all channels through a combination of automatic information collection and manual sorting. After the data are cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs, and data extraction. The analysis results are then output in the manner specified by the App operator, the analysis conclusions are summarized, and analysis charts are produced. This helps the App operator grasp operating results quickly and flexibly, learn of user opinions and suggestions in time, improve its operating methods, and optimize its operating strategy, so as to meet users' needs as far as possible and bring the operator sustainable economic and social benefits. The system has the following advantages: data collected from every channel can be acquired at the same time, so the sample is large and the analysis results are more accurate and more useful as a reference; the analysis sources are varied, in that multi-channel information can be analyzed together or a single channel analyzed on its own, the output covers multiple directions, and results can be output selectively according to actual requirements; the analysis is relatively accurate and efficient, so that results and optimization conclusions for each dimension can be provided in a very short time; and human resource costs are greatly reduced.
Based on the second aspect, as shown in fig. 2, in some embodiments of the present invention, the evaluation acquisition module 100 includes an online sub-module 110, configured to acquire and send online user evaluation information in one or more of the following ways: exporting the corresponding information from a third-party mobile promotion data analysis platform, using a third-party public opinion collection tool, using a data capture tool, and reading information content through a third-party SDK.
The user evaluation information comprises online user evaluation information and offline user evaluation information. The online sub-module 110 obtains the online user evaluation information in one or more of the above ways, which improves the efficiency and accuracy of data acquisition.
Based on the second aspect, as shown in fig. 2, in some embodiments of the present invention, the above-mentioned evaluation acquisition module 100 further includes an offline information sub-module 120 and a digital processing sub-module 130, where:
the offline information sub-module 120 is used for acquiring and sending offline user evaluation information;
and the digital processing sub-module 130 is configured to digitize the offline user evaluation information to obtain target supplementary user evaluation information.
The user evaluation information comprises both online and offline user evaluation information, which ensures that the data are comprehensive. After the offline information sub-module 120 acquires the offline user evaluation information, the digital processing sub-module 130 digitizes it, that is, converts it into computer-readable digital data, to obtain target supplementary user evaluation information. This information can then enter the subsequent operations as a useful supplement to the online user evaluation information.
Based on the second aspect, as shown in fig. 2, in some embodiments of the present invention, the data cleaning module 200 includes a redundancy removing sub-module 210 for removing useless data, low-value data, and repeated data from the user evaluation information to obtain and send the target plain text information.
To keep subsequent data processing efficient and reduce the processing load, the redundancy removing sub-module 210 cleans the user evaluation information once it has been obtained: interference information such as useless data, low-value data, and repeated data is removed, and the resulting target plain text information is sent to the text big data system for analysis.
Based on the second aspect, as shown in fig. 2, in some embodiments of the present invention, the emotion classification module 300 includes an extraction sub-module 310 and a classification sub-module 320, where:
the extraction sub-module 310 is configured to extract user emotion data from the target plain text information;
and the classification sub-module 320 is configured to analyze the user emotion data by means of an emotion dictionary and machine learning, classify the user emotion data according to a set rule, and generate and send a classification report.
To learn promptly and effectively how users feel about the App, user emotion is analyzed so that the App operator can keep up with user feedback and adjust its operating strategy accordingly. The extraction sub-module 310 first extracts user emotion data from the target plain text information, and the classification sub-module 320 then analyzes user emotion from these data by means of an emotion dictionary and machine learning.
In summary, embodiments of the present invention provide an Internet user comment analysis method and system based on big data text analysis. User comment information from all channels is acquired through a combination of automatic information collection and manual sorting. After the data are cleaned, a big data information system analyzes the user comment text along multiple dimensions, including emotion analysis, machine learning, knowledge graphs, and data extraction. The analysis results are then output in the manner specified by the App operator, the analysis conclusions are summarized, and analysis charts are produced. This helps the App operator grasp operating results quickly and flexibly, learn of user opinions and suggestions in time, improve its operating methods, and optimize its operating strategy, so as to meet users' needs as far as possible and bring the operator sustainable economic and social benefits. Data collected from every channel can be acquired at the same time, so the sample is large and the analysis results are more accurate and more useful as a reference; the analysis sources are varied, in that multi-channel information can be analyzed together or a single channel analyzed on its own, the output covers multiple directions, and results can be output selectively according to actual requirements; the analysis is relatively accurate and efficient, so that results and optimization conclusions for each dimension can be provided in a very short time; and human resource costs are greatly reduced.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. An Internet user comment analysis method based on big data text analysis, characterized by comprising the following steps:
acquiring and sending user evaluation information;
performing data cleaning on the user evaluation information to obtain and send target plain text information;
extracting user emotion data from the target plain text information, classifying the user emotion data according to a set rule, and generating and sending a classification report;
performing word segmentation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word-segmented text data;
performing word vectorization on the word-segmented text data, and removing interference information to obtain vectorized words;
and applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result.
2. The Internet user comment analysis method based on big data text analysis according to claim 1, wherein the method for acquiring and sending user evaluation information comprises the following steps:
acquiring and sending online user evaluation information in one or more of the following ways: exporting the corresponding information from a third-party mobile promotion data analysis platform, using a third-party public opinion collection tool, using a data capture tool, and reading information content through a third-party SDK.
3. The Internet user comment analysis method based on big data text analysis according to claim 2, wherein the method for acquiring and sending user evaluation information further comprises the following steps:
acquiring and sending offline user evaluation information;
and digitizing the offline user evaluation information to obtain target supplementary user evaluation information.
4. The Internet user comment analysis method based on big data text analysis according to claim 1, wherein the method for performing data cleaning on the user evaluation information to obtain and send target plain text information comprises the following steps:
removing useless data, low-value data, and repeated data from the user evaluation information to obtain and send target plain text information.
5. The Internet user comment analysis method based on big data text analysis according to claim 1, wherein the method for extracting user emotion data from the target plain text information, classifying the user emotion data according to a set rule, and generating and sending a classification report comprises the following steps:
extracting user emotion data from the target plain text information;
analyzing the user emotion data by means of an emotion dictionary and machine learning, classifying the user emotion data according to a set rule, and generating and sending a classification report.
6. An Internet user comment analysis system based on big data text analysis, characterized by comprising an evaluation acquisition module, a data cleaning module, an emotion classification module, a word segmentation processing module, a vectorization module, and a result output module, wherein:
the evaluation acquisition module is used for acquiring and sending user evaluation information;
the data cleaning module is used for performing data cleaning on the user evaluation information to obtain and send target plain text information;
the emotion classification module is used for extracting user emotion data from the target plain text information, classifying the user emotion data according to a set rule, and generating and sending a classification report;
the word segmentation processing module is used for performing word segmentation on the text information in the classification report by using a Chinese word segmentation framework to obtain and send word-segmented text data;
the vectorization module is used for performing word vectorization on the word-segmented text data and removing interference information to obtain vectorized words;
and the result output module is used for applying a naive Bayes algorithm to the vectorized words, and outputting and displaying the operation result.
7. The Internet user comment analysis system based on big data text analysis according to claim 6, wherein the evaluation acquisition module comprises an online sub-module configured to acquire and send online user evaluation information in one or more of the following ways: exporting the corresponding information from a third-party mobile promotion data analysis platform, using a third-party public opinion collection tool, using a data capture tool, and reading information content through a third-party SDK.
8. The Internet user comment analysis system based on big data text analysis according to claim 7, wherein the evaluation acquisition module further comprises an offline information sub-module and a digital processing sub-module, wherein:
the offline information sub-module is used for acquiring and sending offline user evaluation information;
and the digital processing sub-module is used for digitizing the offline user evaluation information to obtain target supplementary user evaluation information.
9. The Internet user comment analysis system based on big data text analysis according to claim 6, wherein the data cleaning module comprises a redundancy removing sub-module for removing useless data, low-value data, and repeated data from the user evaluation information to obtain and send target plain text information.
10. The Internet user comment analysis system based on big data text analysis according to claim 6, wherein the emotion classification module comprises an extraction sub-module and a classification sub-module, wherein:
the extraction sub-module is used for extracting user emotion data from the target plain text information;
and the classification sub-module is used for analyzing the user emotion data by means of an emotion dictionary and machine learning, classifying the user emotion data according to a set rule, and generating and sending a classification report.
CN202011535936.6A 2020-12-22 2020-12-22 Internet user comment analysis method and system based on big data text analysis Pending CN112650906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011535936.6A CN112650906A (en) 2020-12-22 2020-12-22 Internet user comment analysis method and system based on big data text analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011535936.6A CN112650906A (en) 2020-12-22 2020-12-22 Internet user comment analysis method and system based on big data text analysis

Publications (1)

Publication Number Publication Date
CN112650906A true CN112650906A (en) 2021-04-13

Family

ID=75359338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011535936.6A Pending CN112650906A (en) 2020-12-22 2020-12-22 Internet user comment analysis method and system based on big data text analysis

Country Status (1)

Country Link
CN (1) CN112650906A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559174A (en) * 2013-09-30 2014-02-05 东软集团股份有限公司 Semantic emotion classification characteristic value extraction method and system
CN103605658A (en) * 2013-10-14 2014-02-26 北京航空航天大学 Search engine system based on text emotion analysis
CN104778240A (en) * 2015-04-08 2015-07-15 重庆理工大学 Micro blog text data classification method on basis of multi-feature fusion
JPWO2013179340A1 (en) * 2012-05-30 2016-01-14 株式会社日立製作所 Information analysis system and information analysis method
CN105740228A (en) * 2016-01-25 2016-07-06 云南大学 Internet public opinion analysis method
CN109523295A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 A kind of information processing method, storage medium and server
CN111309859A (en) * 2020-01-21 2020-06-19 上饶市中科院云计算中心大数据研究院 Scenic spot network public praise emotion analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210413