CN117114005A

CN117114005A - Comment data processing method and device, computer equipment and storage medium

Info

Publication number: CN117114005A
Application number: CN202310861015.6A
Authority: CN
Inventors: 张瑾
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-11-24

Abstract

The application relates to a comment data processing method, a comment data processing device, computer equipment, a storage medium and a computer program product. The method comprises the following steps: preprocessing comment data in a target application to obtain word vectors of the comment data, extracting semantic features of the word vectors of the comment data based on a weighted fusion model of the comment data trained in advance to obtain comment objects and comment properties corresponding to the comment data, performing cluster analysis on the comment objects to obtain a set of comment objects, wherein the set of comment objects comprises at least two comment objects, and sequencing the set of comment objects based on scoring results of the comment properties to obtain an evaluation result of the target application. According to the method, comment data of the target application are extracted and analyzed to obtain comment objects and comment properties, the set of the comment objects is ordered according to the scoring result of the comment properties, specific comment object types are screened out, and the accurate direction of upgrading and reconstruction of the target application is provided.

Description

Comment data processing method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a comment data processing method, apparatus, computer device, storage medium, and computer program product.

Background

With the development of information technology, more and more customers download an APP on their mobile phones and handle various services through it. APPs are generally downloaded from application stores, each of which provides ratings and comments for a given APP. Because the comments vary widely, a large number of comments need to be analyzed.

In the existing approach, comment data are crawled from an application store and then extracted and analyzed according to rating and view count. The analysis effect is poor, which does not help developers maintain and upgrade the APP.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a comment data processing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the analysis effect of comment data.

In a first aspect, the present application provides a method for processing comment data. The method comprises the following steps:

preprocessing comment data in a target application to obtain word vectors of the comment data;

Based on a weighted fusion model of pre-trained comment data, extracting semantic features of word vectors of the comment data to obtain comment objects and comment properties corresponding to the comment data;

performing cluster analysis on the comment objects to obtain a set of comment objects, wherein the set of comment objects comprises at least two comment objects;

and sorting the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application.

In one embodiment, the preprocessing the comment data in the target application to obtain a word vector of the comment data includes:

acquiring comment data of the target application;

performing word segmentation on the comment data to obtain segmented comment data;

and encoding the segmented comment data and performing feature extraction on it to obtain word vectors of the comment data.

In one embodiment, the weighted fusion model of the comment data includes an LSTM model and a CRF model, and the semantic feature extraction is performed on word vectors of the comment data based on the weighted fusion model of the pre-trained comment data to obtain comment objects and comment properties corresponding to the comment data, where the method includes:

Based on an LSTM model in a weighted fusion model of pre-trained comment data, extracting semantic features of word vectors of the comment data to obtain comment objects of the comment data and semantic features of the comment objects;

and labeling semantic features of comment objects based on a CRF model in a weighted fusion model of the pre-trained comment data to obtain comment properties corresponding to the comment data.

In one embodiment, the performing cluster analysis on the comment object to obtain the set of comment objects includes:

randomly dividing the comment objects to obtain at least two groups of comment objects;

selecting one comment object from the at least two groups of comment objects as a target comment object respectively;

and carrying out cluster analysis on each target comment object to obtain a set of the comment objects.

In one embodiment, sorting the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application includes:

weighting the sorting results of the comment properties according to the scoring results of the comment data to obtain scoring results of the comment properties;

Scoring each comment object under the set of comment objects based on scoring results of comment properties to obtain scoring results of each comment object;

and sorting the set of comment objects according to the scoring result of each comment object to obtain the evaluation result of the target application.

In one embodiment, the training method of the weighted fusion model of comment data includes:

acquiring weighted fusion parameters of the LSTM model and the CRF model;

obtaining a predicted comment object and predicted comment properties according to the LSTM model, the CRF model and comment sample data of the target application;

and if the predicted comment object is not successfully matched with the comment sample data manual annotation result of the target application, or if the predicted comment property is not successfully matched with the comment sample data manual annotation result of the target application, adjusting the weighted fusion parameters, retraining until the predicted comment object is successfully matched with the comment sample data manual annotation result of the target application, and storing the weighted fusion parameters of the training.

In a second aspect, the application further provides a comment data processing device. The device comprises:

the preprocessing module is used for preprocessing comment data in the target application to obtain word vectors of the comment data;

the processing module is used for extracting semantic features of word vectors of the comment data based on a weighted fusion model of the pre-trained comment data to obtain comment objects and comment properties corresponding to the comment data;

the classification module is used for carrying out cluster analysis on the comment objects to obtain a set of the comment objects, wherein the set of the comment objects comprises at least two comment objects;

and the sorting module is used for sorting the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

In the above comment data processing method, apparatus, computer device, storage medium and computer program product, comment data in the target application is preprocessed to obtain word vectors of the comment data; expressing the comment data in vector form facilitates the subsequent analysis of its features. Semantic feature extraction is performed on the word vectors based on the trained weighted fusion model of comment data to obtain the comment objects and comment properties corresponding to the comment data; because the weighted fusion model focuses on the semantics of the comment data, accurate comment objects and comment properties are extracted. Cluster analysis is performed on the comment objects to obtain a set of comment objects, so that the comment objects in massive comment data are grouped into a few types, which saves developers upgrading and maintenance time. The set of comment objects is sorted based on the scoring results of the comment properties to obtain the evaluation result of the target application; specific comment object types are screened out, which makes it easier for developers to upgrade and modify the target application. In this way, the comment data of the target application is extracted and analyzed to obtain comment objects and comment properties, the set of comment objects is sorted according to the scoring results of the comment properties, and an accurate direction for upgrading and modifying the target application is provided.

Drawings

FIG. 1 is an application environment diagram of a method of processing comment data in one embodiment;

FIG. 2 is a flow diagram of a method of processing comment data in one embodiment;

FIG. 3 is a flow diagram of a training method of a weighted fusion model of comment data in one embodiment;

FIG. 4 is a flow diagram of a method of comment extraction and analysis for an application store in one embodiment;

FIG. 5 is a block diagram of a processing apparatus for comment data in one embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The comment data processing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. The data storage system may store data that the processing server 102 needs to process. The data storage system may be integrated on the processing server 102 or may be located on a cloud or other network server. The processing server 102 pre-processes comment data in the target application to obtain word vectors of the comment data, the processing server 102 extracts semantic features of the word vectors of the comment data based on a weighted fusion model of the comment data trained in advance to obtain comment objects and comment properties corresponding to the comment data, the processing server 102 performs cluster analysis on the comment objects to obtain a set of comment objects, the set of comment objects comprises at least two comment objects, and the processing server 102 sorts the set of comment objects based on scoring results of the comment properties to obtain evaluation results of the target application.

The processing server 102 may be implemented as a stand-alone server or a server cluster including a plurality of servers.

In one embodiment, as shown in FIG. 2, a method for processing comment data is provided. The method is described by taking its application to the processing server in FIG. 1 as an example, and includes:

S202, preprocessing the comment data in the target application to obtain word vectors of the comment data.

The target application may be an APP in the financial field, specifically a bank APP on a mobile phone. A given bank APP may be available in different application stores, each of which provides comments and ratings for the APP. A large number of comments therefore need to be analyzed in order to quickly screen out and extract the negative remarks and defective functions encountered in using the APP, so that when functions are optimized the APP can be developed and improved in a targeted manner, which improves the popularity of the APP and alleviates user loss.

The comment data comprises comment content and comment scores. The comment content is the user's opinions, suggestions, remarks and the like on the APP. Comment content can generally be divided into two types according to whether it is valid. Valid comment content has an object, where an object refers to a function, page or other module in the APP. Invalid comment content generally contains only a comment property, which is the user's attitude towards the APP, and has no specific object, for example a remark that a certain APP is too bad. The score of a comment may be an objective numeric evaluation that measures the comment property.

The comment data of the target application needs to be acquired from different application stores; massive comment data can be extracted from the comment pages of the different stores by a crawler tool or another text extraction tool.

After the comment data of the target application is obtained, it is preprocessed to obtain word vectors of the comment data. The preprocessing comprises: filtering the comment data to screen out invalid comment content; performing word segmentation on the valid comment content to obtain segmented comment data; and performing feature extraction on the segmented comment data to obtain word vectors of the comment data, that is, the position of the comment data in a vector space, which facilitates subsequent semantic extraction, labeling and cluster analysis of the comment data.

S204, based on a weighted fusion model of the comment data trained in advance, extracting semantic features of word vectors of the comment data to obtain comment objects and comment properties corresponding to the comment data.

The weighted fusion model of comment data can be obtained by fusing a neural network model and a conditional random field. The fusion comprises: determining the weights of the neural network model and the conditional random field in the training stage of the fusion model, and, after training, configuring the neural network model and the conditional random field according to those weights to obtain the weighted fusion model.

The neural network model may be an LSTM (long short-term memory) model, a recurrent neural network designed specifically to solve the long-term dependency problem of ordinary recurrent neural networks (RNN). An LSTM unit contains three gates: an input gate, a forget gate and an output gate. The long- and short-term memory of the LSTM is realized by these gating units, which regulate the information passed along the sequence at each time step and thereby capture long-range dependencies in the data sequence. The LSTM model can extract semantic features from the word vectors.
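The gating described above can be illustrated with a single time step of a standard LSTM cell. This is only a minimal sketch with a scalar input and a scalar hidden state; the function name `lstm_step`, the parameter layout and the zero-valued example parameters are assumptions made for illustration and are not taken from the application.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (hypothetical parameter layout).

    params maps each of "input", "forget", "output" and "candidate"
    to a tuple (w, u, b): input weight, recurrent weight and bias.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre_activation("input"))      # input gate: how much new content is written
    f = sigmoid(pre_activation("forget"))     # forget gate: how much old memory is kept
    o = sigmoid(pre_activation("output"))     # output gate: how much memory is exposed
    g = math.tanh(pre_activation("candidate"))  # candidate memory content

    c = f * c_prev + i * g   # new cell state (long-term memory)
    h = o * math.tanh(c)     # new hidden state (short-term output)
    return h, c


# Illustrative parameters only: with all weights zero every gate equals 0.5.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
print(h, c)
```

In a real model the input and states are vectors and the parameters are learned matrices; the scalar form is used here only to keep the roles of the three gates visible.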

The conditional random field (CRF) is a discriminative undirected graphical model; a discriminative model models the conditional distribution. The CRF can be used in combination with the LSTM: the CRF labels the semantic features of the word vectors to obtain labeling results, which may be the user's attitude towards each function in the APP. For example, semantic features are extracted from the word vectors of the comment data, and the semantic features of each word vector are then labeled to obtain its attitude feature, so that the popularity of the functions, pages and other modules of the target application addressed by important comment data can be extracted quickly and accurately from massive comment data.

The comment object corresponding to the comment data may be a physical or virtual object mentioned in the comment data. Taking a bank APP as an example, the comment object is in the general case a virtual object in the bank APP, such as a function (for example scan-to-pay or transfer) or a page (for example a payment page or an information page). In specific cases the comment data also includes physical objects; for example, a reservation function includes outlet recommendations, where the outlet (a bank branch) is the physical object.

The comment property corresponding to the comment data may be the user's attitude towards the comment object, for example works well, works poorly, works very poorly, good service attitude, and the like.

S206, performing cluster analysis on the comment objects to obtain a comment object set, wherein the comment object set comprises at least two comment objects.

The cluster analysis is an iteratively solved clustering algorithm. It generally comprises: dividing the data into a plurality of groups; randomly selecting one object in each group as the clustering center of that group; calculating the distance between each clustering center and the other objects to obtain new groups; determining a new clustering center for each new group; and repeating until the clustering centers no longer change or the sum of squared errors is minimized, at which point the current clustering centers and the groups they belong to are stored.

Specifically, cluster analysis is performed on the comment objects so that each set of comment objects (the group to which a clustering center belongs) includes at least two comment objects. Through cluster analysis, a large number of comment objects are grouped into a small number of sets, which reduces the computational difficulty and makes it convenient to check and label the comment objects manually.

S208, sorting the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application.

The scoring result of the comment property can be obtained directly from the scoring result of the comment data. The scoring result of the comment data may be the score given to the comment data in an application store, and the score can be normalized; for example, scores on 5-point, 10-point and 100-point scales are converted into scores between 0 and 1.

The scoring result of the comment property can also be obtained from the labeling result of the comment property, which is produced by extracting semantic features of the word vectors of the comment data; for example, the comment properties are classified and different types of comment property are given different scores.
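The normalization mentioned above could be sketched as follows. The application does not state the normalization formula; dividing by the maximum of the scale is an assumption made here, and the function name `normalize_score` and the sample ratings are hypothetical.

```python
def normalize_score(score: float, scale_max: float) -> float:
    """Map a store rating given on a 0..scale_max scale to the range 0..1.

    Assumption: simple division by the scale maximum; the source only
    says that 5-, 10- and 100-point scores are converted to 0-1 scores.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= score <= scale_max:
        raise ValueError("score lies outside the rating scale")
    return score / scale_max


# Hypothetical ratings as (score, scale maximum) pairs from different stores.
ratings = [(4, 5), (7, 10), (85, 100)]
normalized = [normalize_score(score, scale_max) for score, scale_max in ratings]
print(normalized)
```

After this step, ratings collected from stores with different scales can be compared and combined directly.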

The comment objects correspond to comment properties, the scoring results corresponding to the comment objects can be obtained according to the scoring results of the comment properties, the overall scoring results of the set of the comment objects are further obtained, and the comment objects are ranked to obtain the evaluation results of the target application.

Specifically, take the comment objects to be an A1 function, an A2 function, a B1 page and a B2 page. Set A of comment objects includes the A1 function and the A2 function, and set B of comment objects includes the B1 page and the B2 page. From the scoring results of the comment properties (A1 function 90 points, A2 function 80 points, B1 page 80 points, B2 page 70 points), the overall scoring results of set A and set B are obtained, and the comment objects are sorted by overall scoring result to obtain the evaluation result of the target application. The evaluation result includes a certain number of top-ranked comment objects and their comment properties, which developers use to optimize the target application in a targeted manner.

In the above comment data processing method, comment data in the target application is preprocessed to obtain word vectors of the comment data; expressing the comment data in vector form facilitates the subsequent analysis of its features. Semantic feature extraction is performed on the word vectors based on the trained weighted fusion model of comment data to obtain the comment objects and comment properties corresponding to the comment data; because the weighted fusion model focuses on the semantics of the comment data, accurate comment objects and comment properties are extracted. Cluster analysis is performed on the comment objects to obtain a set of comment objects, so that the comment objects in massive comment data are grouped into a few main types, which saves developers maintenance time. The set of comment objects is sorted based on the scoring results of the comment properties to obtain the evaluation result of the target application, and specific comment object types are screened out.
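The A/B example above can be worked through in a few lines. The application does not say how the overall score of a set is computed or in which direction the sets are ordered; taking the mean of the member scores and sorting in descending order are both assumptions of this sketch, and the names `rank_sets`, `object_scores` and `object_sets` are hypothetical.

```python
from statistics import mean

# Scores of the individual comment objects, taken from the example in the text.
object_scores = {"A1 function": 90, "A2 function": 80, "B1 page": 80, "B2 page": 70}

# Sets of comment objects produced by the cluster analysis.
object_sets = {
    "A": ["A1 function", "A2 function"],
    "B": ["B1 page", "B2 page"],
}


def rank_sets(object_sets, object_scores):
    """Return (set name, overall score) pairs, highest overall score first.

    Assumption: the overall score of a set is the mean of its members' scores.
    """
    overall = {
        name: mean(object_scores[member] for member in members)
        for name, members in object_sets.items()
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


ranking = rank_sets(object_sets, object_scores)
print(ranking)
```

Under these assumptions set A has an overall score of 85 and set B of 75; sorting in ascending order instead would put the weakest set first, which may suit a developer looking for what to fix.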

In one embodiment, preprocessing comment data in a target application to obtain word vectors of the comment data includes: and obtaining comment data of the target application, performing word segmentation on the comment data to obtain the comment data after word segmentation, and performing coding and feature extraction on the comment data after word segmentation to obtain word vectors of the comment data.

The method for acquiring comment data of the target application can be to acquire comment data from a background interface of an application store in a crawler mode, wherein the crawler is a web crawler, and is a program or script for automatically capturing web information according to a certain rule.

The comment data may be segmented with jieba Chinese word segmentation; jieba ("stutter") is a Python Chinese word segmentation component. It performs efficient word-graph scanning based on a Trie structure and generates a directed acyclic graph (DAG) of all the possible words formed by the Chinese characters in a sentence; it uses memoized search to compute the maximum-probability path and find the best segmentation based on word frequency; for out-of-vocabulary words, it uses a model based on the position probability of Chinese characters together with the Viterbi algorithm. Three segmentation modes are supported: (a) accurate mode, which attempts to cut the sentence most precisely and is suitable for text analysis; (b) full mode, which scans out all the words that can be formed in the sentence, which is very fast but cannot resolve ambiguity; (c) search engine mode, which segments long words again on the basis of the accurate mode to improve recall and is suitable for search engine segmentation. Traditional Chinese segmentation and custom dictionaries are also supported. In addition, a C++ version of jieba, CppJieba, is available, as is CppJiebaPy, which calls CppJieba from Python.

Specifically, when the jieba word segmentation is adopted to segment the comment data, the semantic relation of each word to be selected can be primarily considered, and the comment data is segmented based on the semantic relation among the words to be selected, so that a final word segmentation result is obtained.
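The DAG and maximum-probability-path idea behind this kind of dictionary-based segmentation can be sketched without the library itself. The code below is not jieba's implementation: the tiny dictionary `toy_freq`, its frequencies and the function names are invented for illustration, and the handling of out-of-vocabulary words with the Viterbi algorithm is omitted.

```python
import math


def build_dag(text, freq):
    """For each start index, list the end indices of dictionary words starting there."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i, len(text)) if text[i:j + 1] in freq]
        dag[i] = ends or [i]  # fall back to the single character
    return dag


def segment(text, freq):
    """Pick the segmentation with the highest total log-probability."""
    log_total = math.log(sum(freq.values()))
    dag = build_dag(text, freq)
    n = len(text)
    route = {n: (0.0, n)}  # route[i] = (best log-probability from i, end index of first word)
    for i in range(n - 1, -1, -1):  # fill the table from right to left
        route[i] = max(
            (math.log(freq.get(text[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:  # follow the stored end indices from left to right
        j = route[i][1]
        words.append(text[i:j + 1])
        i = j + 1
    return words


# Hypothetical word frequencies, for illustration only.
toy_freq = {"转账": 50, "功能": 60, "转": 5, "账": 5, "功": 3,
            "能": 20, "很": 80, "好用": 30, "好": 90, "用": 40}
words = segment("转账功能很好用", toy_freq)
print(words)  # ['转账', '功能', '很', '好用']
```

The sample sentence means roughly "the transfer function works well"; the two-character words win because each is more probable than the product of its single-character parts under the toy frequencies.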

The segmented comment data can be input into feature engineering, where it is encoded. The encoding may be one-hot encoding, also called one-bit-effective encoding. This method uses an N-bit state register to encode N states; each state has its own register bit, and at any time only one of the bits is valid.

Specifically, one-hot encoding is adopted to encode the comment data after word segmentation, and the encoding characteristics of the comment data after word segmentation are obtained.
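A minimal sketch of one-hot encoding for segmented comment data is given below. Building the vocabulary from the tokens themselves and ordering it alphabetically are assumptions of this example, and the function name `one_hot_encode` and the sample tokens are hypothetical.

```python
def one_hot_encode(tokens):
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    vocabulary = sorted(set(tokens))  # N distinct tokens -> N positions
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = []
    for token in tokens:
        vector = [0] * len(vocabulary)
        vector[index[token]] = 1  # exactly one position is "valid"
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical segmented comment.
tokens = ["transfer", "function", "slow", "transfer"]
vocabulary, vectors = one_hot_encode(tokens)
print(vocabulary)  # ['function', 'slow', 'transfer']
print(vectors)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The vector length grows with the vocabulary and the vectors carry no notion of similarity, which is one reason a dense representation such as word2vec is applied afterwards.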

After the encoding features of the segmented comment data are obtained, feature extraction is performed on them to obtain the word vectors of the comment data. Feature extraction may be performed by word2vec, which represents words as vectors in a vector space. In the word vector space, words with similar meanings lie close together while dissimilar words lie far apart; this is also referred to as a semantic relationship. Word embedding provides a method of converting text into numeric vectors. Word2vec reconstructs the linguistic context of words: the context is formed by the words surrounding the current word in spoken or written language, and word2vec learns the vector representation of a word from its context. Word2vec provides two methods, Skip-gram and CBOW. The CBOW framework takes the context of a word as input and predicts the word itself; the Skip-gram framework takes a word as input and predicts its context.

Specifically, word2vec is used to perform feature extraction on the encoding features of the segmented comment data to obtain the word vectors of the comment data. The word vector may be at least one of an average (comment) vector, a weighted average vector and a keyword vector. Word2vec is a neural-network-based word embedding model that captures semantic and grammatical relations by learning distributed representations of words. It is suitable for large-scale text data and can learn rich semantic relations; the word representations it produces are continuous vectors that can be used for other natural language processing tasks.
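The difference between the two frameworks is easiest to see in the training examples each one generates. The sketch below only builds those (input, target) pairs and does not train a model; the window size of 1, the function names and the sample tokens are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: the centre word is the input, each context word is a target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: the surrounding context words are the input, the centre word is the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


# Hypothetical segmented comment.
tokens = ["transfer", "page", "loads", "slowly"]
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
print(sg[:3])  # [('transfer', 'page'), ('page', 'transfer'), ('page', 'loads')]
print(cb[1])   # (['transfer', 'loads'], 'page')
```

A word2vec model is then trained to predict the target from the input, and the learned input weights serve as the word vectors.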

Wherein, the average vector means that all word vectors in the comment are averaged to obtain an integral comment vector. The weighted average vector refers to weighted average of each word vector, and different weights can be given according to importance of the words or TF-IDF weights. The keyword vectors refer to selecting some keywords in the comments, extracting word vectors corresponding to the keywords, and taking the word vectors as vector representations of the comments.
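The average vector and the weighted average vector can be computed as in the following sketch. The two-dimensional word vectors and the weights are made-up values; in practice the vectors would come from word2vec and the weights from word importance or TF-IDF, as described above.

```python
def average_vector(word_vectors):
    """Element-wise mean of all word vectors in a comment."""
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(v[d] for v in word_vectors) / count for d in range(dim)]


def weighted_average_vector(word_vectors, weights):
    """Element-wise mean in which each word vector counts in proportion to its weight."""
    total = sum(weights)
    dim = len(word_vectors[0])
    return [
        sum(w * v[d] for v, w in zip(word_vectors, weights)) / total
        for d in range(dim)
    ]


# Hypothetical 2-dimensional word vectors for a three-word comment.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
# Hypothetical weights (for example TF-IDF values): the first word matters most.
weights = [2.0, 1.0, 1.0]

avg = average_vector(word_vectors)
wavg = weighted_average_vector(word_vectors, weights)
print(avg)   # [1.0, 1.0]
print(wavg)  # [1.0, 0.75]
```

A keyword vector can be obtained with the same functions by passing only the vectors of the selected keywords.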

In the embodiment, word vectors of comment data are obtained by performing word segmentation, coding and feature extraction on the comment data, vector representation of the comment data in a word vector space is obtained, a basis is provided for subsequent semantic extraction, labeling and clustering analysis on the comment data, and accuracy of target application upgrading transformation direction prediction is improved.

In one embodiment, the weighted fusion model of comment data includes an LSTM model and a CRF model, and semantic feature extraction is performed on word vectors of comment data based on the weighted fusion model of comment data trained in advance to obtain comment objects and comment properties corresponding to the comment data, including: based on an LSTM model in a weighted fusion model of the pre-trained comment data, extracting semantic features of word vectors of the comment data to obtain comment objects of the comment data and semantic features of the comment objects; and labeling semantic features of the comment objects based on a CRF model in a weighted fusion model of the pre-trained comment data to obtain comment properties corresponding to the comment data.

The word vectors of the comment data are the input of feature engineering. LSTM (long short-term memory network) and CRF (conditional random field) are two models commonly used for sequence modeling and labeling tasks, and their inputs relate to feature engineering as follows:

Input of LSTM: LSTM is a variant of the recurrent neural network (RNN) for processing sequence data. Its input is a sequence of data, such as the word sequence of a text or the audio sequence of speech. Each element of the sequence can be encoded as a vector representation, for example by using word embedding to convert text words into vectors, such as using word2vec to convert comment data into word vectors.

Input of CRF: CRF is a probabilistic graphical model used for sequence labeling tasks such as named entity recognition and part-of-speech tagging. The input to the CRF is a sequence of data, typically a sequence of features extracted by LSTM or another model. These features may include part of speech, word context, syntactic information and the like, and describe each position in the sequence, that is, the comment property of each comment object.

Wherein, the feature engineering plays an important role in the sequence modeling and labeling task. Feature extraction and conversion of the raw sequence data is typically required before using the LSTM or other model. For example, in a text sequence, part-of-speech tagging, bag of words representation, TF-IDF feature extraction, etc. may be performed. These features can help the model capture structural and semantic information in the sequence. Feature engineering plays a key role before LSTM and CRF are used, which can help capture important features in sequences and provide meaningful input sequences. The good feature engineering design can improve the performance of the model, improve the modeling capability of the model on the sequence data, and further improve the accuracy and generalization capability of the sequence labeling task.

The association between LSTM and CRF is that LSTM can be used as a feature extractor to extract semantic features in the sequence data. These feature sequences can be used as inputs to the CRF for modeling sequence labeling tasks. LSTM is able to learn long-term dependencies in the sequence, while CRF uses these features to build a globally annotated conditional probability model. Thus, LSTM and CRF are commonly used in combination to construct an end-to-end sequence modeling system that learns feature representations from input sequences and makes annotation predictions.
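One common way to combine the two models is to let the LSTM produce a score for every label at every position and let a linear-chain CRF choose the best label sequence with Viterbi decoding. The application does not specify the decoding procedure, so the sketch below is an assumption: the label set (`O`, `OBJ`, `PROP`), the emission scores standing in for LSTM outputs and the transition scores are all invented for illustration.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence of a linear-chain model.

    emissions[t][y]   : score of label y at position t (e.g. from an LSTM)
    transitions[a][b] : score of moving from label a to label b (CRF parameters)
    """
    n = len(labels)
    best = [emissions[0][:]]  # best[t][y]: best score of any path ending in y at t
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointers = [], []
        for cur in range(n):
            prev = max(range(n), key=lambda p: best[-1][p] + transitions[p][cur])
            scores.append(best[-1][prev] + transitions[prev][cur] + emissions[t][cur])
            pointers.append(prev)
        best.append(scores)
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n), key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[y] for y in path]


labels = ["O", "OBJ", "PROP"]  # other, comment object, comment property (hypothetical tags)
tokens = ["transfer", "function", "is", "slow"]
emissions = [  # hypothetical per-token label scores
    [0.1, 2.0, 0.2],
    [0.5, 1.5, 0.3],
    [2.0, 0.1, 0.4],
    [0.6, 0.2, 1.8],
]
transitions = [  # hypothetical scores that mildly favour staying in the same label
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]
tags = viterbi_decode(emissions, transitions, labels)
print(list(zip(tokens, tags)))
# [('transfer', 'OBJ'), ('function', 'OBJ'), ('is', 'O'), ('slow', 'PROP')]
```

Because the transition scores are shared across positions, the decoder scores the sequence as a whole, which corresponds to the global labeling mentioned above.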

In the embodiment, the word vectors of the comment data are subjected to semantic feature extraction by using a weighted fusion model of the comment data trained in advance, so that comment objects of the comment data and semantic features of the comment objects are obtained, the semantic features of the comment objects are labeled, comment properties corresponding to the comment data are obtained, and the comment properties can be accurately extracted from the comment data.

In one embodiment, performing cluster analysis on comment objects to obtain a set of comment objects includes: randomly dividing comment objects to obtain at least two groups of comment objects; selecting one comment object from at least two groups of comment objects as a target comment object respectively; and carrying out cluster analysis on each target comment object to obtain a set of comment objects.

Cluster analysis is a machine learning technique that groups data points according to a given rule. It is an unsupervised learning method and a common technique for statistical data analysis in many fields. Types of cluster analysis include K-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), and the like.

Specifically, taking K-means clustering as an example, the plurality of comment objects are randomly divided into at least two groups, and one comment object is randomly selected from each group as the target comment object, that is, the cluster center. The distance between each cluster center and the other, unselected comment objects is calculated, and the comment objects are re-divided into at least two groups according to these distances. This is repeated until no (or a minimum number of) comment objects are reassigned to different groups and no (or a minimum number of) cluster centers change, at which point the set of comment objects is obtained.

The set of comment objects includes at least two comment objects of the same or similar category. For example, for comment data of a bank APP, the comment objects may include function A1 and function A2, both of which belong to comment object set A.

In the embodiment, the comment objects are clustered by adopting K-means clustering, so that the comment objects are classified rapidly, and a set of comment objects is obtained.
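The K-means procedure described above can be sketched as follows. This is a minimal illustration that assumes each comment object has already been represented as a numeric vector (here a 2-D tuple); the sample points, the function name `k_means`, and the use of squared Euclidean distance are assumptions rather than details given in the application.

```python
import random

def k_means(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    # Randomly divide the points into k groups and pick one point per group
    # as the initial cluster center (the "target comment object").
    shuffled = points[:]
    rng.shuffle(shuffled)
    groups = [shuffled[i::k] for i in range(k)]
    centers = [rng.choice(group) for group in groups]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:   # no point changed group: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its group.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

# Hypothetical 2-D vectors for six comment objects forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)
```

The loop stops as soon as an assignment step leaves every point in its previous group, which corresponds to the stopping condition in which no comment objects are reassigned.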

In one embodiment, based on scoring results of comment properties, sorting the set of comment objects to obtain an evaluation result of the target application includes: weighting the sorting results of the comment properties according to the scoring results of the comment data to obtain scoring results of the comment properties; scoring each comment object under the set of comment objects based on scoring results of comment properties to obtain scoring results of each comment object; and sorting the set of comment objects according to the scoring result of each comment object to obtain the evaluation result of the target application.

The scoring result of the comment data may be an individual score of the comment data in the application store, or may be the overall score of the target application in each application store. For example, the individual score is the score attached to one piece of comment data of the target application, and the overall score is the aggregate score of the target application's comment data in the application store.

The scores may be normalized, converting scores on 5-point, 10-point, and 100-point scales to scores between 0 and 1, which makes the scores of the comment data easier to analyze.
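A minimal sketch of this normalization is shown below. It assumes that every scale starts at 0 and that the score is simply divided by the maximum of its scale; the function name `normalize_score` is hypothetical, and a store whose scale starts at 1 would need a different mapping.

```python
def normalize_score(score, scale_max):
    """Map a score on a 0..scale_max scale to the range 0-1."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= score <= scale_max:
        raise ValueError("score lies outside the scale")
    return score / scale_max

# Scores from three hypothetical stores using different scales.
normalized = [
    normalize_score(4, 5),
    normalize_score(7, 10),
    normalize_score(85, 100),
]
```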

The comment properties can be ranked from positive to negative according to their semantic degree, yielding the ranking result of the comment properties.

Specifically, the ranking results of the comment properties are weighted according to the individual scores and/or the overall scores of the comment data, and scoring results of the comment properties are obtained.

After the scoring result of the comment property is obtained, the comment property is associated with its comment object, and the scoring result of the comment object is obtained from the scoring results of its comment properties. Further, the scoring result of each set of comment objects can be calculated from the scoring results of the comment objects in that set.

The scoring results of the sets of comment objects are sorted to obtain the evaluation result of the target application. It should be noted that the evaluation result of the target application includes not only the ranking of the scores of the sets of comment objects (the broad categories), which gives the developer the overall direction for improving the target application, but also the ranking of the scores of the comment objects within each set, which refines that direction into specific improvements.

In this embodiment, the ranking results of the comment properties are weighted according to the scoring results of the comment data, and the set of comment objects is ranked according to the scoring results of the comment objects, so that a more accurate evaluation result for the target application is obtained, and the order of improvement of the target application and the key objects needing improvement are provided to the developer.
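The weighting and sorting steps of this embodiment can be sketched as follows. The application does not give a formula, so everything numeric here is an assumption: each comment property is given a sentiment rank between 0 (most negative) and 1 (most positive), the rank is multiplied by the normalized score of its comment, object and set scores are plain averages, and sets are listed lowest score first so that the categories most in need of improvement come first. The records and the function name `evaluate` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records:
# (set name, comment object, sentiment rank in [0, 1], normalized comment score).
records = [
    ("A", "A1", 0.2, 0.4),
    ("A", "A2", 0.9, 1.0),
    ("B", "B1", 0.1, 0.2),
    ("B", "B1", 0.3, 0.4),
]

def evaluate(records):
    """Return (set names sorted by score, score of each comment object)."""
    property_scores = defaultdict(list)
    for set_name, obj, rank, score in records:
        # Assumed weighting: sentiment rank scaled by the normalized score.
        property_scores[(set_name, obj)].append(rank * score)
    # Score of each comment object: average of its property scores.
    per_object = {key: mean(values) for key, values in property_scores.items()}
    # Score of each set: average of the scores of its comment objects.
    set_scores = defaultdict(list)
    for (set_name, _), value in per_object.items():
        set_scores[set_name].append(value)
    ranking = sorted((mean(values), name) for name, values in set_scores.items())
    return [name for _, name in ranking], per_object

set_ranking, object_scores = evaluate(records)
```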

In one embodiment, as shown in fig. 3, a training method of a weighted fusion model of comment data includes:

S302, acquiring weighted fusion parameters of an LSTM model and a CRF model;

The weighted fusion parameters may be a set of hyperparameters and feature combinations in the model. The weighted fusion parameters can be adjusted by changing the hyperparameters of the model or by using different feature combinations; each set of weighted fusion parameters is trained on one data set, and its performance is evaluated.

S304, obtaining a predicted comment object and predicted comment properties according to the LSTM model and the CRF model and comment sample data of the target application;

The comment sample data of the target application is processed by the initial LSTM model and the initial CRF model to obtain the predicted comment object and the predicted comment property.

S306, judging whether the predicted comment object successfully matches the manual labeling result of the comment sample data of the target application; if the matching is successful, executing S308; if not, adjusting the weighted fusion parameters and executing S304.

S308, judging whether the predicted comment property successfully matches the manual labeling result of the comment sample data of the target application; if the matching is successful, executing S310; if not, adjusting the weighted fusion parameters and executing S304.

The manual labeling result includes the labeling result of the comment object and the labeling result of the comment property, and is obtained by testers labeling the comment sample data according to their own judgment.

Similarity matching may be used to judge whether the predicted comment object successfully matches the comment object in the manual labeling result of the comment sample data of the target application, and whether the predicted comment property successfully matches the comment property in that manual labeling result.
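One possible form of such similarity matching is sketched below using the `difflib.SequenceMatcher` class from the Python standard library. The application does not name a similarity measure, so the choice of measure, the 0.8 threshold, and the function name `is_match` are all assumptions.

```python
from difflib import SequenceMatcher

def is_match(predicted, labeled, threshold=0.8):
    """Treat two strings as matching if their similarity ratio reaches the threshold."""
    a = predicted.strip().lower()
    b = labeled.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold

# Hypothetical predicted and manually labeled comment objects.
same = is_match("Transfer", "transfer ")
near = is_match("transfers", "transfer")
different = is_match("login", "transfer")
```

A threshold below 1 tolerates small differences such as plural forms or stray whitespace while still rejecting unrelated labels.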

S310, saving the weighted fusion parameters of the training.

The weighted fusion parameters are stored as the inherent parameters of the LSTM model and the CRF model, which completes the training of the model.

After training, the model continues to be fine-tuned according to actual test conditions, and the weighted fusion parameters are adjusted accordingly, which improves how well the model fits the data and therefore its accuracy.

In the embodiment, the weighted fusion parameters of the model are obtained by comparing the prediction labeling result and the manual labeling result, so that the fusion effect of the model is improved, and the accuracy of model prediction is further improved.
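The control flow of steps S302 to S310 can be sketched as a search over candidate weighted fusion parameters. The sketch does not train a real LSTM or CRF: `predict` and `matches` are caller-supplied stand-ins, the candidate values are hypothetical, and "adjusting the weighted fusion parameters" is simplified to moving on to the next candidate.

```python
def train(candidates, predict, samples, labels, matches):
    """Return the first candidate whose predictions match the manual labels.

    predict(params, sample) returns (comment_object, comment_property);
    labels holds the manually labeled (comment_object, comment_property) pairs.
    """
    for params in candidates:                                  # S302, or adjusted parameters
        predictions = [predict(params, s) for s in samples]    # S304
        if not all(matches(p[0], l[0]) for p, l in zip(predictions, labels)):
            continue                                           # S306 failed: adjust and retry
        if not all(matches(p[1], l[1]) for p, l in zip(predictions, labels)):
            continue                                           # S308 failed: adjust and retry
        return params                                          # S310: keep these parameters
    return None

# Toy stand-in for the LSTM and CRF pair: fixed outputs per parameter value.
toy_outputs = {
    0.3: [("login", "slow")],
    0.5: [("transfer", "slow")],
    0.7: [("transfer", "fast")],
}
samples = [0]
manual_labels = [("transfer", "fast")]
best = train(
    [0.3, 0.5, 0.7],
    lambda w, i: toy_outputs[w][i],
    samples,
    manual_labels,
    lambda a, b: a == b,
)
```

Here the first candidate fails the comment object check, the second passes it but fails the comment property check, and the third passes both and is saved.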

As APP downloads from the major application stores grow, each store provides ratings and comments about a given APP. Because the comments vary widely, they need to be analyzed so that negative language and defective functions in APP use can be quickly screened and extracted, allowing targeted development and innovation when performance is optimized. In one embodiment, as shown in fig. 4, a comment extraction and analysis method for an APP store is provided, which comprises the following steps:

Input data part:

S402, acquiring comment data of the target application.

Data preprocessing part:

S404, performing word segmentation on the comment data to obtain word-segmented comment data.

Feature engineering part:

S406, encoding the word-segmented comment data and extracting features to obtain word vectors of the comment data.

S408, preprocessing the comment data in the target application to obtain word vectors of the comment data.

Output result part:

S410, carrying out semantic feature extraction on word vectors of comment data based on an LSTM model in a weighted fusion model of the pre-trained comment data to obtain comment objects of the comment data and semantic features of the comment objects.

The weighted fusion model of the comment data comprises an LSTM model and a CRF model.

S412, marking semantic features of the comment objects based on a CRF model in the weighted fusion model of the pre-trained comment data, and obtaining comment properties corresponding to the comment data.

S414, randomly dividing the comment objects to obtain at least two groups of comment objects.

S416, selecting one comment object from at least two groups of comment objects as a target comment object respectively;

S418, performing cluster analysis on each target comment object to obtain a set of comment objects.

Wherein the set of comment objects includes at least two comment objects.

S420, weighting the sorting results of the comment properties according to the scoring results of the comment data to obtain the scoring results of the comment properties.

S422, scoring each comment object under the set of comment objects based on scoring results of comment properties to obtain scoring results of each comment object;

S424, sorting the set of comment objects according to the scoring result of each comment object to obtain the evaluation result of the target application.

The evaluation result of the target application includes the top comment viewpoints and information, that is, the highest-ranked comment objects and comment properties.

Model training part:

S426, acquiring weighted fusion parameters of an LSTM model and a CRF model;

S428, obtaining a predicted comment object and predicted comment properties according to the LSTM model and the CRF model and comment sample data of the target application;

S430, judging whether the predicted comment object successfully matches the manual labeling result of the comment sample data of the target application; if the matching is successful, executing S432; if not, adjusting the weighted fusion parameters and executing S428.

S432, judging whether the predicted comment property successfully matches the manual labeling result of the comment sample data of the target application; if the matching is successful, executing S434; if not, adjusting the weighted fusion parameters and executing S428.

S434, the weighted fusion parameters of the training are saved.

In this embodiment, the comment data in the target application is preprocessed to obtain word vectors of the comment data; expressing the comment data in vector form facilitates the subsequent analysis of its features. Semantic feature extraction is performed on the word vectors based on the pre-trained weighted fusion model of comment data to obtain the comment objects and comment properties corresponding to the comment data; because the weighted fusion model attends to the semantics of the comment data, accurate comment objects and comment properties are extracted. Cluster analysis is performed on the comment objects to obtain a set of comment objects, so that the comment objects drawn from a large amount of comment data are grouped into a few main categories, which saves developers time in upgrading and maintenance. The set of comment objects is sorted based on the scoring results of the comment properties to obtain the evaluation result of the target application, so that specific comment object categories are screened out, which helps developers upgrade and improve the target application.
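The input, preprocessing, and feature engineering parts (S402 to S406) can be sketched as follows. This is a simplified illustration: segmentation is done with a regular expression that only suits space-delimited text (Chinese comments would need a dedicated word segmenter), and encoding is reduced to mapping each word to an integer index, which is the usual input to a word-embedding layer. The function names and sample comments are hypothetical.

```python
import re

def segment(comment):
    """Split a comment into lowercase word tokens."""
    return re.findall(r"\w+", comment.lower())

def build_vocab(tokenized_comments):
    """Assign an integer index to every distinct token; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for tokens in tokenized_comments:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Replace each token with its index, using 0 for tokens not in the vocabulary."""
    return [vocab.get(token, 0) for token in tokens]

# S402: hypothetical comment data of the target application.
comments = ["Transfer is slow!", "Login is slow"]
# S404: word segmentation.
tokenized = [segment(c) for c in comments]
# S406: encoding into index sequences.
vocab = build_vocab(tokenized)
encoded = [encode(tokens, vocab) for tokens in tokenized]
```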

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a comment data processing device for realizing the comment data processing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the processing device for comment data provided below may refer to the limitation of the processing method for comment data hereinabove, and will not be described herein.

In one embodiment, as shown in fig. 5, there is provided a comment data processing apparatus including: a preprocessing module 502, a processing module 504, a classification module 506, and a ranking module 508, wherein:

the preprocessing module 502 is configured to preprocess comment data in a target application to obtain a word vector of the comment data;

the processing module 504 is configured to perform semantic feature extraction on word vectors of comment data based on a weighted fusion model of the pre-trained comment data, so as to obtain comment objects and comment properties corresponding to the comment data;

the classification module 506 is configured to perform cluster analysis on the comment objects to obtain a set of comment objects, where the set of comment objects includes at least two comment objects;

and the ordering module 508 is used for ordering the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application.

In one embodiment, the comment data processing apparatus further includes: the acquisition module is used for acquiring comment data of the target application; word segmentation is carried out on the comment data, and comment data after word segmentation are obtained; and coding and extracting features of the comment data after the word segmentation to obtain word vectors of the comment data.

In one embodiment, the weighted fusion model of the comment data includes an LSTM model and a CRF model, and the processing module 504 is further configured to perform semantic feature extraction on word vectors of the comment data based on the LSTM model in the weighted fusion model of the comment data that is trained in advance, to obtain a comment object of the comment data and semantic features of the comment object; and labeling semantic features of the comment objects based on a CRF model in a weighted fusion model of the pre-trained comment data to obtain comment properties corresponding to the comment data.

In one embodiment, the classification module 506 is further configured to randomly divide the comment objects into at least two groups of comment objects; selecting one comment object from at least two groups of comment objects as a target comment object respectively; and carrying out cluster analysis on each target comment object to obtain a set of comment objects.

In one embodiment, the ranking module 508 is further configured to weight the ranking result of the comment property according to the scoring result of the comment data, so as to obtain the scoring result of the comment property; scoring each comment object under the set of comment objects based on scoring results of comment properties to obtain scoring results of each comment object; and sorting the set of comment objects according to the scoring result of each comment object to obtain the evaluation result of the target application.

In one embodiment, the comment data processing apparatus further includes a training module, configured to: acquire weighted fusion parameters of the LSTM model and the CRF model; obtain a predicted comment object and a predicted comment property according to the LSTM model, the CRF model, and comment sample data of the target application; if the predicted comment object does not successfully match the manual labeling result of the comment sample data of the target application, or the predicted comment property does not successfully match that manual labeling result, adjust the weighted fusion parameters and retrain until the predicted comment object and the predicted comment property successfully match the manual labeling result; and save the weighted fusion parameters of the training.

The modules in the comment data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
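When the modules are implemented in software, their arrangement can be sketched as a simple composition of four callables, one per module. The class name `CommentProcessor` and the stub functions used in the example are hypothetical placeholders and do not implement the actual models.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class CommentProcessor:
    preprocess: Callable[[list], Any]    # preprocessing module 502
    extract: Callable[[Any], tuple]      # processing module 504
    cluster: Callable[[Any], Any]        # classification module 506
    rank: Callable[[Any, Any], Any]      # ordering module 508

    def run(self, comments):
        """Pass comment data through the four modules in order."""
        vectors = self.preprocess(comments)
        objects, properties = self.extract(vectors)
        object_sets = self.cluster(objects)
        return self.rank(object_sets, properties)

# Stub modules that only demonstrate the data flow.
processor = CommentProcessor(
    preprocess=lambda comments: [c.split() for c in comments],
    extract=lambda vectors: ([v[0] for v in vectors], [v[-1] for v in vectors]),
    cluster=lambda objects: sorted(set(objects)),
    rank=lambda object_sets, properties: list(zip(object_sets, properties)),
)
result = processor.run(["login slow", "transfer fast"])
```

Keeping each module behind a plain callable means any one of them can be replaced, for example by a hardware-backed implementation, without changing the others.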

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing the evaluation result data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of processing comment data.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

based on a weighted fusion model of the comment data trained in advance, extracting semantic features of word vectors of the comment data to obtain comment objects and comment properties corresponding to the comment data;

and sorting the set of comment objects based on the scoring result of the comment property to obtain the evaluation result of the target application.

In one embodiment, the processor when executing the computer program further performs the steps of:

Comment data of a target application is obtained; word segmentation is carried out on the comment data, and comment data after word segmentation are obtained; and coding and extracting features of the comment data after the word segmentation to obtain word vectors of the comment data.

based on an LSTM model in a weighted fusion model of the pre-trained comment data, extracting semantic features of word vectors of the comment data to obtain comment objects of the comment data and semantic features of the comment objects; and labeling semantic features of the comment objects based on a CRF model in a weighted fusion model of the pre-trained comment data to obtain comment properties corresponding to the comment data.

randomly dividing comment objects to obtain at least two groups of comment objects; selecting one comment object from at least two groups of comment objects as a target comment object respectively; and carrying out cluster analysis on each target comment object to obtain a set of comment objects.

weighting the sorting results of the comment properties according to the scoring results of the comment data to obtain scoring results of the comment properties; scoring each comment object under the set of comment objects based on scoring results of comment properties to obtain scoring results of each comment object; and sorting the set of comment objects according to the scoring result of each comment object to obtain the evaluation result of the target application.

acquiring weighted fusion parameters of an LSTM model and a CRF model; obtaining a predicted comment object and a predicted comment property according to the LSTM model, the CRF model, and comment sample data of the target application; if the predicted comment object does not successfully match the manual labeling result of the comment sample data of the target application, or the predicted comment property does not successfully match that manual labeling result, adjusting the weighted fusion parameters and retraining until the predicted comment object and the predicted comment property successfully match the manual labeling result; and saving the weighted fusion parameters of the training.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims

1. A method of processing comment data, the method comprising:

2. The method of claim 1, wherein preprocessing the comment data in the target application to obtain a word vector of the comment data comprises:

comment data of a target application is obtained;

performing word segmentation on the comment data to obtain word-segmented comment data;

3. The method according to claim 1, wherein the weighted fusion model of the comment data includes an LSTM model and a CRF model, the semantic feature extraction is performed on word vectors of the comment data based on the weighted fusion model of the pre-trained comment data, to obtain comment objects and comment properties corresponding to the comment data, including:

4. The method of claim 3, wherein the performing cluster analysis on the comment objects to obtain the set of comment objects comprises:

5. The method of claim 4, wherein the sorting the set of comment objects based on the scoring results of the comment properties to obtain the evaluation result of the target application comprises:

6. A method according to claim 3, wherein the training method of the weighted fusion model of comment data comprises:

acquiring weighted fusion parameters of the LSTM model and the CRF model;

7. A comment data processing apparatus, characterized in that the apparatus includes:

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.

9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.