CN114266596A - Electronic commerce public opinion data processing method, device, storage medium and electronic equipment

Publication number
CN114266596A
Authority
CN
China
Prior art keywords
data
public opinion
classification
commerce
evaluated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111580784.6A
Other languages
Chinese (zh)
Inventor
刘岩
王三鹏
林睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd

Abstract

The disclosure provides an e-commerce public opinion data processing method, an e-commerce public opinion data processing device, a storage medium and electronic equipment, and relates to the technical field of data processing. The e-commerce public opinion data processing method comprises the following steps: classifying the e-commerce public opinion data based on a preset classification rule, and determining the data category of the e-commerce public opinion data; when the data category of the e-commerce public opinion data is the e-commerce activity information category, performing first preprocessing on the e-commerce public opinion data to obtain line report data to be evaluated; when the data category of the e-commerce public opinion data is another category, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated; and determining the risk level of the line report data to be evaluated and/or the effective data to be evaluated through a risk level evaluation model. Multi-stage processing such as classification, analysis and evaluation allows the risk of the e-commerce public opinion data to be monitored at a finer granularity.

Description

Electronic commerce public opinion data processing method, device, storage medium and electronic equipment
Technical Field
The disclosure relates to the technical field of data processing, and in particular, to a method and an apparatus for processing e-commerce public opinion data, a computer readable storage medium, and an electronic device.
Background
The internet is a large and complex system that carries a large amount of public opinion data, which can spread through the network. To understand these data better and to take appropriate countermeasures, a need for risk monitoring of the data has arisen.
In the related art, text information of public opinion data in a network is usually converted into feature vectors containing semantic information, and the feature vectors are fed to a deep learning model to determine the risk status of the public opinion data. Because public opinion data in the network covers many content types, assigning risk levels to all of it in a single classification step makes the task harder for the model and makes risk monitoring inaccurate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides an e-commerce public opinion data processing method, an e-commerce public opinion data processing device, a computer readable storage medium and an electronic device, thereby solving the problem of inaccurate risk monitoring of e-commerce public opinion data in related technologies at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the disclosure, there is provided an e-commerce public opinion data processing method, including: classifying e-commerce public opinion data based on a preset classification rule, and determining the data category of the e-commerce public opinion data; when the data category of the e-commerce public opinion data is an e-commerce activity information category, performing first preprocessing on the e-commerce public opinion data to obtain line report data to be evaluated; when the data category of the e-commerce public opinion data is another category, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated; and determining the risk level of the line report data to be evaluated and/or the effective data to be evaluated through a risk level evaluation model.
In an exemplary embodiment of the disclosure, performing first preprocessing on the e-commerce public opinion data to obtain line report data to be evaluated when the data category of the e-commerce public opinion data is an e-commerce activity information category includes: when the data category of the e-commerce public opinion data is the e-commerce activity information category, performing link parsing and keyword replacement preprocessing on the e-commerce public opinion data to obtain the line report data to be evaluated.
In an exemplary embodiment of the disclosure, before performing link parsing and keyword replacement preprocessing on the e-commerce public opinion data to obtain the line report data to be evaluated, the method further includes: performing data format preprocessing on the e-commerce public opinion data to obtain standardized e-commerce public opinion data.
In an exemplary embodiment of the present disclosure, the method further comprises: triggering a risk data early warning when the link parsing result of the e-commerce public opinion data meets a preset condition.
In an exemplary embodiment of the present disclosure, the preset condition includes that the frequency of occurrence of the e-commerce activity object corresponding to the link parsing result exceeds a preset threshold.
In an exemplary embodiment of the present disclosure, the risk level evaluation model includes a first word vector layer and a first classification layer, and determining the risk level of the line report data to be evaluated and/or the effective data to be evaluated through the risk level evaluation model includes: performing word vectorization on the line report data to be evaluated and/or the effective data to be evaluated with each word vectorization submodel in the first word vector layer, to obtain word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer; classifying, with each classification submodel in the first classification layer, the word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer, to obtain a risk level classification result corresponding to each classification submodel in the first classification layer; and weighting the risk level classification results corresponding to the classification submodels in the first classification layer to obtain the risk level of the line report data to be evaluated and/or the effective data to be evaluated.
In an exemplary embodiment of the disclosure, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated when the data category of the e-commerce public opinion data is another category includes: when the data category of the e-commerce public opinion data is another category, performing second preprocessing on the e-commerce public opinion data to obtain second preprocessed data; performing effectiveness classification on the second preprocessed data through an effectiveness classification model; and when the effectiveness classification result of the second preprocessed data is valid, taking the second preprocessed data as the effective data to be evaluated.
In an exemplary embodiment of the disclosure, the effectiveness classification model includes a second word vector layer and a second classification layer, and the effectiveness classification of the second preprocessed data by the effectiveness classification model includes: respectively adopting each word vectorization submodel in the second word vector layer to carry out word vectorization processing on the second preprocessed data to obtain network word vector data corresponding to each word vectorization submodel in the second word vector layer; classifying network word vector data corresponding to each word vectorization submodel in the second word vector layer by respectively adopting each classification submodel in the second classification layer to obtain an effectiveness classification result corresponding to each classification submodel in the second classification layer; and weighting the effectiveness classification results corresponding to the classification submodels in the second classification layer to obtain the effectiveness classification result of the second preprocessing data.
According to a second aspect of the present disclosure, there is provided an e-commerce public opinion data processing apparatus comprising: a data classification module, used for classifying the e-commerce public opinion data based on a preset classification rule and determining the data category of the e-commerce public opinion data; a first data determination module, used for performing first preprocessing on the e-commerce public opinion data to obtain line report data to be evaluated when the data category of the e-commerce public opinion data is an e-commerce activity information category; a second data determination module, used for performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated when the data category of the e-commerce public opinion data is another category; and a risk evaluation module, used for determining the risk level of the line report data to be evaluated and/or the effective data to be evaluated through a risk level evaluation model.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for processing e-commerce public opinion data.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the above-mentioned method for processing E-commerce public opinion data via executing the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
In the process of processing the e-commerce public opinion data, the e-commerce public opinion data is classified based on a preset classification rule and its data category is determined; when the data category is an e-commerce activity information category, first preprocessing is performed on the e-commerce public opinion data to obtain line report data to be evaluated; when the data category is another category, effectiveness analysis is performed on the e-commerce public opinion data to obtain effective data to be evaluated; and the risk level of the line report data to be evaluated and/or the effective data to be evaluated is determined through a risk level evaluation model. On the one hand, multi-stage processing such as classification, analysis and evaluation refines the levels and categories of data processing, which reduces the processing difficulty and the amount of computation of the risk level evaluation model and makes the risk level evaluation of the e-commerce public opinion data more accurate, thereby achieving intelligent monitoring of the e-commerce public opinion data. On the other hand, when the e-commerce public opinion data belongs to the e-commerce activity information category, the first preprocessing helps to improve the quality of the data, mine the implicit information of the line report data, and improve the accuracy of risk level evaluation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings can be obtained from those drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart illustrating an e-commerce public opinion data processing method in the present exemplary embodiment;
Fig. 2 is a flowchart illustrating a method of obtaining effective data to be evaluated in the present exemplary embodiment;
Fig. 3 is a flowchart illustrating effectiveness classification of the second preprocessed data in the present exemplary embodiment;
Fig. 4 is a diagram illustrating the architecture of a bipartite ensemble classification model in the present exemplary embodiment;
Fig. 5 is a flowchart illustrating a method for determining the risk level classification result of the line report data to be evaluated and/or the effective data to be evaluated in the present exemplary embodiment;
Fig. 6 is a diagram illustrating an architecture for processing e-commerce public opinion data in the present exemplary embodiment;
Fig. 7 is a block diagram illustrating the configuration of an e-commerce public opinion data processing apparatus in the present exemplary embodiment;
Fig. 8 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Herein, "first", "second", etc. are labels for specific objects, and do not limit the number or order of the objects.
In the related art, public opinion data in a network covers many content types. E-commerce public opinion data in particular often contains key fields with several synonymous expressions, as well as large amounts of data that is long but carries no semantic information, such as link data and key data. Such data is usually treated as ordinary text, which not only gives it poor representation capability but also ignores the information hidden behind the links, so the hidden risk of the public opinion data cannot be judged. Furthermore, when text information of public opinion data is converted into feature vectors containing semantic information and the feature vectors are fed to a deep learning model to judge the risk status, the evaluation result depends closely on the corpus used for word vector conversion, so a single classification structure generalizes poorly in a specific field; and directly classifying all types of public opinion data into parallel classes makes the task harder for the model.
In view of one or more of the above problems, the exemplary embodiments of the present disclosure provide an e-commerce public opinion data processing method, which can perform intelligent monitoring analysis on e-commerce public opinion data.
Fig. 1 shows a schematic flow of the e-commerce public opinion data processing in the present exemplary embodiment, including the following steps S110 to S140:
Step S110, classifying the e-commerce public opinion data based on a preset classification rule, and determining the data category of the e-commerce public opinion data;
Step S120, when the data category of the e-commerce public opinion data is the e-commerce activity information category, performing first preprocessing on the e-commerce public opinion data to obtain line report data to be evaluated;
Step S130, when the data category of the e-commerce public opinion data is another category, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated;
Step S140, determining the risk level of the line report data to be evaluated and/or the effective data to be evaluated through a risk level evaluation model.
In the above e-commerce public opinion data processing, on the one hand, multi-stage processing such as classification, analysis and evaluation refines the levels and categories of data processing, which not only reduces the processing difficulty and the amount of computation of the risk level evaluation model but also makes the risk level evaluation of the e-commerce public opinion data more accurate, thereby achieving intelligent monitoring of the e-commerce public opinion data. On the other hand, when the e-commerce public opinion data belongs to the e-commerce activity information category, the first preprocessing helps to improve the quality of the data, mine the implicit information of the line report data, and improve the accuracy of risk level evaluation.
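The control flow of steps S110 to S140 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function `process_item`, the category labels and the four callables are hypothetical names introduced here, the two-category rule is assumed, and the toy stand-ins exist only to exercise the branching.

```python
from typing import Callable, Optional

# Hypothetical category labels (not named in the disclosure).
ACTIVITY = "ecommerce_activity"
OTHER = "other"


def process_item(
    text: str,
    classify: Callable[[str], str],
    first_preprocess: Callable[[str], str],
    effectiveness_filter: Callable[[str], Optional[str]],
    assess_risk: Callable[[str], str],
) -> Optional[str]:
    """Run S110 to S140 on one item; return a risk level, or None if filtered out."""
    category = classify(text)                    # S110: rule-based category
    if category == ACTIVITY:
        candidate = first_preprocess(text)       # S120: line report data to be evaluated
    else:
        candidate = effectiveness_filter(text)   # S130: None means "invalid"
    if candidate is None:
        return None                              # invalid data is not evaluated
    return assess_risk(candidate)                # S140: risk level


# Toy stand-ins, assumed purely for illustration.
def toy_classify(text: str) -> str:
    return ACTIVITY if "http" in text else OTHER


def toy_first_preprocess(text: str) -> str:
    return text.lower()


def toy_effectiveness_filter(text: str) -> Optional[str]:
    return text if len(text.split()) >= 3 else None


def toy_assess_risk(text: str) -> str:
    return "high" if "coupon" in text else "low"


def run(text: str) -> Optional[str]:
    return process_item(text, toy_classify, toy_first_preprocess,
                        toy_effectiveness_filter, toy_assess_risk)
```

Passing the four stages in as callables keeps the branching logic separate from the models, so each stage can be replaced without touching the flow.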
Each step in fig. 1 will be described in detail below.
Step S110, classifying the e-commerce public opinion data based on a preset classification rule, and determining the data category of the e-commerce public opinion data.
The e-commerce public opinion data refers to data related to public opinion and information about an e-commerce platform, and can be e-commerce related data such as commodity-related chat record data, commodity detail link data, coupon link data, e-commerce community notification data and key data. The preset classification rule is a rule-based classification strategy, and can be a classification task based on text keywords and regular expression matching.
It should be noted that the preset classification rule divides the data into at least two categories. When the preset classification rule is a two-class task, the e-commerce public opinion data can be classified into an e-commerce activity information category and other categories; when the preset classification rule is a three-class task, the e-commerce public opinion data can be classified into an e-commerce activity information category, a virtual product sale category and other categories.
The other categories are categories formed by E-commerce public opinion data sets which do not belong to specific categories, and can include common chat records, expressions, pictures, community notifications and the like. When the E-commerce public opinion data is classified into E-commerce activity information categories and other categories, the other categories are categories which do not belong to the E-commerce activity information categories; when the E-commerce public opinion data is classified into an E-commerce activity information category, a virtual product sale category and other categories, the other categories are categories which do not belong to the E-commerce activity information category and the virtual product sale category.
The e-commerce activity information category covers e-commerce activity data such as bonuses, free gifts, free trials, social forum activities, surveys with prizes and free resources, and may be chat data containing a short link to a commodity or a coupon link. The virtual product sale category covers data on virtual products sold through an automatic card issuing platform, such as software forum accounts, e-commerce agency coupons, forum credit recharge cards and personal or enterprise recharge codes, where the automatic card issuing platform is a sales mode of virtual commodity platforms.
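A rule of this kind, based on text keywords and regular expression matching, might look like the sketch below for the three-class case. The keyword lists, the category labels, the URL pattern and the order in which the rules are tried are all assumptions made for illustration; the disclosure does not specify them.

```python
import re

# Assumed pattern for links; the disclosure only says that activity data
# may contain a commodity short link or a coupon link.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

# Hypothetical keyword lists.
ACTIVITY_KEYWORDS = ("coupon", "free trial", "giveaway")
VIRTUAL_SALE_KEYWORDS = ("recharge card", "account for sale", "recharge code")


def classify(text: str) -> str:
    """Return one of three hypothetical category labels for a piece of text."""
    lowered = text.lower()
    # Rule order is an assumption: virtual product sales are checked first.
    if any(keyword in lowered for keyword in VIRTUAL_SALE_KEYWORDS):
        return "virtual_product_sale"
    if URL_RE.search(text) or any(k in lowered for k in ACTIVITY_KEYWORDS):
        return "ecommerce_activity"
    return "other"
```

Anything that matches no rule falls through to the other category, which mirrors the definition of that category as data not belonging to a specific category.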
Step S120, when the data category of the e-commerce public opinion data is the e-commerce activity information category, performing first preprocessing on the e-commerce public opinion data to obtain the line report data to be evaluated.
The first preprocessing refers to a series of data processing operations performed on e-commerce public opinion data belonging to the e-commerce activity information category (line report activity), and aims to improve the data quality so as to facilitate the subsequent risk evaluation.
In an optional implementation, performing first preprocessing on the e-commerce public opinion data to obtain the line report data to be evaluated when the data category of the e-commerce public opinion data is an e-commerce activity information category can be carried out as follows: when the data category of the e-commerce public opinion data is the e-commerce activity information category, performing link parsing and keyword replacement preprocessing on the e-commerce public opinion data to obtain the line report data to be evaluated.
The link included in the e-commerce public opinion data may be a URL (Uniform Resource Locator) of the product, a product coupon URL, a URL of a product competitor, a picture URL, or other network link addresses. The link is analyzed, so that the source of the URL in the E-commerce activity information type E-commerce public opinion data can be traced, and information such as SKU (Stock Keeping Unit) of goods behind the URL, the batch number of the coupon and the like can be obtained. A SKU herein represents a collection of items that are identical in size, color, product, model, etc.
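As a rough illustration of link parsing, the sketch below pulls a SKU and a coupon batch number out of a URL with the Python standard library. The URL shapes are invented for the example (a numeric SKU in the path, a `batch` query parameter); real platform links differ, and resolving short links would need a network request that is omitted here.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed path shape for a commodity page: /<digits>.html
SKU_PATH_RE = re.compile(r"^/(\d+)\.html$")


def parse_link(url: str) -> dict:
    """Extract the host plus any SKU or coupon batch number found in the URL."""
    parts = urlparse(url)
    info = {"host": parts.netloc}
    match = SKU_PATH_RE.match(parts.path)
    if match:
        info["sku"] = match.group(1)
    query = parse_qs(parts.query)
    if "batch" in query:                      # hypothetical parameter name
        info["coupon_batch"] = query["batch"][0]
    return info
```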
Keyword replacement may include synonym replacement, URL replacement, key replacement, and the like. The synonym replacement can comprise synonymy replacement of the e-commerce platform name and the like so as to unify the expression of the same entity name, and the URL replacement and the key replacement are to replace a lengthy URL character string and a key character string without semantic information with a short text object so as to improve the representation capability of data on the semantic information.
In the process, when the first preprocessing operation is carried out, the mining of the information behind the link is carried out on the E-commerce public opinion data belonging to the E-commerce activity information category, and the keyword replacement operation is carried out, so that the quality of the E-commerce public opinion data can be improved, and the accuracy of the subsequent model classification and identification is further improved.
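The keyword replacement described above can be sketched as three substitutions. The placeholder tokens `<URL>` and `<KEY>`, the shape assumed for a key string (16 or more alphanumeric characters including a digit) and the alias table are all hypothetical choices for this example.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Assumed shape of a "key" string: a long alphanumeric run containing a digit.
KEY_RE = re.compile(r"\b(?=\w*\d)[A-Za-z0-9]{16,}\b")
# Hypothetical alias table mapping platform name variants to one canonical name.
SYNONYMS = {"shopmart": "SHOP_A", "shop mart": "SHOP_A"}


def replace_keywords(text: str) -> str:
    """Replace URLs and key strings with short tokens and unify platform names."""
    text = URL_RE.sub("<URL>", text)          # URL replacement
    text = KEY_RE.sub("<KEY>", text)          # key replacement
    for alias, canonical in SYNONYMS.items():  # synonym replacement
        text = re.sub(re.escape(alias), canonical, text, flags=re.IGNORECASE)
    return text
```

URLs are replaced before key strings so that long alphanumeric fragments inside a link are not rewritten separately.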
In an optional implementation, before link parsing and keyword replacement preprocessing are performed on the e-commerce public opinion data to obtain the line report data to be evaluated, data format preprocessing can be performed on the e-commerce public opinion data to obtain standardized e-commerce public opinion data.
The data format preprocessing can include unifying punctuation marks, handling garbled characters, normalizing the case of English letters and the like, so that the format of the e-commerce public opinion data is standardized. During link parsing and replacement, the relevant data can then be identified accurately, identification errors and omissions are avoided, and the accuracy and efficiency of subsequent processing are improved.
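One possible form of such data format preprocessing is sketched below with the standard `unicodedata` module. Using NFKC normalization to unify punctuation and dropping the Unicode replacement character as "garbled" are assumptions of this example. Lowercasing the whole string is done for brevity; a real system would likely need to protect case-sensitive link paths.

```python
import unicodedata


def normalize_format(text: str) -> str:
    """Unify punctuation width, drop garbled characters and lowercase the text."""
    # NFKC folds full-width letters, digits and punctuation to their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop the replacement character U+FFFD and non-printable control characters.
    text = "".join(
        ch for ch in text
        if ch != "\ufffd" and (ch.isprintable() or ch in "\n\t")
    )
    return text.lower().strip()
```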
In an optional implementation manner, the risk data early warning may be triggered when the link parsing result of the e-commerce public opinion data meets a preset condition.
The link analysis result refers to the analysis result of the URL link data in the e-commerce public opinion data of the e-commerce activity information category, and may be data related to e-commerce activity objects (e.g., product SKU, coupon lot number, etc.).
The preset condition may be that the frequency of occurrence of the e-commerce activity object corresponding to the link analysis result exceeds a preset threshold, for example, the frequency of occurrence of a SKU in unit time exceeds a certain threshold, and risk data early warning may be triggered.
It should be noted that the preset condition may also be configured by the service provider according to its own needs, and different thresholds are set according to different numerical characteristics of the analyzed e-commerce activity object.
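The frequency condition can be illustrated with a sliding time window per e-commerce activity object. The class name `FrequencyAlert`, the window length and the threshold are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class FrequencyAlert:
    """Flag an object (e.g. a SKU) seen more than `threshold` times per window."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window = window_seconds
        self.threshold = threshold
        self._seen = defaultdict(deque)  # object id -> timestamps inside the window

    def observe(self, object_id: str, timestamp: float) -> bool:
        """Record one occurrence; return True if an early warning should fire."""
        times = self._seen[object_id]
        times.append(timestamp)
        # Discard occurrences that have fallen out of the time window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold
```

Keeping one instance per object type would allow different thresholds for, say, SKUs and coupon batch numbers, in line with the configurable condition described above.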
Step S130, when the data category of the e-commerce public opinion data is another category, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated.
The effectiveness analysis refers to the operation of classifying the effectiveness of other categories of E-commerce public opinion data so as to filter invalid data in the other categories of E-commerce public opinion data. The effective data to be evaluated refers to effective E-commerce public opinion data obtained through effectiveness binary classification operation.
In an alternative embodiment, performing effectiveness analysis on the e-commerce public opinion data to obtain effective data to be evaluated when the data category of the e-commerce public opinion data is another category may be implemented by the steps shown in Fig. 2, specifically including the following steps S210 to S230:
Step S210, when the data category of the e-commerce public opinion data is another category, performing second preprocessing on the e-commerce public opinion data to obtain second preprocessed data;
Step S220, performing effectiveness classification on the second preprocessed data through an effectiveness classification model;
Step S230, when the effectiveness classification result of the second preprocessed data is valid, taking the second preprocessed data as the effective data to be evaluated.
The process can filter out invalid data based on the effectiveness classification result, and determine effective data needing to be further evaluated.
The second preprocessing can include preprocessing such as data format processing and entity abstraction. It standardizes the data expression, which improves the accuracy of the effectiveness analysis of the e-commerce public opinion data and reduces the classification difficulty for the effectiveness classification model.
In an alternative embodiment, the effectiveness classification model includes a second word vector layer and a second classification layer, and the effectiveness classification of the second preprocessed data by the effectiveness classification model may be implemented by the steps shown in Fig. 3, specifically including the following steps S310 to S330:
Step S310, performing word vectorization on the second preprocessed data with each word vectorization submodel in the second word vector layer, to obtain network word vector data corresponding to each word vectorization submodel in the second word vector layer;
Step S320, classifying, with each classification submodel in the second classification layer, the network word vector data corresponding to each word vectorization submodel in the second word vector layer, to obtain an effectiveness classification result corresponding to each classification submodel in the second classification layer;
Step S330, weighting the effectiveness classification results corresponding to the classification submodels in the second classification layer to obtain the effectiveness classification result of the second preprocessed data.
The network word vector data refers to data obtained by performing word vectorization on the second preprocessed data, and is output data of the second word vector layer.
The effectiveness classification model is an ensemble classification model and comprises a second word vector layer and a second classification layer. The second word vector layer may include a plurality of word vectorization submodels, such as Word2Vec, BERT and ERNIE. The second classification layer may include a plurality of classification submodels, each of which may be designed as a binary classifier that labels the input second preprocessed data as valid or invalid; these may include CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory), Attention, MLP (Multi-Layer Perceptron), RCNN (Recurrent Convolutional Neural Network) and other classification submodels.
It should be noted that Word2Vec is a word vectorization method whose word vectors contain information such as word order and semantics; BERT is a text pre-training model that obtains efficient vector representations of text through unsupervised learning on massive data; ERNIE is a text pre-training model that incorporates knowledge graphs. In natural language processing, the CNN performs convolution with one-dimensional convolution kernels to obtain a numerical representation of the text; the LSTM encodes the continuous text sequence to obtain a numerical representation of the text; the Attention model is a deep learning method that learns parameters from the input and assigns different weights to parts of the input so as to focus on important words; the MLP encodes features with fully connected layers; RCNN is a cascade of an RNN (Recurrent Neural Network) and a CNN, where the RNN unit may be an LSTM. All word vectorization submodels in the second word vector layer share the same function and the same input and output interfaces, and so do all classification submodels in the second classification layer.
The effectiveness classification model has a bipartite graph structure, as shown in fig. 4, the second preprocessed data is used as a text data 401 input model, the outputs of each Word vectorization submodel Word2Vec 402, BERT 403, ERNIE 404 in the second Word vector layer are respectively used as the inputs of each classification submodel CNN 405, LSTM 406, Attention 407, MLP 408, RCNN 409 in the second classification layer, different Word vector submodels are combined with the classification submodels, a weighting voting mechanism 410 is used to perform weighting operation on the effectiveness classification results output by each classification submodel in the second classification layer and the weighting coefficients of the corresponding combination, and finally the obtained effectiveness classification results are used as a text type 411. It should be noted that the weight coefficients herein can be obtained by continuous learning and training of the effectiveness classification model.
In the process, the effectiveness classification model adopts an integrated classification model, so that the generalization capability of the model is improved, and the discrimination capability of the data effectiveness classification result is enhanced.
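The weighted vote described above can be written compactly. The following is only one plausible formalization: the symbols $E$ (set of word vectorization submodels), $C$ (set of classification submodels), $p_{e,c}(y \mid x)$ (score that the combination of submodels $e$ and $c$ assigns to label $y$ for input text $x$), and $w_{e,c}$ (learned weight coefficient of that combination) are notational assumptions and do not appear in the original text.

```latex
\hat{y}(x) \;=\; \arg\max_{y \in \{\text{valid},\, \text{invalid}\}}
\;\sum_{e \in E} \sum_{c \in C} w_{e,c}\, p_{e,c}(y \mid x)
```

With the three word vectorization submodels and five classification submodels of fig. 4, the double sum would run over 15 combinations.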
Step S140, determining the risk level of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data through a risk level evaluation model.
The risk level evaluation model is also an ensemble (integrated) classification model. It evaluates the risk level of data, which can be divided into three levels: high risk, medium risk, and low risk.
It should be noted that, in the actual implementation process, the risk levels may be classified more finely according to actual requirements, and are not specifically limited herein.
In an optional implementation manner, the risk level assessment model includes a first word vector layer and a first classification layer, and the risk level assessment model determines a risk level classification result of the to-be-assessed wire-report data and/or the to-be-assessed valid data, which may be implemented by the steps shown in fig. 5, specifically including the following steps S510 to S530:
step S510, respectively adopting each word vectorization submodel in the first word vector layer to perform word vectorization processing on the line newspaper data to be evaluated and/or the effective data to be evaluated to obtain word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer;
step S520, classifying the word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer by respectively adopting each classification submodel in the first classification layer to obtain a risk grade classification result corresponding to each classification submodel in the first classification layer;
step S530, weighting the risk grade classification results corresponding to each classification submodel in the first classification layer to obtain the risk grade of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data.
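Steps S510 to S530 can be illustrated with a minimal sketch. It assumes that each classification submodel returns one score per risk level and that the final level is the one with the largest weighted total; the function names, the toy submodels, and the weight values are hypothetical stand-ins, not the trained models of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]            # stand-in for a word vectorization submodel
Classifier = Callable[[Vector], List[float]]  # returns one score per risk level

RISK_LEVELS = ("low", "medium", "high")


def assess_risk(
    text: str,
    embedders: Dict[str, Embedder],
    classifiers: Dict[str, Classifier],
    weights: Dict[Tuple[str, str], float],
) -> str:
    # S510: vectorize the text once per word vectorization submodel.
    vectors = {name: embed(text) for name, embed in embedders.items()}

    totals = [0.0] * len(RISK_LEVELS)
    for e_name, vec in vectors.items():
        for c_name, classify in classifiers.items():
            # S520: classify the vector produced by each vectorization submodel.
            scores = classify(vec)
            # S530: weight the result by the coefficient of this combination.
            w = weights.get((e_name, c_name), 0.0)
            for i, score in enumerate(scores):
                totals[i] += w * score

    best = max(range(len(RISK_LEVELS)), key=totals.__getitem__)
    return RISK_LEVELS[best]


# Toy submodels (hypothetical) so that the sketch can be run end to end.
def keyword_embedder(text: str) -> Vector:
    return [1.0 if "0 yuan" in text else 0.0]


def length_embedder(text: str) -> Vector:
    return [min(len(text) / 100.0, 1.0)]


def step_classifier(vec: Vector) -> List[float]:
    return [0.0, 0.0, 1.0] if vec[0] >= 0.5 else [1.0, 0.0, 0.0]


def soft_classifier(vec: Vector) -> List[float]:
    return [1.0 - vec[0], 0.0, vec[0]]


TOY_EMBEDDERS = {"keyword": keyword_embedder, "length": length_embedder}
TOY_CLASSIFIERS = {"step": step_classifier, "soft": soft_classifier}
TOY_WEIGHTS = {
    ("keyword", "step"): 0.4,
    ("keyword", "soft"): 0.3,
    ("length", "step"): 0.2,
    ("length", "soft"): 0.1,
}
```

In this sketch the weights are fixed by hand; in the disclosure they are learned during training of the risk level evaluation model.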
The risk level evaluation model comprises a first word vector layer and a first classification layer, wherein the first word vector layer can comprise a plurality of word vectorization submodels, the first classification layer can comprise a plurality of classification submodels, the classification submodels can be designed into three classification models, and the input wire report data to be evaluated and/or effective data to be evaluated are subjected to risk level classification. The function and the input and output interface of each word vectorization submodel in the first word vector layer are all kept the same, and the function and the input and output interface of each classification submodel in the first classification layer are all kept the same.
The risk level evaluation model may also adopt the bipartite graph structure shown in fig. 4. The wire-report data to be evaluated and/or the valid data to be evaluated are input to the model as text data 401. The output of each word vectorization submodel in the first word vector layer (Word2Vec 402, BERT 403, ERNIE 404) is used as the input of each classification submodel in the first classification layer (CNN 405, LSTM 406, Attention 407, MLP 408, RCNN 409), so that the word vectorization submodels are combined with the classification submodels. A weighted voting mechanism 410 weights the risk level classification result output by each classification submodel by the weight coefficient of the corresponding combination, and the resulting risk level classification result is output as the text type 411. It should be noted that the weight coefficients can be obtained adaptively through continuous learning and training of the risk level evaluation model.
It should be noted that the bipartite-graph classification integration model structure shown in fig. 4 is applicable to both the effectiveness classification model and the risk level assessment model. In the description above, the two models include the same types of word vectorization submodels and classification submodels; in practice, different types and different numbers of submodels can be selected for each model as required, which is not specifically limited herein.
In an optional implementation manner, when the risk level of the line report data to be evaluated or the valid data to be evaluated exceeds a preset risk level, risk data early warning is triggered.
For example, when the risk level is high risk, risk early warning is triggered to provide a real-time early warning service, so that countermeasures against the risk data can be taken in time.
Through risk monitoring of the e-commerce public opinion data, early warnings can be issued for risk situations such as commodities being sold at extremely low prices, or even for 0 yuan, because of commodity price marking errors, coupon configuration errors, and the like.
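The threshold rule can be sketched as an ordered comparison. The enum values and the default preset level (medium, so that only high risk triggers a warning) are assumptions chosen to match the example above.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    # Ordered so that levels can be compared directly.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def should_warn(level: RiskLevel, preset: RiskLevel = RiskLevel.MEDIUM) -> bool:
    """Return True when the assessed level exceeds the preset level."""
    return level > preset
```

Lowering `preset` to `RiskLevel.LOW` would make medium-risk data trigger a warning as well.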
Fig. 6 shows a structure diagram of e-commerce public opinion data processing. The e-commerce public opinion data 601 is input to the message category classification model 602 to determine its data category; the message category classification model 602 may classify the data into an e-commerce activity information category 603, a virtual product sale category 604, and other categories 605. When the data category is determined to be the e-commerce activity information category 603, the e-commerce public opinion data undergoes the first preprocessing of step S61 to obtain the wire-report data to be evaluated. When the data category is determined to be the other category 605, the data undergoes the second preprocessing of step S62, and the validity classification model 606, which classifies data as invalid 607 or valid 608, filters out the invalid data to screen out the valid data to be evaluated. The obtained wire-report data to be evaluated and/or the screened valid data to be evaluated are input to the risk level evaluation model 609, which classifies the data into three levels: high risk 610, medium risk 611, and low risk 612. When the risk level result is high risk 610, the risk data early warning of step S63 is executed. In addition, a wire-report derivation branch is added: the wire-report mining of step S64 is performed on the e-commerce public opinion data belonging to the e-commerce activity information category 603. Step S64 may analyze the links included in the e-commerce public opinion data and trigger the risk data early warning of step S63 according to the analysis result.
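The routing in fig. 6 can be sketched as a small cascade. All callables are passed in as parameters because the actual models are not specified here; the toy keyword rules inside `run` are hypothetical. The sketch assumes that data in the virtual product sale category 604 receives no further processing, since the description does not say how it is handled, and it omits the wire-report mining branch of step S64.

```python
from typing import Callable, List, Optional

ACTIVITY = "e-commerce activity information"   # category 603
VIRTUAL_SALE = "virtual product sale"          # category 604
OTHER = "other"                                # category 605


def process(
    text: str,
    classify_category: Callable[[str], str],
    first_preprocess: Callable[[str], str],
    second_preprocess: Callable[[str], str],
    is_valid: Callable[[str], bool],
    assess_risk: Callable[[str], str],
    warn: Callable[[str], None],
) -> Optional[str]:
    """Route one piece of public opinion text through the three-stage cascade."""
    category = classify_category(text)            # model 602
    if category == ACTIVITY:
        to_evaluate = first_preprocess(text)      # step S61
    elif category == OTHER:
        cleaned = second_preprocess(text)         # step S62
        if not is_valid(cleaned):                 # model 606
            return None                           # invalid 607 is filtered out
        to_evaluate = cleaned                     # valid 608
    else:
        # Assumption: no further handling is described for category 604.
        return None
    level = assess_risk(to_evaluate)              # model 609
    if level == "high":
        warn(to_evaluate)                         # step S63
    return level


# Toy stand-ins (hypothetical keyword rules) so that the sketch can be run.
alerts: List[str] = []


def run(text: str) -> Optional[str]:
    return process(
        text,
        classify_category=lambda t: ACTIVITY if "coupon" in t else (
            VIRTUAL_SALE if "gift card" in t else OTHER),
        first_preprocess=str.strip,
        second_preprocess=str.lower,
        is_valid=lambda t: "price" in t,
        assess_risk=lambda t: "high" if "0 yuan" in t else "low",
        warn=alerts.append,
    )
```

Passing the models in as callables keeps the routing logic independent of how each stage is trained or served.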
Through three-layer cascade of the message type classification model, the effectiveness classification model and the risk level assessment model, the complex public opinion data monitoring task is disassembled into three simpler classification tasks, and the difficulty of model training is reduced.
Multi-level classification can effectively improve the accuracy and recall indexes when processing e-commerce public opinion data. Table 1 compares the accuracy and recall of the effectiveness classification model and the risk level evaluation model under the multi-level classification strategy and the single-layer classification strategy. The single-layer classification strategy directly classifies the e-commerce public opinion data by effectiveness and by risk level using the corresponding classification submodels. The single-group classification strategy selects, as the classification result, the best-performing combination of one word vectorization submodel from the word vector layer and one classification submodel from the classification layer.
TABLE 1
[Table 1 appears as images in the original publication: Figure BDA0003426955130000121 and Figure BDA0003426955130000131.]
An exemplary embodiment of the present disclosure also provides an e-commerce public opinion data processing apparatus. As shown in fig. 7, the e-commerce public opinion data processing apparatus 700 may include:
the data classification module 710 is used for classifying the e-commerce public opinion data based on preset classification rules and determining the data category of the e-commerce public opinion data;
the first data determining module 720 is used for performing first preprocessing on the E-commerce public opinion data to obtain line newspaper data to be evaluated when the data category of the E-commerce public opinion data is the E-commerce activity information category;
the second data determining module 730 is used for obtaining effective data to be evaluated by performing effectiveness analysis on the E-commerce public opinion data when the data type of the E-commerce public opinion data is other types;
and the risk evaluation module 740 is configured to determine the risk level of the line report data to be evaluated and/or the effective data to be evaluated through the risk level evaluation model.
In an alternative embodiment, the first data determining module 720 may be configured to: and when the data category of the E-commerce public opinion data is the E-commerce activity information category, performing link analysis and keyword replacement pretreatment on the E-commerce public opinion data to obtain the line newspaper data to be evaluated.
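A minimal sketch of link parsing and keyword replacement follows. The concrete behavior is assumed for illustration only: links are extracted with a simple regular expression and replaced by a `<LINK>` placeholder, and a hypothetical table maps informal phrases to canonical terms. None of these details are specified in the text.

```python
import re
from typing import Dict, List, Tuple

# Simplified URL pattern (assumption); trailing punctuation is not stripped.
URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical replacement table mapping informal phrases to canonical terms.
KEYWORD_MAP: Dict[str, str] = {
    "bug price": "pricing error",
    "freebie": "0 yuan",
}


def first_preprocess(
    text: str, keyword_map: Dict[str, str] = KEYWORD_MAP
) -> Tuple[str, List[str]]:
    """Return the normalized text and the list of links found in it."""
    links = URL_PATTERN.findall(text)             # link parsing
    normalized = URL_PATTERN.sub("<LINK>", text)  # hide raw URLs from the model
    for phrase, canonical in keyword_map.items():
        normalized = normalized.replace(phrase, canonical)  # keyword replacement
    return normalized, links
```

The extracted links are returned separately so that a later step, such as the link analysis used for early warning, can work on them.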
In an optional implementation manner, the e-commerce public opinion data processing apparatus 700 further includes a format preprocessing module, which is used for carrying out data format preprocessing on the e-commerce public opinion data to obtain standardized e-commerce public opinion data before link parsing and keyword replacement preprocessing are performed to obtain the wire-report data to be evaluated.
In an optional implementation manner, the e-commerce public opinion data processing apparatus 700 further includes: and the early warning triggering module is used for triggering the early warning of the risk data when the link analysis result of the E-commerce public opinion data meets the preset condition.
In an optional implementation manner, the preset condition in the early warning triggering module includes that the frequency of occurrence of the e-commerce activity object corresponding to the link analysis result exceeds a preset threshold.
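The frequency condition can be sketched with a counter. It assumes that link analysis has already reduced each link to an identifier of the e-commerce activity object (for example, a product or coupon ID); the identifiers and the threshold below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def objects_over_threshold(parsed_objects: Iterable[str], threshold: int) -> List[str]:
    """Return the activity objects whose occurrence count exceeds the threshold."""
    counts = Counter(parsed_objects)
    return sorted(obj for obj, n in counts.items() if n > threshold)
```

For example, `objects_over_threshold(["sku-1", "sku-2", "sku-1", "sku-1"], 2)` returns `["sku-1"]`, which would then trigger the risk data early warning for that object.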
In an alternative embodiment, the risk level assessment model comprises a first word vector layer and a first classification layer, and the risk assessment module 740 may be configured to: respectively adopting each word vectorization submodel in the first word vector layer to carry out word vectorization processing on the line newspaper data to be evaluated and/or the effective data to be evaluated to obtain word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer; classifying the word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer by respectively adopting each classification submodel in the first classification layer to obtain a risk grade classification result corresponding to each classification submodel in the first classification layer; and weighting the risk grade classification results corresponding to each classification submodel in the first classification layer to obtain the risk grade of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data.
In an alternative embodiment, the second data determining module 730 may include: the second preprocessing module is used for performing second preprocessing on the E-commerce public opinion data to obtain second preprocessed data when the data type of the E-commerce public opinion data is other types; the effectiveness classification module is used for classifying the effectiveness of the second preprocessed data through an effectiveness classification model; and the to-be-evaluated valid data determining module is used for taking the second preprocessed data as the to-be-evaluated valid data when the validity classification result of the second preprocessed data is valid.
In an alternative embodiment, the validity classification model includes a second word vector layer and a second classification layer, and the validity classification module may be configured to: respectively adopting each word vectorization submodel in the second word vector layer to carry out word vectorization processing on the second preprocessed data to obtain network word vector data corresponding to each word vectorization submodel in the second word vector layer; classifying network word vector data corresponding to each word vectorization submodel in the second word vector layer by respectively adopting each classification submodel in the second classification layer to obtain an effectiveness classification result corresponding to each classification submodel in the second classification layer; and weighting the effectiveness classification results corresponding to the classification submodels in the second classification layer to obtain the effectiveness classification result of the second preprocessed data.
The details of each part of the above e-commerce public opinion data processing apparatus 700 have been described in the method embodiments; for details not disclosed here, refer to the method embodiments, which are not repeated.
The exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method for processing e-commerce public opinion data of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing an electronic device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the electronic device. The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above-mentioned e-commerce public opinion data processing method. An electronic device 800 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may take the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 that couples the various system components including the storage unit 820 and the processing unit 810, and a display unit 840.
The storage unit 820 stores program code that may be executed by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, processing unit 810 may perform any one or more of the method steps of fig. 1-3, 5.
The storage unit 820 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 821 and/or a cache storage unit 822, and may further include a read only storage unit (ROM) 823.
Storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (11)

1. A method for processing E-commerce public opinion data is characterized by comprising the following steps:
classifying E-commerce public opinion data based on a preset classification rule, and determining the data category of the E-commerce public opinion data;
when the data category of the E-commerce public opinion data is an E-commerce activity information category, performing first preprocessing on the E-commerce public opinion data to obtain line report data to be evaluated;
when the data category of the E-commerce public opinion data is other categories, obtaining effective data to be evaluated by carrying out effectiveness analysis on the E-commerce public opinion data;
and determining the risk level of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data through a risk level evaluation model.
2. The method as claimed in claim 1, wherein when the data category of the e-commerce public opinion data is an e-commerce activity information category, obtaining line report data to be evaluated by performing a first pre-processing on the e-commerce public opinion data, comprises:
and when the data category of the E-commerce public opinion data is the E-commerce activity information category, performing link analysis and keyword replacement pretreatment on the E-commerce public opinion data to obtain the to-be-evaluated wire-newspaper data.
3. The method according to claim 2, wherein before performing link parsing and keyword replacement preprocessing on the e-commerce public opinion data to obtain the line newspaper data to be evaluated, the method further comprises:
and carrying out data format preprocessing on the E-commerce public opinion data to obtain standardized E-commerce public opinion data.
4. The method of claim 2, further comprising:
and triggering risk data early warning when the link analysis result of the E-commerce public opinion data meets a preset condition.
5. The method according to claim 4, wherein the preset condition includes that the frequency of occurrence of the e-commerce activity object corresponding to the link resolution result exceeds a preset threshold.
6. The method according to claim 1, wherein the risk level assessment model comprises a first word vector layer and a first classification layer, and the determining the risk level classification result of the data to be assessed and/or the valid data to be assessed by the risk level assessment model comprises:
respectively adopting each word vectorization submodel in the first word vector layer to carry out word vectorization processing on the line newspaper data to be evaluated and/or the effective data to be evaluated to obtain word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer;
classifying the word vector data to be evaluated corresponding to each word vectorization submodel in the first word vector layer by respectively adopting each classification submodel in the first classification layer to obtain a risk grade classification result corresponding to each classification submodel in the first classification layer;
and weighting the risk grade classification results corresponding to each classification submodel in the first classification layer to obtain the risk grade of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data.
7. The method according to claim 1, wherein when the data category of the e-commerce public opinion data is other category, obtaining effective data to be evaluated by performing effectiveness analysis on the e-commerce public opinion data comprises:
when the data type of the E-commerce public opinion data is other types, performing second preprocessing on the E-commerce public opinion data to obtain second preprocessing data;
performing effectiveness classification on the second preprocessed data through an effectiveness classification model;
and when the validity classification result of the second preprocessing data is valid, taking the second preprocessing data as the valid data to be evaluated.
8. The method of claim 7, wherein the validity classification model comprises a second word vector layer and a second classification layer, and wherein the validity classification of the second preprocessed data by the validity classification model comprises:
respectively adopting each word vectorization submodel in the second word vector layer to carry out word vectorization processing on the second preprocessed data to obtain network word vector data corresponding to each word vectorization submodel in the second word vector layer;
classifying network word vector data corresponding to each word vectorization submodel in the second word vector layer by respectively adopting each classification submodel in the second classification layer to obtain an effectiveness classification result corresponding to each classification submodel in the second classification layer;
and weighting the effectiveness classification results corresponding to the classification submodels in the second classification layer to obtain the effectiveness classification result of the second preprocessing data.
9. An e-commerce public opinion data processing apparatus, characterized by comprising:
the data classification module is used for classifying the E-commerce public opinion data based on a preset classification rule and determining the data category of the E-commerce public opinion data;
the first data determination module is used for performing first preprocessing on the E-commerce public opinion data to obtain to-be-evaluated line newspaper data when the data category of the E-commerce public opinion data is an E-commerce activity information category;
the second data determination module is used for carrying out effectiveness analysis on the E-commerce public opinion data to obtain effective data to be evaluated when the data category of the E-commerce public opinion data is other categories;
and the risk evaluation module is used for determining the risk level of the to-be-evaluated wire-report data and/or the to-be-evaluated effective data through a risk level evaluation model.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 8.
11. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 8 via execution of the executable instructions.
CN202111580784.6A 2021-12-22 2021-12-22 Electronic commerce public opinion data processing method, device, storage medium and electronic equipment Pending CN114266596A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111580784.6A CN114266596A (en) 2021-12-22 2021-12-22 Electronic commerce public opinion data processing method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111580784.6A CN114266596A (en) 2021-12-22 2021-12-22 Electronic commerce public opinion data processing method, device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114266596A (en) 2022-04-01

Family

ID=80828861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111580784.6A Pending CN114266596A (en) 2021-12-22 2021-12-22 Electronic commerce public opinion data processing method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114266596A (en)

Similar Documents

Publication Publication Date Title
Chen Business and market intelligence 2.0, Part 2
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN112287672A (en) Text intention recognition method and device, electronic equipment and storage medium
CN111782793A (en) Intelligent customer service processing method, system and equipment
US20200175406A1 (en) Apparatus and methods for using bayesian program learning for efficient and reliable knowledge reasoning
CN115269827A (en) Intent determination in an improved messaging conversation management system
CN114707041A (en) Message recommendation method and device, computer readable medium and electronic device
Kumar et al. Emotion analysis of news and social media text for stock price prediction using svm-lstm-gru composite model
John et al. Stock market prediction based on deep hybrid RNN model and sentiment analysis
CN115659995B (en) Text emotion analysis method and device
Birbeck et al. Using stock prices as ground truth in sentiment analysis to generate profitable trading signals
Anese et al. Impact of public news sentiment on stock market index return and volatility
CN114266596A (en) Electronic commerce public opinion data processing method, device, storage medium and electronic equipment
CN115907801A (en) E-commerce evaluation information processing method, system, equipment and medium
CN114741501A (en) Public opinion early warning method and device, readable storage medium and electronic equipment
CN113051396A (en) Document classification identification method and device and electronic equipment
Kennis Multi-channel discourse as an indicator for Bitcoin price and volume movements
Edman et al. Predicting Tesla Stock Return Using Twitter Data
CN114896987B (en) Fine-grained emotion analysis method and device based on semi-supervised pre-training model
Noyori et al. Deep learning and gradient-based extraction of bug report features related to bug fixing time
CN113177831B (en) Financial early warning system constructed by application of public data and early warning method
Oliveira et al. Sentiment analysis of stock market behavior from Twitter using the R Tool
Liu et al. Looking for gold in the sands: Stock prediction using financial news and social media
Katende Natural Language Financial Forecasting: The South African Context
Sohrabi et al. Tehran Stock Exchange, Stocks Price Prediction, Using Wisdom of Crowd

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination