CN111651590A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN111651590A
Authority
CN
China
Prior art keywords
comment
index
data
target product
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910117723.2A
Other languages
Chinese (zh)
Inventor
李志鹏
张光宇
何小锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910117723.2A
Publication of CN111651590A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data processing method and device, electronic equipment and a computer readable storage medium, and relates to the technical field of computers. The method comprises the following steps: obtaining comment data to be detected of a target user for a target product; obtaining a comment quality index of the comment data to be detected, wherein the comment quality index is obtained through a trained target product feature recognition model; acquiring a historical behavior index of the target user; and obtaining an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user. The technical scheme of the embodiment of the invention can obtain accurate quantitative estimation of the comment data based on the multi-dimensional characteristics.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium.
Background
In the field of electronic commerce, mining useful information from massive volumes of commodity and user data can bring great benefits to a company's operations.
After a user purchases a commodity, the comment the user submits for it on the e-commerce platform is among the platform's most important data: it is a key channel of interaction between the user and the platform, and a reference for other users when they choose and purchase commodities.
E-commerce platforms typically reward user comments in order to encourage users to post them. Evaluating the quality of comments effectively and rewarding them accurately is a challenging and valuable problem.
In the prior art, the e-commerce platform generally rewards user comments by the following ways:
1) screening out invalid comments through manual review;
2) rewarding comments according to the value of the commodity each comment corresponds to.
The above prior art has at least the following disadvantages:
1) Manual review is inefficient and costly.
2) The quality of comments cannot be distinguished. In practice, a high-quality comment contains more information about product quality and the shopping experience, and should earn a larger reward.
3) Only a single feature is used: the reward for a comment considers nothing but the commodity value.
That is, existing comment reward techniques are inefficient, low in quality, and costly. Therefore, a new data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
An object of embodiments of the present invention is to provide a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium, which overcome one or more of the problems due to the limitations and disadvantages of the related art, at least to some extent.
According to a first aspect of the present disclosure, there is provided a data processing method including: obtaining comment data to be detected of a target user for a target product; obtaining a comment quality index of the comment data to be detected, wherein the comment quality index is obtained through a trained target product feature recognition model; acquiring a historical behavior index of the target user; and obtaining an effective comment index of the comment data to be detected according to the comment quality index of the comment data to be detected and the historical behavior index of the target user.
In an exemplary embodiment of the present disclosure, obtaining the comment quality index of the to-be-detected comment data includes: identifying product characteristic phrases included in the comment data to be detected through a target product characteristic identification model; counting the number of product characteristic phrases in the comment data to be detected; and calculating the comment quality index according to the product feature phrase quantity.
In an exemplary embodiment of the present disclosure, the method further comprises: identifying a target product category corresponding to the target product; and calling the target product feature recognition model according to the target product category.
In an exemplary embodiment of the present disclosure, the method further comprises: obtaining a first training data set, the first training data set comprising positive samples and negative samples; and training the target product feature recognition model by using the first training data set.
In an exemplary embodiment of the present disclosure, obtaining a first training data set comprises: acquiring historical comment data of the target product category; extracting phrases whose occurrence frequency in the historical comment data of the target product category meets a predetermined condition; dividing the phrases into phrases unrelated to the target product category and phrases related to the target product category by using a preset rule; labeling the phrases unrelated to the target product category as the negative samples; and labeling the phrases related to the target product category as the positive samples.
In an exemplary embodiment of the present disclosure, tagging the phrase related to the target product category as the positive sample comprises: clustering the phrases related to the target product category, and aggregating the phrases with the same semantics together, wherein the phrases of the same category correspond to the same product characteristic; and marking each phrase with a corresponding product characteristic label.
In an exemplary embodiment of the present disclosure, obtaining the historical behavior index of the target user includes: acquiring historical comment data of the target user; obtaining comment quality indexes of each historical comment data of the target user; and obtaining the historical behavior index of the target user according to the comment quality index of each piece of historical comment data of the target user and the total number of the historical comments of the target user.
In an exemplary embodiment of the disclosure, obtaining the historical behavior index of the target user according to the comment quality index of each piece of historical comment data of the target user and the total number of historical comments of the target user includes: obtaining historical comment average quality indexes of the target user according to the comment quality indexes of the historical comment data of the target user; obtaining the maximum value of the total number of the historical comments; and obtaining the historical behavior index of the target user according to the average quality index of the historical comments of the target user, the total number of the historical comments of the target user and the maximum value of the total number of the historical comments.
In an exemplary embodiment of the present disclosure, obtaining an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user includes: determining a first weight of the comment quality index and a second weight of the historical behavior index; acquiring a price coefficient of the target product; obtaining effective comment indexes of the comment data to be detected according to the comment quality indexes, the first weight, the historical behavior indexes, the second weight and the price coefficient; wherein the first weight is greater than the second weight.
In an exemplary embodiment of the present disclosure, obtaining the price coefficient of the target product includes: counting the price distribution of the individual products in the target product category corresponding to the target product; if the price of the target product falls within the first predetermined percentage at the front of the price distribution, the price coefficient is a first constant; if the price of the target product falls within the second predetermined percentage at the back of the price distribution, the price coefficient is a second constant; if the price of the target product lies between the first predetermined percentage at the front and the second predetermined percentage at the back of the price distribution, the price coefficient is a third constant; wherein the first constant is greater than the third constant, and the third constant is greater than the second constant.
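As an illustration only, the three-band rule of this embodiment might be sketched in Python as follows. The band percentages (20% each), the constants (1.2, 0.8, 1.0), the choice to rank prices from highest to lowest, and the function name `price_coefficient` are all assumptions made for the example; the disclosure only requires that the first constant exceed the third and the third exceed the second.

```python
def price_coefficient(price, category_prices, top_pct=0.2, bottom_pct=0.2,
                      first=1.2, second=0.8, third=1.0):
    """Map a product price to a coefficient based on its band in the category.

    All numeric defaults are hypothetical; only first > third > second matters.
    """
    # Assumption: the "front" of the distribution means the highest prices.
    ranked = sorted(category_prices, reverse=True)
    n = len(ranked)
    top_k = int(n * top_pct)        # number of products in the front band
    bottom_k = int(n * bottom_pct)  # number of products in the back band

    # Front band: at least as expensive as the cheapest product in that band.
    if top_k > 0 and price >= ranked[top_k - 1]:
        return first
    # Back band: at most as expensive as the priciest product in that band.
    if bottom_k > 0 and price <= ranked[n - bottom_k]:
        return second
    # Everything in between.
    return third
```

With ten products priced 100, 200, ..., 1000 and the default 20% bands, prices of 900 and above would receive the first constant, prices of 200 and below the second, and the rest the third.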
In an exemplary embodiment of the present disclosure, the method further comprises: arranging the effective comment indexes under the target product category corresponding to the target product in descending order; determining the minimum effective comment index within the top third predetermined percentage of the effective comment indexes under the target product category, and the maximum effective comment index within the bottom fourth predetermined percentage; if the effective comment index of the comment data to be detected is larger than the minimum effective comment index, determining that the reward coefficient is a fourth constant; if the effective comment index of the comment data to be detected is smaller than the maximum effective comment index, determining that the reward coefficient is a fifth constant; and if the effective comment index of the comment data to be detected is between the minimum effective comment index and the maximum effective comment index, determining that the reward coefficient is a sixth constant.
In an exemplary embodiment of the present disclosure, the method further comprises: judging whether the comment data to be detected is a valid comment by using a binary text classification model; and if the comment data to be detected is invalid, setting the effective comment index of the comment data to be detected to a set value.
In an exemplary embodiment of the present disclosure, the method further comprises: obtaining a second training data set, wherein the second training data set comprises positive samples marked as valid comments and negative samples marked as invalid comments; preprocessing the positive and negative samples in the second training data set; and training the binary text classification model by using the preprocessed second training data set.
According to a second aspect of the present disclosure, there is provided a data processing apparatus including: a detection data acquisition module configured to acquire comment data to be detected of a target user for a target product; a comment quality acquisition module configured to acquire a comment quality index of the comment data to be detected, wherein the comment quality index is acquired through a trained target product feature recognition model; a historical behavior acquisition module configured to acquire a historical behavior index of the target user; and an effective comment module configured to obtain an effective comment index of the comment data to be detected according to the comment quality index of the comment data to be detected and the historical behavior index of the target user.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement a data processing method as in any one of the above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described in any one of the above.
In the technical scheme provided by some embodiments of the invention, on one hand, quality of the comment data to be detected of the target product can be distinguished by obtaining the comment quality index of the comment data to be detected; on the other hand, the effective comment index of the comment data to be detected can be obtained comprehensively by combining the historical behavior index of the target user and the comment quality index of the comment data to be detected, and therefore more accurate quantitative evaluation on the comment data to be detected can be achieved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to some embodiments of the invention.
Fig. 2 shows a schematic flow diagram of some embodiments of step S120 in fig. 1.
FIG. 3 shows a flow diagram of a data processing method according to further embodiments of the present invention.
Fig. 4 shows a schematic flow diagram of some embodiments of step S310 in fig. 3.
Fig. 5 shows a schematic flow diagram of some embodiments of step S130 in fig. 1.
Fig. 6 shows a schematic flow diagram of some embodiments of step S133 in fig. 5.
Fig. 7 shows a schematic flow diagram of some embodiments of step S140 in fig. 1.
Fig. 8 shows a schematic flow diagram of some embodiments of step S142 in fig. 7.
FIG. 9 shows a flow diagram of a data processing method according to further embodiments of the invention.
FIG. 10 shows a flow diagram of a data processing method according to further embodiments of the present invention.
Fig. 11 shows a schematic block diagram of a data processing apparatus according to some exemplary embodiments of the present invention.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
First, some terms mentioned in the embodiments of the present invention are explained.
Machine learning refers to the process of using a computer to simulate or realize human learning behavior in order to acquire new knowledge or skills, and to reorganize existing knowledge structures so as to continuously improve its own performance.
Word2Phrase is a statistical method based on word co-occurrence frequency that can extract high-frequency key phrases from text.
K-means is a distance-based clustering algorithm that uses distance as the measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm regards a cluster as a group of objects that lie close together, so its final goal is to obtain compact and well-separated clusters.
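To make the K-means definition concrete, the following is a minimal sketch in plain Python. It is not taken from the embodiments: the function name `kmeans`, the fixed iteration count, and the deterministic initialization from the first `k` points are assumptions chosen to keep the example short and reproducible.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups by Euclidean distance."""
    # Assumption: seed the centroids with the first k points for determinism.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignments, centroids
```

For the four points (0, 0), (0, 1), (10, 10), (10, 11) and k = 2, the two nearby pairs end up in separate clusters, which matches the notion of compact, well-separated clusters described above.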
FIG. 1 shows a flow diagram of a data processing method according to some embodiments of the invention.
As shown in fig. 1, a data processing method provided by an embodiment of the present invention may include the following steps.
In step S110, comment data to be detected of the target user for the target product is acquired.
In the embodiment of the present invention, the target user may be a user of an e-commerce platform. The user may log in to the e-commerce platform through a user terminal, open a product web page on the platform, and place an order to purchase a target product. The target product may be a physical article such as a refrigerator, a washing machine, a book, or an electronic product, or a service provided by a merchant such as a travel service or a cleaning service; the target product is not limited herein. After purchasing a target product such as a refrigerator, the user can submit a comment on the web page of the target product. The comment collected through the user terminal is sent to the background server, which takes the received comment as the comment data to be detected.
In step S120, a comment quality index of the comment data to be detected is obtained, where the comment quality index is obtained through a trained target product feature recognition model.
According to the embodiment of the invention, a scheme for reasonably evaluating the quality of the comment data to be detected can be designed by combining a supervised machine learning algorithm and an unsupervised machine learning algorithm, so that high-quality comments and low-quality comments can be effectively distinguished. Specific implementations can be found in the following examples.
In step S130, a historical behavior index of the target user is acquired.
In step S140, an effective comment index of the to-be-detected comment data is obtained according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user.
According to the data processing method provided by the embodiment of the invention, on one hand, obtaining the comment quality index of the target user's comment data to be detected for the target product makes it possible to distinguish the quality of that comment data; on the other hand, the effective comment index of the comment data to be detected can be obtained comprehensively by combining the historical behavior index of the target user with the comment quality index of the comment data to be detected, so that a more accurate quantitative evaluation of the comment data to be detected can be achieved.
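Step S140 can be pictured with a small Python sketch. The disclosure states that the effective comment index is obtained from the comment quality index, a first weight, the historical behavior index, a second weight, and a price coefficient, with the first weight greater than the second. The linear form used below, the default weights (0.7 and 0.3), and the function name `effective_comment_index` are assumptions for illustration, not the formula of the embodiments.

```python
def effective_comment_index(quality_index, history_index,
                            price_coefficient=1.0,
                            w_quality=0.7, w_history=0.3):
    """Combine the two indexes into one score (hypothetical linear form)."""
    # The disclosure requires the first weight to exceed the second.
    if w_quality <= w_history:
        raise ValueError("the quality weight must exceed the history weight")
    weighted = w_quality * quality_index + w_history * history_index
    return price_coefficient * weighted
```

Under these assumed weights, a comment with quality index 4 from a user with historical behavior index 2 would score about 3.4 before any price adjustment.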
Fig. 2 shows a schematic flow diagram of some embodiments of step S120 in fig. 1.
As shown in fig. 2, the step S120 may further include the following steps.
In step S121, a product feature phrase included in the comment data to be detected is identified through the target product feature identification model.
In the embodiment of the present invention, product features may be defined from three aspects of a product: appearance and material, product attributes, and usage effects. Appearance and material may include product size, material, color, workmanship, style, whether the product is damaged, and so on. Product attributes refer to the functions the product can provide, such as a freezing function or space capacity; this information can be obtained from a product information table. Usage effects refer to effects that only become apparent while the product is in use, such as "fresh-keeping effect", "noise", and "smell".
For example, taking "refrigerator" as an example, the following product features are defined: product size, product weight, color, material, freezing function, space capacity, temperature regulation, power consumption, fresh-keeping effect, smell, noise and pulley effect.
It should be noted that, when the target product is different, the corresponding product characteristics may be designed and adjusted accordingly, and the invention is not limited to the above example.
In step S122, the number of product feature phrases in the comment data to be detected is counted.
In step S123, the comment quality index is calculated according to the number of product feature phrases.
In the embodiment of the invention, after the target product feature recognition model is obtained, the comment data to be detected can be split into sentences at separators, and the model is used to predict a label for each sentence. If a sentence's label corresponds to a product feature, 1 is added to the comment quality index of the comment data to be detected. The same product feature is counted only once per piece of comment data. The final number of distinct product features matched in the comment data to be detected is its comment quality index.
In the embodiment of the present invention, one factor determining the quality of a comment is the number of product features of the evaluated target product that the comment data to be detected contains: the more product features it contains, the higher its comment quality. Therefore, a target product feature recognition model for the target product can first be trained, the model is then used to recognize the product features in the comment data to be detected, and the comment quality index is calculated from the number of product feature phrases matched in the comment data to be detected.
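The counting procedure of steps S121 to S123 can be sketched as follows. The keyword lookup `predict_feature` is a hypothetical stand-in for the trained target product feature recognition model, and the keyword table and separator set are assumptions; only the split, predict, and count-each-feature-once logic follows the description above.

```python
import re

# Hypothetical keyword table standing in for a trained model (refrigerator).
KEYWORDS = {
    "noise": "noise",
    "quiet": "noise",
    "smell": "smell",
    "fresh": "fresh-keeping effect",
    "capacity": "space capacity",
}

def predict_feature(sentence):
    """Return a product feature label, or None if the sentence is unrelated."""
    lowered = sentence.lower()
    for keyword, feature in KEYWORDS.items():
        if keyword in lowered:
            return feature
    return None

def comment_quality_index(comment, predict=predict_feature):
    """Count the distinct product features mentioned in one comment."""
    # Split into sentences on common Western and Chinese separators.
    parts = re.split(r"[,.;!?，。；！？\n]+", comment)
    sentences = [s.strip() for s in parts if s.strip()]
    # A set guarantees that each product feature is counted only once.
    features = set()
    for sentence in sentences:
        label = predict(sentence)
        if label is not None:
            features.add(label)
    return len(features)
```

For example, "Very quiet. The capacity is large, almost no noise! Fast delivery." mentions noise twice and capacity once, so this sketch yields an index of 2.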
It should be noted that the target product feature identification model in the embodiment of the present invention may have different implementation methods, which are described below with reference to fig. 3 for example.
FIG. 3 shows a flow diagram of a data processing method according to further embodiments of the present invention.
As shown in fig. 3, the data processing method provided in the embodiment of the present invention is different from the above-described embodiments in that the following steps may be further included.
In step S310, a first training data set is obtained, the first training data set comprising positive samples and negative samples.
In the embodiment of the present invention, all historical comment data of the target product category corresponding to the target product may be collected, and the high-frequency phrases in the historical comment data extracted. The extracted phrases are used as samples and the product features designed for the target product category are used as the labels of the phrases, thereby generating the first training data set. A positive sample may be a phrase tagged with a product feature label, and a negative sample may be a phrase unrelated to any product feature. Specific implementations may refer to other embodiments below.
In step S320, the target product feature recognition model is trained using the first training data set.
In the embodiment of the present invention, the first training data set may be used to train the target product feature recognition model, so as to recognize product features corresponding to sentences or phrases in the comment data to be detected.
In the embodiment of the present invention, the target product feature recognition model may be a multi-class classification model, but the present invention is not limited thereto. When only one product feature is designed for the target product, the target product feature recognition model may also be a binary classification model; that is, the model identifies a sentence or phrase in the comment data to be detected either as the designed product feature or as a phrase unrelated to the product feature.
In step S330, a target product category corresponding to the target product is identified.
In the embodiment of the invention, a category of products can be determined according to the category system that the e-commerce platform uses to classify products. Such a system is currently generally divided into three levels, namely first-level, second-level, and third-level categories.
In the embodiment of the present invention, a category of products may refer to all products under one third-level category, but the present invention is not limited thereto.
For example, suppose that part of category information (each column is: first-level category number, first-level category name, second-level category number, second-level category name, third-level category number, and third-level category name) in the category table of the e-commerce platform is as follows:
737, household appliances, 738, domestic appliances, 1052, socket
737, household appliances, 794, large household appliances, 1199, mini sound
737, household appliances, 794, large household appliances, 12392, refrigerator/ice bar
737, household appliances, 738, domestic appliances, 12394, sweeping robot
737, household appliances, 794, large household appliances, 878, refrigerator
Here, "refrigerator" is a third-level category.
In step S340, the target product feature recognition model is called according to the target product category.
In the embodiment of the invention, a separate product feature recognition model can be trained in advance for each of a plurality of third-level categories of the e-commerce platform, for example a model for the refrigerator category, a model for the sweeping robot category, a model for the socket category, and so on. When product feature recognition is performed on a new piece of comment data to be detected, the target product category of the corresponding target product is determined first, and the corresponding target product feature recognition model is then called to identify the product feature phrases in the new comment data and their number. At present, the product category of every product on an e-commerce platform is already determined, so the product category of the product corresponding to each user comment can be queried.
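The category lookup followed by the model call (steps S330 and S340) can be sketched as a simple registry. This is only an illustration under stated assumptions: the registry, the product-to-category table, the SKU identifiers, and the keyword-based stand-in for a trained model are all hypothetical and do not appear in the original text.

```python
from typing import Callable, Dict, List

# Hypothetical type: a recognizer maps a comment to the product feature
# phrases it finds. Real models would be trained per category in advance.
FeatureRecognizer = Callable[[str], List[str]]


def make_keyword_recognizer(keywords: List[str]) -> FeatureRecognizer:
    """Stand-in for a trained model: reports which keywords occur in the text."""
    def recognize(comment: str) -> List[str]:
        return [kw for kw in keywords if kw in comment]
    return recognize


# Hypothetical registry keyed by third-level category number.
MODEL_REGISTRY: Dict[int, FeatureRecognizer] = {
    878: make_keyword_recognizer(["noise", "freezing", "capacity"]),  # refrigerator
    12394: make_keyword_recognizer(["suction", "battery"]),           # sweeping robot
}

# Hypothetical product-to-category lookup (the platform already stores this).
PRODUCT_CATEGORY: Dict[str, int] = {"sku-fridge-1": 878, "sku-robot-1": 12394}


def recognize_features(product_id: str, comment: str) -> List[str]:
    """Step S330: look up the category; step S340: call that category's model."""
    category = PRODUCT_CATEGORY[product_id]
    model = MODEL_REGISTRY[category]
    return model(comment)
```

Keeping one model per third-level category means a new category only requires a new registry entry, while the calling code stays unchanged.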
Fig. 4 shows a schematic flow diagram of some embodiments of step S310 in fig. 3.
As shown in fig. 4, the step S310 may further include the following steps.
In step S311, historical review data of the target product category is acquired.
For example, all historical comment data under the third-level category "refrigerator" may be collected, or all historical comment data within a period of time, such as one year.
In step S312, phrases whose occurrence frequency satisfies a predetermined condition in the historical comment data of the target product category are extracted.
In the embodiment of the invention, after all historical comment data of the target product category is acquired from the comment data of the e-commerce platform, Word2Phrase can be used to extract high-frequency phrases from it. What counts as high frequency can be set according to the specific application scenario and actual requirements, and the invention is not limited in this respect. High-frequency phrases are often related to the features of the product.
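As a rough illustration of frequency-based phrase extraction, the sketch below scores adjacent word pairs with a discounted co-occurrence score in the spirit of Word2Phrase. It is a simplified stand-in, not the Word2Phrase tool itself: the function name, the default `min_count` and `threshold` values, and the assumption that comments are already tokenized are all illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def extract_phrases(
    tokenized_comments: List[List[str]],
    min_count: int = 2,
    threshold: float = 0.1,
) -> Dict[Tuple[str, str], float]:
    """Score adjacent word pairs and keep those that co-occur unusually often.

    score(a, b) = (count(a b) - min_count) / (count(a) * count(b))
    """
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in tokenized_comments:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    phrases = {}
    for (a, b), n_ab in bigrams.items():
        # Subtracting min_count discounts pairs that are merely rare.
        score = (n_ab - min_count) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases
```

In practice the threshold plays the role of the "predetermined condition" on occurrence frequency and would be tuned per category.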
In step S313, the phrases are distinguished into phrases unrelated to the target product category and phrases related to the target product category using a predetermined rule.
In the embodiment of the invention, the extracted phrases can be further distinguished into phrases related to the target product category and phrases unrelated to the target product category by utilizing a preset rule and manual filtering.
In this embodiment of the present invention, the predetermined rule may include: 1) only phrases with a length of 4 to 8 characters are retained, because a phrase that is too short is unlikely to form a complete phrase, while a phrase that is too long contains too much irrelevant information; 2) emotionally neutral phrases are filtered out, which can be identified by calling an existing sentiment API (Application Programming Interface), because neutral phrases, such as "remember the phrases" and "wish to use", are usually unrelated to product features; 3) incomplete or non-standard phrases are filtered out, such as "can reach", "cool still", "somewhat small", and "can also send".
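The three rules can be sketched as a single filter. This is a minimal sketch under assumptions: the sentiment service is represented by a hypothetical callable that returns "positive", "negative", or "neutral", and rule 3 is approximated by a manually supplied list, since the text combines the rules with manual filtering. The length window counts characters of the original-language phrases.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for an external sentiment service.
SentimentFn = Callable[[str], str]


def filter_candidate_phrases(
    phrases: Iterable[str],
    sentiment_of: SentimentFn,
    incomplete: Iterable[str] = (),
    min_len: int = 4,
    max_len: int = 8,
) -> List[str]:
    """Apply the three rules: length window, non-neutral sentiment, completeness."""
    blocked = set(incomplete)
    kept = []
    for phrase in phrases:
        if not (min_len <= len(phrase) <= max_len):
            continue  # rule 1: too short or too long
        if sentiment_of(phrase) == "neutral":
            continue  # rule 2: neutral phrases rarely describe a feature
        if phrase in blocked:
            continue  # rule 3: manually listed incomplete or informal phrases
        kept.append(phrase)
    return kept
```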
In the embodiment of the present invention, taking the refrigerator category as an example, phrases related to the refrigerator category and phrases unrelated to the refrigerator category are exemplified as follows.
Phrases related to the refrigerator category may include: 1) phrases describing refrigeration, for example "the refrigerating effect is great", "the refrigerating effect is not good", and "the freezing effect is excellent"; 2) phrases describing the fresh-keeping effect, for example "excellent preservation effect"; 3) phrases describing noise, for example "noise is negligible", "running sound is small", and "sound and static are really loud".
Phrases unrelated to the refrigerator category may include: 1) phrases belonging to other categories, for example "the wind force is very large", "the suction force is strong", and "the filtering effect is good"; 2) phrases about price, logistics, or customer service, such as "very expensive", "logistics are very fast", and "customer service is okay"; 3) meaningless phrases, such as "good quality", "no effect on life", and "good family".
In step S314, the phrases unrelated to the target product category are labeled as the negative samples.
In step S315, the phrase related to the target product category is labeled as the positive sample.
In an exemplary embodiment, tagging the phrase related to the target product category as the positive sample may include: clustering the phrases related to the target product category, and aggregating the phrases with the same semantics together, wherein the phrases of the same category correspond to the same product characteristic; and marking each phrase with a corresponding product characteristic label.
In the embodiment of the invention, the phrases related to the product features of the target product category can be clustered using the K-Means algorithm. Phrases with the same semantics are grouped together, phrases in the same class usually correspond to the same product feature, and each class of phrases is then given a product feature label.
For example, under the refrigerator category, the clustering algorithm groups the following phrases into one class: "except that the sound is louder", "except that the sound is loud", "but the noise is not general", "but is louder", "but the noise is really too loud", "quieter", "the sound is still more obvious", "the sound is louder", "the noise is not very loud", "the noise is not loud", "the occupied area is small", …. It can be seen that most of the above phrases describe the product feature "noise", so the phrases in this class can be labeled "noise".
As another example, consider the following phrases: "the space of the refrigerated cabinet is large", "the refrigerating space is too small", "the refrigerating space is small", "the refrigerating effect is good", "the inner space is too small", …. Most of these phrases describe the product feature "space capacity" of a refrigerator, so they can be labeled "space capacity".
It should be noted that clustering may introduce some errors. In the above example, the phrase "the occupied area is small" is mixed in with the phrases describing noise. Such a class can be filtered so that only the majority of phrases, which describe the same product feature, are retained.
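The clustering step can be illustrated with a small, dependency-free K-Means. This is a sketch under assumptions: the text does not say how phrases are turned into vectors, so a plain bag-of-words representation is used here as a stand-in (semantic embeddings would be the more likely choice in practice), and the centroids are seeded with the first k vectors so that the result is deterministic.

```python
from typing import List


def bag_of_words(phrase: str, vocab: List[str]) -> List[float]:
    """Count how often each vocabulary word occurs in the phrase."""
    words = phrase.split()
    return [float(words.count(w)) for w in vocab]


def squared_distance(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain K-Means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

After clustering, each class would still be inspected and given a product feature label by hand, as described above.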
Through the steps, the extracted phrases are labeled, and for the phrases related to the product characteristics of the target product category, the labels are the product characteristics actually described by the phrases. The phrases that are unrelated to the product characteristics of the target product category are uniformly labeled as "product independent".
The format of the final data may be:
label noise, low noise
label space capacity, sufficient freezing space
label product independent, logistics too slow
label power consumption, first-level energy efficiency, quite power-saving
label power consumption, high power consumption
Once the labeled data is available, the product feature recognition model can be trained and used for prediction.
Fig. 5 shows a schematic flow diagram of some embodiments of step S130 in fig. 1.
As shown in fig. 5, the step S130 may further include the following steps.
In step S131, the historical comment data of the target user is acquired.
For example, all historical review data submitted by the user A on the e-commerce platform, or all historical review data submitted over a period of time, such as over the past half year, may be obtained.
In step S132, a comment quality index of each piece of historical comment data of the target user is acquired.
In the embodiment of the present invention, the method for calculating the comment quality index of each piece of historical comment data of the target user may refer to the above method for calculating the comment quality index of the to-be-detected comment data.
In step S133, a historical behavior index of the target user is obtained according to the comment quality index of each piece of historical comment data of the target user and the total number of historical comments of the target user.
Fig. 6 shows a schematic flow diagram of some embodiments of step S133 in fig. 5.
As shown in fig. 6, the above step S133 may further include the following steps.
In step S1331, an average quality index of the historical review of the target user is obtained according to the review quality index of each piece of historical review data of the target user.
In the embodiment of the present invention, the historical comment average quality index may be an average of comment quality indexes of all historical comment data submitted by the target user on the e-commerce platform.
In step S1332, a maximum value of the total number of history comments is acquired.
In this embodiment of the present invention, the maximum value of the total number of historical comments may be the largest total number of historical comments submitted by any single user on the e-commerce platform. As above, the total may cover all historical comments or only those submitted within a past period of time, for example the past half year. For example, if a first user has submitted 10 comments, a second user has submitted 100 comments, and so on, and the largest of these totals is 100, then the maximum value of the total number of historical comments is 100.
In step S1333, the historical behavior index of the target user is obtained according to the average quality index of the historical comments of the target user, the total number of the historical comments of the target user, and the maximum value of the total number of the historical comments.
In the embodiment of the present invention, the historical behavior index of the target user may be calculated by the following formula:
$$\text{historical behavior index} = \text{historical comment average quality index} \times (1 + a \times N)$$
Here, $N$ is the total number of pieces of historical comment data of the target user, and $a$ is a constant that can be taken as the reciprocal of the maximum value $N_{\max}$ of the total number of historical comments, namely $a = 1/N_{\max}$.
According to this formula, the more historical comments a user has submitted on the e-commerce platform, or equivalently the higher that user's share relative to the maximum total number of historical comments, the higher the historical behavior index. A more active commenter therefore receives a higher score; in other words, the method provided by the embodiment of the invention gives users a positive incentive to write more comments and better comments.
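The formula can be checked with a short function. This is a minimal sketch: the function name and the treatment of a user with no history (returning 0) are assumptions, not part of the original text.

```python
from typing import Sequence


def historical_behavior_index(quality_indexes: Sequence[float], n_max: int) -> float:
    """Average quality * (1 + a * N), with a = 1 / Nmax."""
    n = len(quality_indexes)
    if n == 0 or n_max <= 0:
        return 0.0  # assumption: a user with no history gets 0
    average_quality = sum(quality_indexes) / n
    a = 1.0 / n_max
    return average_quality * (1.0 + a * n)
```

With an average quality of 0.5 and a maximum of 100 comments, a user with 10 comments scores 0.5 × 1.1 = 0.55, while the most active user (100 comments) scores 0.5 × 2 = 1.0, so the multiplier ranges between 1 and 2.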
Fig. 7 shows a schematic flow diagram of some embodiments of step S140 in fig. 1.
As shown in fig. 7, the step S140 may further include the following steps.
In step S141, a first weight of the review quality indicator and a second weight of the historical behavior indicator are determined.
In step S142, a price coefficient of the target product is obtained.
In step S143, an effective comment index of the to-be-detected comment data is obtained according to the comment quality index, the first weight, the historical behavior index, the second weight, and the price coefficient.
In the embodiment of the invention, the effective comment index of the comment data to be detected can be calculated from multidimensional features, and the comment can then be rewarded according to the size of its effective comment index. Specifically, the multidimensional features may include the comment quality index, the historical behavior index, and the price coefficient. Taking the historical behavior index into account provides a continuing incentive for users who write high-quality comments, which secures the source of high-quality comments. The product price reflects the profit of the product to a certain extent, so a price-dependent coefficient helps offset the cost of rewarding user comments.
In the embodiment of the invention, the effective comment index of the comment data to be detected can be calculated by the following formula:
$$\text{effective comment index} = g(x) \times (W1 \times \text{comment quality index} + W2 \times \text{historical behavior index})$$
Here, x represents the product price of the target product, and g(x) may be a piecewise function whose segments are divided according to the price distribution of the target product category corresponding to the target product; see the example below. W1 and W2 represent the first weight of the comment quality index and the second weight of the historical behavior index, respectively. In an embodiment of the present invention, W1 may be set larger than W2, because the reward for a comment should depend mainly on the quality of the current comment. For example, W1 may be 0.8 and W2 may be 0.2, but the present invention is not limited thereto, and the weights may be adjusted according to the specific application scenario. Finally, the target user is rewarded accurately according to the effective comment index of the comment data to be detected.
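A minimal sketch of the weighted combination follows, using the example weights 0.8 and 0.2. The function and parameter names are illustrative, and the price coefficient g(x) is passed in as an already computed number.

```python
def effective_comment_index(
    quality: float,
    behavior: float,
    price_coefficient: float,
    w1: float = 0.8,
    w2: float = 0.2,
) -> float:
    """g(x) * (W1 * comment quality index + W2 * historical behavior index)."""
    return price_coefficient * (w1 * quality + w2 * behavior)
```

For instance, a quality index of 0.6, a historical behavior index of 0.55, and a price coefficient of 1.5 give 1.5 × (0.48 + 0.11) = 0.885.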
Fig. 8 shows a schematic flow diagram of some embodiments of step S142 in fig. 7.
As shown in fig. 8, the step S142 may further include the following steps.
In step S1421, the price distribution of each single product in the target product category corresponding to the target product is counted.
In the embodiment of the invention, a single product refers to a SKU (Stock Keeping Unit) on the e-commerce platform, and each single product corresponds to a unique SKU number. A product is treated as a separate single product when any one of its attributes, such as brand, model, configuration, grade, color, packaging capacity, unit, production date, expiration date, usage, price, or place of production, differs from those of other products. Here, a SKU refers to a product that has been put on the shelves.
For example, the price distribution of the single products in the refrigerator category, to which a refrigerator of a certain brand and model purchased by the user belongs, may be counted, for example by sorting the prices of all single products in the refrigerator category in descending order, but the invention is not limited thereto.
In step S1422, if the price of the target product is within the first predetermined percentage of the price distribution, the price coefficient is a first constant.
For example, after the prices of all SKUs in the target product category are sorted in descending order, if the price of the target product falls within the first 20% of the price distribution, the price coefficient of the target product may be 1.5 (for example only, and the present invention is not limited thereto). If the prices are instead sorted in ascending order, the condition becomes the last 20% of the price distribution. The specific value of the first predetermined percentage can be chosen freely.
In step S1423, if the price of the target product is within the second predetermined percentage of the price distribution, the price coefficient is a second constant.
For example, after the prices of all SKUs in the target product category are sorted in descending order, if the price of the target product falls within the last 20% of the price distribution, the price coefficient of the target product may be 1 (for example only, and the present invention is not limited thereto). If the prices are instead sorted in ascending order, the condition becomes the first 20% of the price distribution. The specific value of the second predetermined percentage can be chosen freely.
In step S1424, if the price of the target product is between the first predetermined percentage and the second predetermined percentage of the price distribution, the price coefficient is a third constant.
For example, after the prices of all SKUs in the target product category are sorted in descending order, if the price of the target product lies between the first 20% and the last 20% of the price distribution, that is, in the middle tier, the price coefficient of the target product may be 1.25 (for illustration only, and the present invention is not limited thereto).
It should be noted that a different piecewise function g(x) may be set for each product category. For example, the function may have two segments, three segments, or four or more segments, and the price interval of each segment can be set according to the price distribution of the category. In addition, the price coefficient of each segment may differ between categories. For example, if the price coefficient for the top price tier (top 20%) of the refrigerator category is 1.5 as in the above example, the price coefficient for the top price tier (top 20%) of the book category may be set to 1.8.
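The three-tier price coefficient of steps S1421 to S1424 can be sketched as follows. This is an illustration under assumptions: the position of a price is measured as the fraction of SKUs in the category that are strictly more expensive, which is one possible reading of "within the first 20%", and the function name and default values (taken from the examples above) are illustrative.

```python
from typing import Sequence, Tuple


def price_coefficient(
    target_price: float,
    category_prices: Sequence[float],
    top_share: float = 0.2,
    bottom_share: float = 0.2,
    coefficients: Tuple[float, float, float] = (1.5, 1.25, 1.0),
) -> float:
    """Piecewise g(x): locate the price within its category, then pick a tier."""
    if not category_prices:
        raise ValueError("category_prices must not be empty")
    # Fraction of SKUs strictly more expensive than the target; this equals
    # the target's relative position in a descending sort of the prices.
    position = sum(1 for p in category_prices if p > target_price) / len(category_prices)
    high, middle, low = coefficients
    if position < top_share:
        return high    # top tier
    if position >= 1.0 - bottom_share:
        return low     # bottom tier
    return middle      # middle tier
```

A category with its own price distribution or tier values would simply pass different `top_share`, `bottom_share`, and `coefficients` arguments.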
FIG. 9 shows a flow diagram of a data processing method according to further embodiments of the invention.
As shown in fig. 9, the data processing method according to the embodiment of the present invention is different from the other embodiments described above in that the data processing method may further include the following steps.
In step S910, the effective comment indexes in the target product category corresponding to the target product are sorted in descending order.
In step S920, the minimum effective comment index within the top third predetermined percentage and the maximum effective comment index within the bottom fourth predetermined percentage of the sorted effective comment indexes under the target product category are determined.
In step S930, if the effective comment index of the to-be-detected comment data is greater than the minimum effective comment index, it is determined that the reward coefficient is a fourth constant.
In step S940, if the effective comment index of the comment data to be detected is smaller than the maximum effective comment index, it is determined that the reward coefficient is a fifth constant.
In step S950, if the effective comment index of the comment data to be detected is between the minimum effective comment index and the maximum effective comment index, it is determined that the reward coefficient is a sixth constant.
In the embodiment of the invention, the e-commerce platform can reward comments by issuing, for example, points or X beans, and the specific reward scheme can be as follows:
When a new piece of comment data to be detected is obtained, the effective comment indexes of all comment data in the target product category corresponding to the comment data to be detected are sorted in descending order (the effective comment index of each piece of comment data is calculated in the same way as that of the comment data to be detected). Suppose the minimum effective comment index within the first 20% is f1 and the maximum effective comment index within the last 20% is f2. If the effective comment index of the new comment data to be detected is greater than f1, the points or X beans issued are 2 times the original amount; if it is smaller than f2, they are 0.5 times the original amount; if it is between f2 and f1, they are 1 time the original amount; and if it is 0 (that is, the comment is invalid), the points or X beans issued are set to 0. The original amount refers to the fixed number of points or X beans issued for a comment on the product before this scheme is applied.
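The reward scheme can be sketched as a function that returns the multiplier applied to the original amount. This is a sketch under assumptions: the function name, the rounding of the 20% shares to whole comments, and the fallback when no reference data exists are illustrative choices not fixed by the text. The zero check comes first so that an invalid comment never falls into the 0.5 tier.

```python
from typing import Sequence


def reward_multiplier(
    index: float,
    category_indexes: Sequence[float],
    top_share: float = 0.2,
    bottom_share: float = 0.2,
) -> float:
    """Return the factor applied to the original points or X beans."""
    if index == 0:
        return 0.0  # invalid comment: no reward
    ranked = sorted(category_indexes, reverse=True)
    n = len(ranked)
    if n == 0:
        return 1.0  # assumption: no reference data, keep the original reward
    top_n = max(1, int(n * top_share))
    bottom_n = max(1, int(n * bottom_share))
    f1 = ranked[top_n - 1]     # smallest index inside the top share
    f2 = ranked[n - bottom_n]  # largest index inside the bottom share
    if index > f1:
        return 2.0
    if index < f2:
        return 0.5
    return 1.0
```

The issued amount would then be the original amount multiplied by this factor.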
FIG. 10 shows a flow diagram of a data processing method according to further embodiments of the present invention.
As shown in fig. 10, the data processing method according to the embodiment of the present invention is different from the other embodiments described above in that the data processing method may further include the following steps.
In step S1010, a second training data set is obtained, which includes positive samples labeled as valid comments and negative samples labeled as invalid comments.
In the embodiment of the invention, invalid comments may include spam comments, prohibited comments, and meaningless comments, including comments whose content is unrelated to the product and comments that merely repeat meaningless content.
For example, an ancient poem used as a comment: "Click-clack, click-clack, Mulan weaves at the door. No sound of the loom is heard, only the girl's sighs. Ask her what she is thinking of, ask her what she is remembering. The girl is thinking of nothing, the girl is remembering nothing."
As another example, a story told as a comment: "do you want? Why are the addle inherited in widowen cells? Because weiwei is a powerful son of weizhou."
As another example, a meaningless repetition is used as a comment, "good and good @ @ good &".
Prohibited comments may include advertising, abusive, and pornographic comments.
For example, an advertisement: "cheap clearance sale, add q, private chat".
In the embodiment of the present invention, a predetermined number of representative samples, for example 100,000, may be extracted from the historical comment data of the e-commerce platform for labeling. A comment that is invalid is labeled 1; otherwise it is labeled 2 (this is only an example, and the present invention is not limited thereto). The format of the data may be:
label 1, cheap clearance sale, add q, private chat
label 2, looks nice, but not as tasty as the pure sesame filling
In step S1020, positive and negative samples in the second training data set are preprocessed.
In the embodiment of the invention, the positive and negative samples are preprocessed, for example by word segmentation, and the result is saved for training.
For example:
Before word segmentation: label 2, looks nice, but not as tasty as the pure sesame filling.
After word segmentation (in the original Chinese text, the segmented words are separated by spaces): label 2, looks nice, but not as tasty as the pure sesame filling.
In step S1030, a text binary classification model is trained using the preprocessed second training data set.
In the embodiment of the invention, a fastText model can be called for training. fastText is a shallow neural network algorithm based on n-grams; only the input data path, the model output path, and the parameters need to be supplied.
In step S1040, the text binary classification model is used to determine whether the comment data to be detected is a valid comment.
In the embodiment of the invention, for the new comment data to be detected, the new comment data to be detected can be preprocessed firstly.
For example, assume that the new comment data to be detected is "fast delivery, not yet in normal use, heating and cooling effect is not yet known". After preprocessing, it becomes the word-segmented form of the same text, which is then input into the trained fastText model for prediction. If the prediction result is 1, the comment is an invalid comment; otherwise it is a valid comment.
It should be noted that the text binary classification model is not limited to fastText; fastText is chosen here for efficiency. Many text classification models exist, and most of them can be applied to this task.
In step S1050, if the comment data to be detected is an invalid comment, it is determined that the valid comment index of the comment data to be detected is a set value.
In the embodiment of the invention, whether the comment data to be detected is effective or not is judged by training a two-class machine learning model, namely the text two-class model. If the comment data to be detected are invalid, the corresponding valid comment index is directly set to 0; if the comment data to be detected are valid, the product feature identification and the detection and calculation of the comment quality index, the valid comment index and the like in the above embodiment can be continued.
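The gating logic of steps S1040 and S1050 can be sketched as follows. Both callables are hypothetical placeholders: `classify` stands for the trained text binary classification model (returning 1 for an invalid comment, following the labeling example above), and `score_valid_comment` stands for the full effective comment index calculation.

```python
from typing import Callable

INVALID_LABEL = 1  # label convention from the example: 1 = invalid comment


def gated_effective_index(
    comment: str,
    classify: Callable[[str], int],
    score_valid_comment: Callable[[str], float],
) -> float:
    """Invalid comments get 0; valid ones go through the normal scoring path."""
    if classify(comment) == INVALID_LABEL:
        return 0.0
    return score_valid_comment(comment)
```

Running the cheap classifier first means the more expensive feature recognition and index calculation are skipped for comments that would receive no reward anyway.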
On the one hand, the data processing method provided by the embodiment of the invention uses a machine-learning-based text classification model to automatically screen out invalid comments, such as spam, prohibited, and meaningless comments, from the comment data, which solves the problems of low efficiency and high cost caused by manual review in the prior art. On the other hand, the embodiment of the invention provides an effective comment quality evaluation scheme that is based on massive product comments and combined with a machine-learning-based product feature recognition model to calculate the comment quality index of comment data, so that high-quality comments can be distinguished from low-quality ones. In addition, the method can combine multidimensional features into a multi-factor model to calculate the effective comment index of comment data and reward comments accurately, overcoming the defect of the prior art that relies on a single feature and cannot reward accurately.
Fig. 11 shows a schematic block diagram of a data processing apparatus according to some exemplary embodiments of the present invention.
In addition, an embodiment of the invention also provides a data processing device. Referring to fig. 11, the data processing apparatus 1100 may include: a detection data acquisition module 1110, a review quality acquisition module 1120, a historical behavior acquisition module 1130, and an effective comment module 1140.
The detection data obtaining module 1110 may be configured to obtain to-be-detected comment data of a target user for a target product.
The review quality acquisition module 1120 may be configured to obtain a comment quality index of the to-be-detected comment data, where the comment quality index is obtained through a trained target product feature recognition model.
The historical behavior acquisition module 1130 may be configured to acquire the historical behavior index of the target user.
The effective comment module 1140 may be configured to obtain an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user.
In an exemplary embodiment, the review quality acquisition module 1120 may be configured to: identify the product feature phrases included in the comment data to be detected through the target product feature recognition model; count the number of product feature phrases in the comment data to be detected; and calculate the comment quality index according to the number of product feature phrases.
In an exemplary embodiment, the data processing apparatus 1100 may be further configured to: identify a target product category corresponding to the target product; and call the target product feature recognition model according to the target product category.
In an exemplary embodiment, the data processing apparatus 1100 may further include: a first training data acquisition module configured to acquire a first training data set, the first training data set comprising positive samples and negative samples; and a module configured to train the target product feature recognition model using the first training data set.
In an exemplary embodiment, the first training data acquisition module may include: the first historical comment acquisition submodule is configured to acquire historical comment data of the target product category; the phrase extraction submodule is configured to extract phrases, of which the occurrence frequency meets a preset condition, in the historical comment data of the target product category; a phrase distinguishing sub-module configured to distinguish the phrases into phrases unrelated to the target product category and phrases related to the target product category using a predetermined rule; a negative sample labeling sub-module configured to label the phrases unrelated to the target product category as the negative sample; and the positive sample labeling submodule is configured to label the phrases related to the target product category as the positive sample.
In an exemplary embodiment, the positive sample labeling submodule may include: the clustering unit is configured to cluster the phrases related to the target product categories and cluster the phrases with the same semantics together, wherein the phrases of the same category correspond to the same product characteristics; and the labeling unit is configured to mark each type of phrase with a corresponding product characteristic label.
In an exemplary embodiment, the historical behavior acquisition module 1130 may include: the second historical comment acquisition submodule is configured to acquire historical comment data of the target user; the historical comment quality acquisition submodule is configured to acquire comment quality indexes of the historical comment data of the target user; and the historical behavior acquisition submodule is configured to acquire the historical behavior index of the target user according to the comment quality index of each piece of historical comment data of the target user and the total number of the historical comments of the target user.
In an exemplary embodiment, the historical behavior acquisition sub-module may include: the historical comment average quality obtaining unit is configured to obtain the historical comment average quality index of the target user according to the comment quality index of each piece of historical comment data of the target user; the maximum historical comment total value acquisition unit is configured to acquire the maximum historical comment total value; and the historical behavior acquisition unit is configured to acquire the historical behavior index of the target user according to the average quality index of the historical comments of the target user, the total number of the historical comments of the target user and the maximum value of the total number of the historical comments.
In an exemplary embodiment, the effective comment module 1140 may include: a weight determination submodule configured to determine a first weight of the comment quality index and a second weight of the historical behavior index; a price coefficient acquisition submodule configured to acquire a price coefficient of the target product; and an effective comment submodule configured to obtain an effective comment index of the to-be-detected comment data according to the comment quality index, the first weight, the historical behavior index, the second weight, and the price coefficient; wherein the first weight is greater than the second weight.
In an exemplary embodiment, the price coefficient acquisition submodule may include: a price distribution counting unit configured to count the price distribution of the individual products in the target product category corresponding to the target product; a first price coefficient determination unit configured to determine that the price coefficient is a first constant if the price of the target product falls within the leading first predetermined percentage of the price distribution; a second price coefficient determination unit configured to determine that the price coefficient is a second constant if the price of the target product falls within the trailing second predetermined percentage of the price distribution; and a third price coefficient determination unit configured to determine that the price coefficient is a third constant if the price of the target product falls between the leading first predetermined percentage and the trailing second predetermined percentage of the price distribution; wherein the first constant is greater than the third constant, and the third constant is greater than the second constant.
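The three-band rule can be sketched as follows. This sketch assumes that the first predetermined percentage covers the highest-priced products and the second covers the lowest-priced ones; the band sizes (20% each) and the constants 1.2, 0.8 and 1.0 are hypothetical values chosen only to satisfy first > third > second.

```python
from bisect import bisect_left
from typing import Sequence


def price_coefficient(
    target_price: float,
    category_prices: Sequence[float],
    top_fraction: float = 0.2,      # assumed "first predetermined percentage"
    bottom_fraction: float = 0.2,   # assumed "second predetermined percentage"
    first_constant: float = 1.2,    # hypothetical values with
    second_constant: float = 0.8,   # first > third > second
    third_constant: float = 1.0,
) -> float:
    """Map a product's position in the category price distribution to a coefficient."""
    if not category_prices:
        raise ValueError("category_prices must not be empty")
    ordered = sorted(category_prices)
    # Fraction of products in the category priced strictly below the target.
    rank = bisect_left(ordered, target_price) / len(ordered)
    if rank >= 1.0 - top_fraction:
        return first_constant   # assumed: highest-priced band
    if rank < bottom_fraction:
        return second_constant  # assumed: lowest-priced band
    return third_constant       # everything in between
```

For example, with ten products priced 10 through 100, a product priced 100 falls in the top band, one priced 10 in the bottom band, and one priced 50 in the middle band.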
In an exemplary embodiment, the data processing apparatus 1100 may further include: a comment index sorting module configured to sort the effective comment indexes under the target product category corresponding to the target product in descending order; an extreme-value comment index determination module configured to determine the minimum effective comment index within a third preset percentage of the effective comment indexes under the target product category and the maximum effective comment index within a fourth preset percentage; a first reward coefficient determination module configured to determine that a reward coefficient is a fourth constant if the effective comment index of the to-be-detected comment data is larger than the minimum effective comment index; a second reward coefficient determination module configured to determine that the reward coefficient is a fifth constant if the effective comment index of the to-be-detected comment data is smaller than the maximum effective comment index; and a third reward coefficient determination module configured to determine that the reward coefficient is a sixth constant if the effective comment index of the to-be-detected comment data is between the minimum effective comment index and the maximum effective comment index.
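The reward tiers can be sketched in the same way. This sketch assumes the third preset percentage is taken from the top of the descending list and the fourth from the bottom, which is the reading under which the three conditions do not overlap; the fractions and the constants 1.5, 0.0 and 1.0 are hypothetical.

```python
import math
from typing import Sequence


def reward_coefficient(
    index: float,
    category_indexes: Sequence[float],
    top_fraction: float = 0.1,      # assumed "third preset percentage"
    bottom_fraction: float = 0.1,   # assumed "fourth preset percentage"
    fourth_constant: float = 1.5,   # hypothetical tier values
    fifth_constant: float = 0.0,
    sixth_constant: float = 1.0,
) -> float:
    """Pick a reward tier from the comment's rank within its category."""
    if not category_indexes:
        raise ValueError("category_indexes must not be empty")
    ordered = sorted(category_indexes, reverse=True)  # descending order
    top_count = max(1, math.ceil(len(ordered) * top_fraction))
    bottom_count = max(1, math.ceil(len(ordered) * bottom_fraction))
    min_of_top = ordered[top_count - 1]     # smallest index in the top band
    max_of_bottom = ordered[-bottom_count]  # largest index in the bottom band
    if index > min_of_top:
        return fourth_constant
    if index < max_of_bottom:
        return fifth_constant
    return sixth_constant
```

Because the comparisons are strict, as in the text, an index exactly equal to either threshold lands in the middle tier.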
In an exemplary embodiment, the data processing apparatus 1100 may further include: an effective comment judging module configured to judge whether the to-be-detected comment data is a valid comment by using a text binary classification model; and an invalid comment judging module configured to set the effective comment index of the to-be-detected comment data to a set value if the to-be-detected comment data is an invalid comment.
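The gating described above might look like the sketch below, in which the classifier and the index computation are passed in as callables. The function name, the callables, and the default set value of 0.0 are all assumptions made for illustration.

```python
from typing import Callable


def gated_effective_index(
    comment: str,
    is_valid: Callable[[str], bool],         # stand-in for the binary classifier
    compute_index: Callable[[str], float],   # stand-in for the index calculation
    set_value: float = 0.0,                  # hypothetical "set value"
) -> float:
    """Return the set value for invalid comments; otherwise compute the index."""
    if not is_valid(comment):
        return set_value
    return compute_index(comment)
```

Checking validity first means the more expensive quality and history calculations are skipped for comments the classifier rejects.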
In an exemplary embodiment, the data processing apparatus 1100 may further include: a second training data acquisition module configured to acquire a second training data set, the second training data set including positive samples labeled as valid comments and negative samples labeled as invalid comments; a sample preprocessing module configured to preprocess the positive and negative samples in the second training data set; and a binary classification model training module configured to train the text binary classification model using the preprocessed second training data set.
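The passage does not say which kind of classifier is trained. Purely as a stand-in, the sketch below preprocesses labeled samples and trains a small multinomial Naive Bayes model with add-one smoothing. The tokenizer assumes space-delimited text (Chinese comments would need a word segmenter), and every name here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[bool, Counter], Counter, Set[str]]


def preprocess(text: str) -> List[str]:
    """Lowercase and split into word tokens (assumes space-delimited text)."""
    return re.findall(r"\w+", text.lower())


def train(samples: List[Tuple[str, bool]]) -> Model:
    """samples: (comment text, True for valid / False for invalid)."""
    word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    class_counts: Counter = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(preprocess(text))
    if not class_counts[True] or not class_counts[False]:
        raise ValueError("need both positive and negative samples")
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocabulary


def is_valid_comment(model: Model, text: str) -> bool:
    """Classify by comparing log-probabilities of the two classes."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        score = math.log(class_counts[label] / total)  # class prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in preprocess(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denominator)
        scores[label] = score
    return scores[True] > scores[False]
```

A model trained this way could supply the validity check used to gate the effective comment index.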
Since each functional module of the data processing apparatus 1100 according to the exemplary embodiment of the present invention corresponds to a step of the above-described exemplary embodiment of the data processing method, the modules are not described again here.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
Referring now to FIG. 12, a block diagram is shown of a computer system 1200 suitable for implementing the electronic device of an embodiment of the present invention. The computer system 1200 of the electronic device shown in FIG. 12 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
As shown in FIG. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The RAM 1203 also stores various programs and data necessary for system operation. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as necessary, so that a computer program read from it can be installed into the storage section 1208 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program performs the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 1201.
It should be noted that the computer readable medium described in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the data processing method as described in the above embodiments.
For example, the electronic device may implement the following as shown in fig. 1: step S110, obtaining comment data to be detected of a target user for a target product; step S120, obtaining a comment quality index of the to-be-detected comment data, wherein the comment quality index is obtained through a trained target product feature recognition model; step S130, acquiring historical behavior indexes of the target user; step S140, obtaining an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user.
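Steps S110 to S140 can be read as a small pipeline. In the sketch below, the three callables stand in for the feature recognition model, the user-history lookup, and the final combination; they, the `Comment` type, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Comment:
    """S110: the to-be-detected comment of a target user on a target product."""
    user_id: str
    product_id: str
    text: str


def process_comment(
    comment: Comment,
    quality_of: Callable[[str], float],       # stand-in for the trained model
    history_of: Callable[[str], float],       # stand-in for the history lookup
    combine: Callable[[float, float], float], # stand-in for the final formula
) -> float:
    quality_index = quality_of(comment.text)      # S120
    history_index = history_of(comment.user_id)   # S130
    return combine(quality_index, history_index)  # S140
```

Keeping each step behind its own callable mirrors the module-per-step structure of the apparatus, so any one step can be replaced without touching the others.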
It should be noted that although several modules or units of a device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (16)

1. A data processing method, comprising:
obtaining comment data to be detected of a target user for a target product;
obtaining a comment quality index of the comment data to be detected, wherein the comment quality index is obtained through a trained target product feature recognition model;
acquiring a historical behavior index of the target user;
and obtaining an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user.
2. The data processing method of claim 1, wherein obtaining the comment quality index of the comment data to be detected comprises:
identifying product feature phrases included in the comment data to be detected through the target product feature recognition model;
counting the number of product feature phrases in the comment data to be detected;
and calculating the comment quality index according to the number of product feature phrases.
3. The data processing method of claim 2, further comprising:
identifying a target product category corresponding to the target product;
and calling the target product feature recognition model according to the target product category.
4. The data processing method of claim 3, further comprising:
obtaining a first training data set, the first training data set comprising positive samples and negative samples;
and training the target product feature recognition model by using the first training data set.
5. The data processing method of claim 4, wherein obtaining a first training data set comprises:
acquiring historical comment data of the target product category;
extracting phrases with frequency meeting preset conditions in the historical comment data of the target product category;
distinguishing the phrases into phrases irrelevant to the target product category and phrases relevant to the target product category by using a preset rule;
labeling the phrases unrelated to the target product category as the negative samples;
labeling the phrases related to the target product category as the positive samples.
6. The data processing method of claim 5, wherein labeling the phrases related to the target product category as the positive samples comprises:
clustering the phrases related to the target product category so that phrases with the same semantics are grouped together, wherein phrases in the same cluster correspond to the same product feature;
and marking each phrase with a corresponding product feature label.
7. The data processing method of claim 5, wherein obtaining the historical behavior index of the target user comprises:
acquiring historical comment data of the target user;
obtaining a comment quality index of each piece of historical comment data of the target user;
and obtaining the historical behavior index of the target user according to the comment quality index of each piece of historical comment data of the target user and the total number of the historical comments of the target user.
8. The data processing method of claim 7, wherein obtaining the historical behavior index of the target user according to the comment quality index of each piece of historical comment data of the target user and the total number of historical comments of the target user comprises:
obtaining an average historical comment quality index of the target user according to the comment quality index of each piece of historical comment data of the target user;
obtaining the maximum value of the total number of historical comments;
and obtaining the historical behavior index of the target user according to the average historical comment quality index of the target user, the total number of historical comments of the target user, and the maximum value of the total number of historical comments.
9. The data processing method of claim 1, wherein obtaining the effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user comprises:
determining a first weight of the comment quality index and a second weight of the historical behavior index;
acquiring a price coefficient of the target product;
obtaining the effective comment index of the comment data to be detected according to the comment quality index, the first weight, the historical behavior index, the second weight, and the price coefficient;
wherein the first weight is greater than the second weight.
10. The data processing method of claim 9, wherein obtaining the price coefficient of the target product comprises:
counting the price distribution of the individual products in the target product category corresponding to the target product;
if the price of the target product falls within the leading first predetermined percentage of the price distribution, the price coefficient is a first constant;
if the price of the target product falls within the trailing second predetermined percentage of the price distribution, the price coefficient is a second constant;
if the price of the target product falls between the leading first predetermined percentage and the trailing second predetermined percentage of the price distribution, the price coefficient is a third constant;
wherein the first constant is greater than the third constant, and the third constant is greater than the second constant.
11. The data processing method of claim 1, further comprising:
arranging the effective comment indexes under the target product category corresponding to the target product in descending order;
determining the minimum effective comment index within a third preset percentage of the effective comment indexes under the target product category and the maximum effective comment index within a fourth preset percentage;
if the effective comment index of the comment data to be detected is larger than the minimum effective comment index, determining that a reward coefficient is a fourth constant;
if the effective comment index of the comment data to be detected is smaller than the maximum effective comment index, determining that the reward coefficient is a fifth constant;
and if the effective comment index of the comment data to be detected is between the minimum effective comment index and the maximum effective comment index, determining that the reward coefficient is a sixth constant.
12. The data processing method of claim 1, further comprising:
judging whether the comment data to be detected is a valid comment by using a text binary classification model;
and if the comment data to be detected is an invalid comment, setting the effective comment index of the comment data to be detected to a set value.
13. The data processing method of claim 12, further comprising:
obtaining a second training data set, wherein the second training data set comprises positive samples marked as valid comments and negative samples marked as invalid comments;
preprocessing positive and negative samples in the second training data set;
and training the text binary classification model by using the preprocessed second training data set.
14. A data processing apparatus, comprising:
a detection data acquisition module configured to acquire to-be-detected comment data of a target user for a target product;
a comment quality acquisition module configured to acquire a comment quality index of the to-be-detected comment data, wherein the comment quality index is acquired through a trained target product feature recognition model;
a historical behavior acquisition module configured to acquire a historical behavior index of the target user;
and an effective comment module configured to obtain an effective comment index of the to-be-detected comment data according to the comment quality index of the to-be-detected comment data and the historical behavior index of the target user.
15. An electronic device, comprising: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the data processing method of any one of claims 1 to 13.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 13.
CN201910117723.2A 2019-02-15 2019-02-15 Data processing method and device, electronic equipment and storage medium Pending CN111651590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117723.2A CN111651590A (en) 2019-02-15 2019-02-15 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910117723.2A CN111651590A (en) 2019-02-15 2019-02-15 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111651590A true CN111651590A (en) 2020-09-11

Family

ID=72342425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910117723.2A Pending CN111651590A (en) 2019-02-15 2019-02-15 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111651590A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049637A (en) * 2011-10-11 2013-04-17 塔塔咨询服务有限公司 Content quality and user engagement in social platforms
US20170091847A1 (en) * 2015-09-29 2017-03-30 International Business Machines Corporation Automated feature identification based on review mapping
CN107316211A (en) * 2017-07-01 2017-11-03 马骁志 Comment processing method and service end
CN107391729A (en) * 2017-08-02 2017-11-24 掌阅科技股份有限公司 Sort method, electronic equipment and the computer-readable storage medium of user comment
CN108153733A (en) * 2017-12-26 2018-06-12 北京小度信息科技有限公司 Comment on the sorting technique and device of quality

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709125A (en) * 2021-08-18 2021-11-26 北京明略昭辉科技有限公司 Method and device for determining abnormal flow, storage medium and electronic equipment
CN117725909A (en) * 2024-02-18 2024-03-19 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium
CN117725909B (en) * 2024-02-18 2024-05-14 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107391493B (en) Public opinion information extraction method and device, terminal equipment and storage medium
CN106919619B (en) Commodity clustering method and device and electronic equipment
CN109299994B (en) Recommendation method, device, equipment and readable storage medium
CN110163647B (en) Data processing method and device
CN111104526A (en) Financial label extraction method and system based on keyword semantics
CN107833082B (en) Commodity picture recommendation method and device
CN111340121B (en) Target feature determination method and device
CN103123633A (en) Generation method of evaluation parameters and information searching method based on evaluation parameters
CN112380349A (en) Commodity gender classification method and device and electronic equipment
CN108733748A (en) A kind of cross-border product quality risk fuzzy prediction method based on comment on commodity public sentiment
CN112288517A (en) Commodity recommendation method and device combining RPA and AI
CN107423335A (en) A kind of negative sample system of selection for single class collaborative filtering problem
Yeole et al. Opinion mining for emotions determination
Adnan et al. Sentiment analysis of restaurant review with classification approach in the decision tree-j48 algorithm
CN105225135A (en) Potentiality customer recognition method and device
CN111339439A (en) Collaborative filtering recommendation method and device fusing comment text and time sequence effect
CN108932648A (en) A kind of method and apparatus for predicting its model of item property data and training
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN111651590A (en) Data processing method and device, electronic equipment and storage medium
CN108090201A (en) A kind of method, apparatus and electronic equipment of article content classification
CN109933784B (en) Text recognition method and device
CN112148994A (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN111192111A (en) Product sales data analysis method and terminal equipment
Kausar et al. Sentiment Classification based on Machine Learning Approaches in Amazon Product Reviews
Win et al. Sentiment attribution analysis with hierarchical classification and automatic aspect categorization on online user reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination