WO2021232856A1 - Big data-based online sales commodity sampling and testing method

Publication number
WO2021232856A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
online
brand
link
score
Application number
PCT/CN2021/074960
Other languages
French (fr)
Chinese (zh)
Inventor
王海涛
赵静
张帆
曹馨宇
吴刚
赵超
丁文兴
Original Assignee
中国标准化研究院
Application filed by 中国标准化研究院
Publication of WO2021232856A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation

Definitions

  • The invention relates to an inspection and sampling method, in particular to an inspection and sampling method for commodities sold online.
  • The objective of the present invention is to provide a big data-based inspection and sampling method for commodities sold online.
  • The present invention provides a big data-based inspection and sampling method for commodities sold online, which includes a sentiment score calculation step and a sampling data calculation step.
  • For a given product category, the sentiment score calculation step includes:
  • For a given product category, the sampling data calculation step includes:
  • Step B1): collect the online product links on the online platform that belong to the same product category, together with their corresponding data, including brand data, online store data, review data, and sales volume data.
  • After step C2), the method also includes step C3): combine the online store data of the product category with the sampling probability of each brand in the category to determine the sampling probability of each brand in each online store.
  • After step C4), the method also includes step C5): according to the total number of products to be sampled for the product category, determine the sample quantity of each brand in each online store.
  • The method further includes an initialization step, which includes: A0) constructing and/or updating a review analysis dictionary for the product based on users' multi-source review data on the product on the network platform.
  • The review analysis dictionary includes a sentiment word dictionary, a negation word dictionary, a degree word dictionary, and/or a stop word dictionary.
  • The sentiment word dictionary includes a number of sentiment words and the sentiment word score corresponding to each sentiment word.
  • The negation word dictionary includes a number of negation words.
  • The degree word dictionary includes a number of degree words and the degree word score corresponding to each degree word.
  • The stop word dictionary includes a number of stop words.
  • Step B2): perform sentiment analysis on each review of each online product link collected in step B1), using the sentiment orientation analysis method based on the review analysis dictionary, to obtain the sentiment score of each review under each online product link.
  • Performing the sentiment analysis calculation on a review $c$ under an online product link $b_{ij}$, using the sentiment orientation analysis method based on the review analysis dictionary, to obtain the sentiment score of the review includes the following steps:
  • $m_c$ is the number of clauses in review $c$.
  • Step B3): based on the sentiment score of each review under each online product link, calculate the product sentiment score of each online product link using standardization and entropy-based weighting.
  • The product sentiment score of an online product link $b_{ij}$ is computed from the following quantities:
  • $n_c$ is the total number of reviews under the online product link $b_{ij}$;
  • $\omega^{+}$ and $\omega^{-}$ are the positive weight and the negative weight, respectively.
  • The positive weight $\omega^{+}$ and the negative weight $\omega^{-}$ are obtained through the following steps:
  • $\max(S^{+})$ and $\min(S^{+})$ are the maximum and minimum positive sentiment scores over all reviews of the online product link $b_{ij}$;
  • Step C1): based on the product sentiment score of each online product link in the product category and the brand data, calculate the prior probability of each online product link of each brand in the category and the prior probability of each brand in the category.
  • The prior probability of the online product link $b_{ij}$ of brand $B_i$ is computed from the following quantities:
  • $x_{ij}$ is the product sentiment score of the online product link $b_{ij}$;
  • $\max(x)$ and $\min(x)$ are the maximum and minimum product sentiment scores over all online product links of brand $B_i$ in the product category.
  • The prior probability of brand $B_i$ is computed from the following quantities:
  • $w_j$ is the share of brand $B_i$'s sales in the product category that is attributable to the online product link $b_{ij}$;
  • $n_i$ is the number of online product links of brand $B_i$ in the product category.
  • Step C2): determine the sampling probability of each brand in the product category.
  • $P(G \mid B_i)$ is the share of sales of brand $B_i$ in product category $G$;
  • $n_b$ is the number of brands in product category $G$.
  • Step C4): based on the total number of products to be sampled for the product category, determine the sample quantity of each brand in the category.
  • The sample quantity of brand $B_i$ in the product category $G$ to be inspected is computed from the following quantities:
  • $M$ is the total number of products to be sampled in the product category $G$ to be inspected, and the symbol $\lfloor \cdot \rfloor$ indicates that the enclosed value is rounded down.
  • Step C3): combine the online store data of the product category with the sampling probability of each brand in the category to determine the sampling probability of each brand in each online store.
  • The sampling probability of brand $B_i$ in product category $G$ in online store $T_k$ is computed from the following quantities:
  • $P(\ldots \mid T_k)$ is the share of brand $B_i$'s sales in product category $G$ that is made in online store $T_k$, and $n_t$ is the number of online stores selling brand $B_i$ products in category $G$.
  • Step C5): according to the total number of products to be sampled for the product category, determine the sample quantity of each brand in each online store.
  • The sample quantity of brand $B_i$ in online store $T_j$ for the product category $G$ to be inspected is:
  • After the sampling probability of each brand in the product category has been determined through step C2), the following steps are further included:
  • $n_b$ is the number of brands in product category $G$.
  • The multi-source review data includes review data from several online sales platforms.
  • Step B2) further includes an outlier removal step: after the sentiment score of each review under an online product link has been obtained through the sentiment orientation analysis method based on the review analysis dictionary, the box plot method is used to remove the outliers among the positive and negative sentiment scores of the reviews under that link.
  • Step B4) is included after step B3): combine the product sentiment score of each online product link obtained in step B3) with the brand data to calculate the sentiment score of each brand in the product category.
  • $x_{ij}$ is the product sentiment score of the online product link $b_{ij}$;
  • $w_j$ is the share of brand $B_i$'s sales in the product category that is attributable to the online product link $b_{ij}$;
  • $n_i$ is the number of online product links of brand $B_i$ in the product category.
  • The big data-based online product inspection sampling method provided by the present invention converts users' qualitative reviews of online product links into sentiment scores that indicate the quality of products and brands, and then into the prior probability of each online product link and of each brand in the product category. From these it determines the sampling probability and sample quantity of each brand in the category, as well as the sampling probability and sample quantity of each brand in each online store. Compared with the existing technology, the method has the following advantages:
  • The sentiment score obtained from sentiment orientation analysis of user reviews is first transformed into a probability that reflects the quality of the product or brand corresponding to the online product link.
  • The worse the user evaluations under an online product link and the more dissatisfaction with quality they express, the lower the product sentiment score of the link, the greater its prior probability, and the higher the corresponding sample quantity, so that sampling effort is concentrated where it is most needed. Conversely, when the user evaluations under an online product link are relatively good, the product sentiment score is higher, the prior probability is lower, and the corresponding sample quantity is lower.
  • Figure 1 shows the sentiment score calculation model for a product category sold online;
  • Figure 2 is a schematic diagram of the sentiment word dictionary in the review analysis dictionary;
  • Figure 3 is a schematic diagram of the negation word dictionary in the review analysis dictionary;
  • Figure 4 is a schematic diagram of the degree word dictionary in the review analysis dictionary;
  • Figure 5 is a schematic diagram of the stop word dictionary in the review analysis dictionary;
  • Figure 6 is a schematic diagram of the sentiment score calculation process for each review under an online product link;
  • Figure 7 is a schematic diagram of outlier removal using box plots;
  • Figure 8 shows the stratified sampling model for a product category sold online;
  • Figure 9 is a comparison diagram of the sentiment score and the prior probability of each brand in the product category (air conditioners) in the example given in the embodiment;
  • Figure 10 is a comparison diagram of the sentiment score, prior probability, and sampling probability of each brand in the product category (air conditioners) in the example given in the embodiment;
  • Figure 11 is a comparison diagram of the sampling probability of each brand in the product category (air conditioners) before and after tightening and normalization in the example given in the embodiment;
  • Figure 12 is a data comparison diagram of the raw positive sentiment scores (ScorePositive) and raw negative sentiment scores (ScoreNegative), before standardization, of several online product links $b_{ij}$ in the example of the embodiment;
  • Figure 13 is a data comparison diagram of the raw positive sentiment scores (ScorePositive) and the positive sentiment standard scores after standardization (z_ScorePositive) of several online product links $b_{ij}$ in the example of the embodiment;
  • Figure 14 is a data comparison diagram of the raw negative sentiment scores (ScoreNegative) and the negative sentiment standard scores after standardization (z_ScoreNegative) of several online product links $b_{ij}$ in the example of the embodiment;
  • The big data-based online product inspection sampling method provided in this embodiment includes a sentiment score calculation step and a sampling data calculation step.
  • The review analysis dictionary described herein includes a sentiment word dictionary, a negation word dictionary, a degree word dictionary, and/or a stop word dictionary.
  • The review analysis dictionary can be assembled directly from the sentiment word, negation word, degree word, and/or stop word dictionaries available in the prior art.
  • The review analysis dictionary can also be constructed and/or updated based on users' multi-source review data on the product on the network platform. That is, the method provided in this embodiment further includes an initialization step, which includes: A0) constructing and/or updating the review analysis dictionary for the product based on users' multi-source review data on the product on the network platform.
  • The update can be applied either to a review analysis dictionary assembled from the prior-art dictionaries or to a review analysis dictionary constructed from users' multi-source review data on the product on the network platform.
  • The sentiment word dictionary includes a number of sentiment words and the sentiment word score corresponding to each sentiment word.
  • The negation word dictionary includes a number of negation words. A negation word directly reverses the sentiment of the sentence, and the effect of multiple negation words is usually cumulative.
  • The degree word dictionary includes a number of degree words and the degree word score corresponding to each degree word.
  • The degree word score is a numerical value indicating the strength of the degree adverb.
  • The data format of the degree word dictionary is shown in Figure 4. There are two columns: the first column is the degree word (also called the degree adverb), and the second column is the degree word score (also called the degree value). A value > 1 indicates that the sentiment is strengthened, and a value < 1 indicates that the sentiment is weakened.
  • The stop word dictionary includes a number of stop words.
  • The above-mentioned multi-source review data includes review data from several online sales platforms.
  • Examples of such online sales platforms include Taobao, Tmall, JD, and Suning.
  • For a given product category, the sentiment score calculation step includes the following (Figure 1 shows the hierarchical sentiment score calculation model for a product category sold online):
  • The sentiment score calculation described herein can also be called sentiment analysis, sentiment calculation, sentiment orientation analysis, or opinion mining. It is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries sentiment. Because product reviews have relatively simple sentence structure and strong sentiment, a sentiment orientation analysis method based on the review analysis dictionary can calculate the sentiment orientation of a review effectively.
  • Performing sentiment analysis on a review $c$ under an online product link $b_{ij}$, using the sentiment orientation analysis method based on the review analysis dictionary, to obtain the sentiment score of the review includes the following steps:
  • $m_c$ is the number of clauses in review $c$.
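The individual scoring steps did not survive in this text, so the following is only a minimal sketch of how a dictionary-based review score could be computed from the four dictionaries described above. The tiny English dictionaries, the function names, the rule that a negation or degree word applies to the next sentiment word, and the averaging over the $m_c$ clauses are all assumptions made for illustration; a real system would use Chinese dictionaries and word segmentation.

```python
import re

# Hypothetical toy dictionaries (illustrative values only).
SENTIMENT_WORDS = {"good": 1.0, "great": 2.0, "bad": -1.0, "noisy": -1.5}
NEGATION_WORDS = {"not", "never"}
DEGREE_WORDS = {"very": 1.5, "slightly": 0.5}  # >1 strengthens, <1 weakens
STOP_WORDS = {"the", "is", "a", "it"}


def score_clause(tokens):
    """Return (positive, negative) sentiment totals for one clause."""
    positive = negative = 0.0
    polarity, degree = 1.0, 1.0
    for token in tokens:
        if token in STOP_WORDS:
            continue
        if token in NEGATION_WORDS:
            polarity = -polarity          # each negation flips the sign
        elif token in DEGREE_WORDS:
            degree *= DEGREE_WORDS[token]
        elif token in SENTIMENT_WORDS:
            value = polarity * degree * SENTIMENT_WORDS[token]
            if value > 0:
                positive += value
            else:
                negative += value
            polarity, degree = 1.0, 1.0   # modifiers apply to one sentiment word
    return positive, negative


def score_review(text):
    """Average the clause scores over the m_c clauses of a review (assumed rule)."""
    clauses = [c for c in re.split(r"[,.;!?]+", text.lower()) if c.strip()]
    m_c = len(clauses)
    if m_c == 0:
        return 0.0, 0.0
    totals = [score_clause(clause.split()) for clause in clauses]
    return (sum(p for p, _ in totals) / m_c, sum(n for _, n in totals) / m_c)
```

For example, `score_review("The cooling is very good, but it is not good.")` yields a positive score of 0.75 and a negative score of -0.5 under these toy dictionaries.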
  • After step B24), the method here also includes an outlier removal step: after the sentiment score of each review under an online product link has been obtained through the sentiment orientation analysis method based on the review analysis dictionary, the box plot method is used to remove the outliers among the positive and negative sentiment scores of the reviews under that link.
  • The following operations are performed on the online product link $b_{ij}$:
  • Step11: arrange the positive sentiment scores of all reviews under the online product link $b_{ij}$ in descending order to form a set $S^{+}$, where $n_c$ is the total number of reviews under the online product link $b_{ij}$.
  • Step17: calculate the lower edge value.
  • Step18: determine the outliers among the positive sentiment scores and remove them.
  • Step21: arrange the negative sentiment scores of all reviews under the online product link $b_{ij}$ in descending order to form a set $S^{-}$.
  • Step22: calculate the median of $S^{-}$.
  • Step28: determine the outliers among the negative sentiment scores and remove them.
  • Outlier removal can also be achieved by other outlier removal methods used in the prior art or by conventional technical means in the field.
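Several of the intermediate box plot steps are missing from this text, so the sketch below uses the conventional box plot rule as a stand-in: values outside the lower and upper edges, defined as the quartiles extended by 1.5 times the interquartile range, are treated as outliers. The function name, the factor `k=1.5`, and the quartile interpolation method are assumptions, not details confirmed by the source.

```python
from statistics import quantiles


def remove_outliers_boxplot(scores, k=1.5):
    """Drop scores outside [Q1 - k*IQR, Q3 + k*IQR]; input order is preserved."""
    if len(scores) < 4:
        return list(scores)  # too few points to estimate quartiles sensibly
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_edge = q1 - k * iqr
    upper_edge = q3 + k * iqr
    return [s for s in scores if lower_edge <= s <= upper_edge]
```

The same function would be applied once to the positive sentiment scores and once to the negative sentiment scores of the reviews under a link.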
  • Step B3): based on the sentiment score of each review under each online product link, calculate the product sentiment score of each online product link using standardization and entropy-based weighting.
  • More specifically, step B3): based on the sentiment score of each review under each online product link, use the z-score standardization method to calculate the positive and negative sentiment standard scores of each online product link, and then use the entropy-based weighting method to calculate the product sentiment score of each online product link.
  • The product sentiment score of an online product link $b_{ij}$ is computed from the following quantities:
  • $n_c$ is the total number of reviews under the online product link $b_{ij}$;
  • $\omega^{+}$ and $\omega^{-}$ are the positive weight and the negative weight, respectively.
  • $\omega^{+}$ and $\omega^{-}$ are the positive weight and the negative weight calculated with the entropy method (also called the entropy weight method).
  • The positive weight $\omega^{+}$ and the negative weight $\omega^{-}$ are obtained through the following steps:
  • $\max(S^{+})$ and $\min(S^{+})$ are the maximum and minimum positive sentiment scores over all reviews of the online product link $b_{ij}$;
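The exact formulas for the standard scores and the weights are not legible in this text. As a hedged illustration, the sketch below combines textbook z-score standardization with the textbook entropy weight method (min-max scaling, share computation, entropy, and weights proportional to one minus entropy). The function names, the choice to standardize across links, and the final weighted sum of the two standard scores are assumptions.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardize to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def entropy_weights(columns):
    """Entropy weight method: one weight per column; the weights sum to 1."""
    divergences = []
    for column in columns:
        lo, hi = min(column), max(column)
        span = hi - lo
        scaled = [(v - lo) / span for v in column] if span else [0.0] * len(column)
        total = sum(scaled)
        if total == 0:
            divergences.append(0.0)  # a constant column carries no information
            continue
        shares = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(len(column))
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / norm for d in divergences]


def product_sentiment_scores(positive, negative):
    """Assumed aggregation: weighted sum of the two standard scores per link."""
    z_pos, z_neg = z_scores(positive), z_scores(negative)
    w_pos, w_neg = entropy_weights([positive, negative])
    return [w_pos * p + w_neg * n for p, n in zip(z_pos, z_neg)]
```

Here `positive` and `negative` are lists with one raw score per online product link; negative scores are assumed to be negative numbers, so a lower combined score indicates worse reviews.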
  • A simulation experiment is used for demonstration: a number of online product links $b_{ij}$ are randomly selected and the above method steps are applied to them.
  • The original online sales data comes from Tmall.
  • The abscissas of Figures 12, 13, 14, and 15 are the selected online product links $b_{ij}$.
  • The raw positive sentiment score ScorePositive of each online product link is the average of the positive sentiment scores of all reviews under that link.
  • The raw negative sentiment score ScoreNegative of each online product link is the average of the negative sentiment scores of all reviews under that link.
  • z_ScorePositive in the figures is the positive sentiment standard score of each online product link after standardization, that is, the positive sentiment standard score of the online product link $b_{ij}$ referred to in the text.
  • z_ScoreNegative in the figures is the negative sentiment standard score of each online product link after standardization, that is, the negative sentiment standard score of the online product link $b_{ij}$ referred to in the text.
  • z_Score in the figures is the product sentiment score of each online product link $b_{ij}$ after standardization and entropy-based weighting.
  • Figure 12 compares the raw positive sentiment scores ScorePositive and the raw negative sentiment scores ScoreNegative of the selected online product links $b_{ij}$ before standardization.
  • The raw positive score ScorePositive and the raw negative score ScoreNegative differ considerably in magnitude, which makes direct aggregation difficult.
  • Figure 13 is a data comparison diagram of the raw positive sentiment scores ScorePositive of the selected online product links $b_{ij}$ before standardization and the positive sentiment standard scores z_ScorePositive after standardization. Figure 13 shows that z_ScorePositive follows the same trend as ScorePositive: the differences and the trend are preserved, while the gaps between scores are reduced. At the same time, the positive and negative scores are brought into a similar range of magnitude, which reduces the influence of the positive sentiment score and makes it easier to aggregate the positive and negative sentiment scores and to compare different online product links.
  • Figure 14 is a data comparison diagram of the raw negative sentiment scores ScoreNegative of the selected online product links $b_{ij}$ before standardization and the negative sentiment standard scores z_ScoreNegative after standardization. Figure 14 shows that z_ScoreNegative follows the same trend as ScoreNegative but magnifies the differences between the scores of different online product links, making the negative effect more prominent. It has the same order of magnitude as the positive sentiment standard score z_ScorePositive, which makes it easy to aggregate with the positive sentiment score and to compare different online product links.
  • Figure 15 is a data comparison diagram of the favorable rating of the selected online product links $b_{ij}$ and the product sentiment score z_Score of each link after standardization and entropy-based weighting.
  • Different websites and online sales platforms may use different scoring systems for the favorable rating: some use a full score of 5 (such as Tmall), and some use a full score of 100% (such as JD). It is therefore difficult to compare favorable ratings directly across websites and online sales platforms.
  • This method uses user reviews to calculate the product sentiment score z_Score of each online product link. Even across different websites, the z_Score of each online product link has the same meaning and order of magnitude, so it can be compared directly between different online sales platforms.
  • Step B4): combine the product sentiment score of each online product link obtained in step B3) with the brand data to calculate the sentiment score of each brand in the product category:
  • $x_{ij}$ is the product sentiment score of the online product link $b_{ij}$;
  • $w_j$ is the share of brand $B_i$'s sales in the product category that is attributable to the online product link $b_{ij}$ (that is, the sales of link $b_{ij}$ as a proportion of the sales of all online product links of brand $B_i$ in the category);
  • $n_i$ is the number of online product links of brand $B_i$ in the product category.
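The formula for the brand sentiment score is not legible in this text. One reading that is consistent with the three quantities defined above is a sales-weighted average of the link scores. The symbols $X_i$ (brand sentiment score) and $s_{ij}$ (sales of link $b_{ij}$) are introduced here only for illustration and do not come from the source.

```latex
X_i = \sum_{j=1}^{n_i} w_j \, x_{ij},
\qquad
w_j = \frac{s_{ij}}{\sum_{l=1}^{n_i} s_{il}},
\qquad
\sum_{j=1}^{n_i} w_j = 1
```

Under this reading, a link with a large share of the brand's sales pulls the brand score toward its own product sentiment score.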
  • The sentiment score obtained from sentiment orientation analysis of user reviews is transformed into a probability that indicates the quality of the product or brand corresponding to the online product link.
  • This probability is based on historical review data reflecting users' purchases, choices, or experience, and is called the prior probability. The prior probability is then used to calculate the probability that a product is sampled for inspection (called the sampling probability) and the sample quantity during subsequent quality monitoring (such as sampling of products sold online).
  • The sampling probability described herein can also be called the posterior probability or the sample probability.
  • The sample quantity mentioned herein can also be referred to as the number of samples.
  • For a given product category, the sampling data calculation step includes:
  • The prior probability of the online product link $b_{ij}$ of brand $B_i$ is computed from the following quantities:
  • $x_{ij}$ is the product sentiment score of the online product link $b_{ij}$;
  • $\max(x)$ and $\min(x)$ are the maximum and minimum product sentiment scores over all online product links of brand $B_i$ in the product category.
  • $w_j$ is the share of brand $B_i$'s sales in the product category that is attributable to the online product link $b_{ij}$ (that is, the sales of link $b_{ij}$ as a proportion of the sales of all online product links of brand $B_i$ in the category);
  • $n_i$ is the number of online product links of brand $B_i$ in the product category.
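The prior probability formulas themselves are missing here. The sketch below shows one form that agrees with the stated behavior (the lower the product sentiment score, the greater the prior probability) and with the quantities listed above: an inverted min-max scaling per link, followed by a sales-weighted sum per brand. Both formulas, the function names, and the value 0.5 used when all links score the same are assumptions.

```python
def link_prior_probabilities(link_scores):
    """Inverted min-max scaling: the lowest score maps to 1, the highest to 0."""
    lo, hi = min(link_scores), max(link_scores)
    if hi == lo:
        return [0.5] * len(link_scores)  # assumed fallback when scores are equal
    return [(hi - x) / (hi - lo) for x in link_scores]


def brand_prior_probability(link_scores, link_sales):
    """Sales-weighted sum of the link priors of one brand (assumed form)."""
    priors = link_prior_probabilities(link_scores)
    total_sales = sum(link_sales)
    if total_sales <= 0:
        raise ValueError("total sales must be positive")
    weights = [s / total_sales for s in link_sales]  # w_j
    return sum(w * p for w, p in zip(weights, priors))
```

For a brand with link scores `[1.0, 3.0, 2.0]` and link sales `[10, 30, 60]`, the link priors are `[1.0, 0.0, 0.5]` and the brand prior is about 0.4.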
  • This scheme adopts stratified sampling with two levels: the first level determines which brands in the product category are selected and the sampling probability of each brand; the second level determines the online stores that sell each brand in the category and the sampling probability of each brand in each online store.
  • $P(G \mid B_i)$ is the share of sales of brand $B_i$ in product category $G$;
  • $n_b$ is the number of brands in product category $G$.
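The text calls the sampling probability a posterior probability and says it depends on both the prior probability and the sales share of each brand, but the formula itself is lost. A Bayes-style normalization over the $n_b$ brands is one form consistent with that description; the sketch below assumes it. The function name and dictionary layout are hypothetical.

```python
def brand_sampling_probabilities(priors, sales_shares):
    """Assumed posterior: prior times sales share, normalized over all brands."""
    joint = {brand: priors[brand] * sales_shares[brand] for brand in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("at least one brand needs a non-zero prior and sales share")
    return {brand: value / evidence for brand, value in joint.items()}
```

With equal priors, the sampling probabilities reduce to the sales shares; with equal sales shares, they reduce to the normalized priors. This matches the observation that a well-reviewed brand can still receive a high sampling probability when its sales volume is large.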
  • The product category $G$ described herein can be defined broadly or narrowly according to the needs of the application.
  • For example, the product category $G$ can be defined as air conditioners.
  • The product category $G$ can also be defined as floor-standing air conditioners or wall-mounted air conditioners.
  • Products that do not belong to the same category, that is, products whose online product links belong to different categories, cannot be mixed together in the calculation.
  • The present invention further uses quality inspection historical data to provide a tightening treatment that increases the sampling probability of products and/or brands found unqualified in that data.
  • The tightening strategy is as follows: if, according to the quality inspection historical data, the brands $B_i, B_{i+1}, \ldots, B_{i+k}$ in product category $G$ were found unqualified in the previous year's supervision and random inspection, they are listed as selected brands, and the sampling probabilities of the selected brands $B_i, B_{i+1}, \ldots, B_{i+k}$ are increased by the ratio of the preferred number R5.
  • The sampling probability of the unqualified brands selected according to the quality inspection historical data is thus relatively increased, that is, their probability of being selected rises, while the sampling probability of the other brands is relatively reduced.
  • After step C2) determines the sampling probability of each brand in the product category, a tightening and normalization step is further included, which includes the following steps:
  • $n_b$ is the number of brands in product category $G$.
  • The preferred number is used as a ratio to increase the sampling probability of the brands found unqualified in the quality inspection historical data.
  • The preferred number can be adjusted according to the actual needs of the application.
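The tightening and normalization formulas are not legible in this text. The sketch below assumes the simplest reading: multiply the sampling probability of each previously unqualified brand by a ratio, then rescale all brands so that the probabilities sum to 1 again. Treating "R5" as the Renard R5 preferred number series, whose step ratio is $10^{1/5} \approx 1.58$, is also an assumption, as are the function and parameter names.

```python
R5_RATIO = 10 ** (1 / 5)  # step ratio of the Renard R5 series, about 1.58 (assumed meaning of "R5")


def tighten_and_normalize(probabilities, unqualified_brands, ratio=R5_RATIO):
    """Boost the listed brands by `ratio`, then renormalize so the total is 1."""
    boosted = {
        brand: p * ratio if brand in unqualified_brands else p
        for brand, p in probabilities.items()
    }
    total = sum(boosted.values())
    if total == 0:
        raise ValueError("probabilities must not all be zero")
    return {brand: p / total for brand, p in boosted.items()}
```

After renormalization, the boosted brands gain probability and every other brand loses some, which matches the behavior described above. The ratio is a parameter because the text says it can be adjusted to the application.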
  • The sampling probability of brand $B_i$ in product category $G$ in online store $T_k$ is computed from the following quantities:
  • $P(\ldots \mid T_k)$ is the share of brand $B_i$'s sales in product category $G$ that is made in online store $T_k$;
  • $n_t$ is the number of online stores selling brand $B_i$ products in category $G$.
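The store-level formula is lost in this text. As a rough sketch, the code below assumes that a brand's sampling probability is split across its $n_t$ online stores in proportion to each store's share of the brand's sales. The product form and all names are assumptions; the source may instead normalize in a different way.

```python
def store_sampling_probabilities(brand_probability, store_sales):
    """Split a brand's sampling probability across stores by sales share (assumed form)."""
    total_sales = sum(store_sales.values())
    if total_sales <= 0:
        raise ValueError("total store sales must be positive")
    return {
        store: brand_probability * sales / total_sales
        for store, sales in store_sales.items()
    }
```

For a brand with sampling probability 0.5 sold in two stores with sales of 30 and 10, this gives 0.375 and 0.125, which sum back to the brand-level probability.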
  • The sample quantity of brand $B_i$ in the product category $G$ to be inspected is computed from the following quantities:
  • $M$ is the total number of products to be sampled in the product category $G$ to be inspected, and the symbol $\lfloor \cdot \rfloor$ indicates that the enclosed value is rounded down.
  • The sample quantity of brand $B_i$ in online store $T_j$ for the product category $G$ to be inspected is:
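The quantity formulas are missing, but the text states that the total $M$ is combined with a probability and rounded down. A minimal sketch of that rule follows, assuming the quantity is the floor of $M$ times the relevant sampling probability; the function name is hypothetical, and the same function can be applied to brand-level or store-level probabilities.

```python
import math


def sample_quantities(total_samples, probabilities):
    """Floor of M times each sampling probability (assumed allocation rule)."""
    return {
        name: math.floor(total_samples * p)
        for name, p in probabilities.items()
    }
```

Because every value is rounded down, the allocated quantities can add up to fewer than $M$; how any remainder is handled is not stated in this text.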
  • the online sales data in this experimental example mainly comes from Tmall and JD.
  • the emotional score of each online product link of this product category (air conditioners) is calculated; the emotional score of each brand in the category is calculated through step B4); the prior probability of each online product link and the prior probability of each brand in the category are calculated through step C1); the sampling probability of each brand in the category is determined through step C2); the sampling probability of each brand in each online store is determined through step C3); the number of products to sample for each brand is determined through step C4); and the number of products to sample for each brand in each online store is determined through step C5).
  • Figure 9 compares the emotional score of each brand (labelled as the brand emotional score and the total emotional score in the figure) with the prior probability of each brand under the product category (air conditioners); the abscissa is the brand.
  • Figure 9 shows that, as an overall trend, the lower a brand's emotional score, the higher its prior probability.
  • Figure 10 compares the emotional score, the prior probability, and the sampling probability (before tightening and normalization) of each brand under the product category (air conditioners); the abscissa is the brand. Figure 10 likewise shows that, as an overall trend, the lower a brand's emotional score, the higher its prior probability.
  • the sampling probability of each brand is affected by both the prior probability and the sales volume of the brand. Some brands have very good user reviews and therefore a relatively low prior probability, but if their sales volume is high (such as Oaks, Gree, and Midea in the figure), the corresponding sampling probability also rises. In other words, a product needs spot-checking if its user reviews are poor or if users buy a lot of it.
  • the sampling probability shown in each figure is the probability that a brand is selected into the sample, which can also be called the posterior probability.
  • this example incorporates the historical quality inspection data published in the 2018 Shanghai household air conditioner product quality supervision and spot check results (data source: Shanghai Quality and Technical Supervision official website, Information Center, Bulletin Board, Spot Check Report, "2018 Shanghai Household Air Conditioner Product Quality Supervision and Spot Check Results", website link: http://shzj.scjgj.sh.gov.cn/art/2018/9/4/art_358_1325245.html)
  • the published data show that products of the brand MBO failed the spot check (cited here for illustration only; note that the basic online sales data in this example covers the brand MBO but not the other two unqualified brands in the published data). The brand MBO is therefore listed as a selected brand and its sampling probability is tightened, and the sampling probabilities of all brands under the product category (air conditioners) are then normalized. After the processing of steps C2P1) and C2P2), the adjustment
  • Figure 11 compares the sampling probability of each brand under the product category (air conditioners) before and after tightening and normalization; the abscissa is the brand. As Figure 11 shows, after tightening and normalization the sampling probability of the selected brand MBO is higher than before. Because the sales volume of MBO air conditioners is not high, however, the change is not abrupt or pronounced. It is a reasonable adjustment that balances the various factors.


Abstract

A big data-based inspection sampling method for products sold online. The emotional score calculation for a given product category comprises the following steps: B1) collecting, on online platforms, the online product links belonging to the category together with their corresponding data, including brand data, review data, and sales volume data; B2) performing sentiment analysis on each review of each online product link collected in step B1), using a sentiment orientation analysis method based on a review analysis dictionary, and calculating the emotional score of each review under each online product link; and B3) calculating the product emotional score of each online product link from the emotional scores of the reviews under that link. Where the products sold online are of complex types, and in particular where the total number of products is uncertain, the method can obtain a suitable sampling probability and sample size, so that quality monitoring or spot-check work is better targeted, more efficient, and more scientifically sound.

Description

Sampling Method for Online Product Inspection Based on Big Data
Technical Field
The invention relates to an inspection sampling method, and in particular to an inspection sampling method for products sold online.
Background
In quality management, products must be tested to judge their overall quality. The number of items sampled for testing not only directly affects the accuracy of the overall quality judgment, but is also closely tied to time, capital, and personnel costs.
As society develops, online sales models keep evolving. Online shopping has reached households everywhere, and more and more people choose to buy the goods and materials they need through online sales platforms. For products on online platforms, however, there are many brands, product classification is complex, the total number of products is uncertain, and sales channels are not fixed. The uncertainty in the total number of products in particular makes it difficult to apply traditional sampling methods to online products to determine a suitable sampling probability or sample size.
Quality control of products sold online has long been a core concern of quality supervision departments and online sales platforms. How to obtain a suitable sample size when the total number of products is uncertain, sales channels are not fixed, brands are numerous, and product classification is complex, and how to reach high judgment accuracy with a suitable or smaller sample size, are problems that urgently need to be solved.
Summary of the Invention
Objective of the invention: to address the deficiencies of the prior art, the invention provides a big data-based inspection sampling method for products sold online.
Technical solution: to solve the above technical problem, the invention provides a big data-based inspection sampling method for products sold online, comprising an emotional score calculation step and a sampling data calculation step.
The emotional score calculation step for a given product category comprises:
B1) collecting, on online platforms, the online product links belonging to the product category together with their corresponding data, including brand data, review data, and sales volume data;
B2) performing sentiment analysis on each review of each online product link collected in step B1), using a sentiment orientation analysis method based on a review analysis dictionary, to obtain the emotional score of each review under each online product link;
B3) calculating the product emotional score of each online product link from the emotional scores of the reviews under that link.
The sampling data calculation step for a given product category comprises:
C1) calculating, from the product emotional score of each online product link in the category combined with the brand data, the prior probability of each online product link of each brand in the category and the prior probability of each brand in the category;
C2) determining the sampling probability of each brand in the category in combination with the brand data of the category;
C4) determining the number of products to sample for each brand in the category in combination with the total number of products to be sampled for the category.
As a further preference, step B1) is: collecting, on online platforms, the online product links belonging to the product category together with their corresponding data, including brand data, online store data, review data, and sales volume data;
after step C2), the method further comprises step C3): determining the sampling probability of each brand in each online store, in combination with the online store data of the category and the sampling probability of each brand in the category;
after step C4), the method further comprises step C5): determining, from the total number of products to be sampled for the category, the number of products to sample for each brand in each online store.
Preferably, the method further comprises an initialization step, which includes: A0) constructing and/or updating the review analysis dictionary for products based on multi-source user review data for products on online platforms.
The review analysis dictionary comprises a sentiment word dictionary, a negation word dictionary, a degree word dictionary, and/or a stop word dictionary, where:
the sentiment word dictionary contains a number of sentiment words and the sentiment word score of each;
the negation word dictionary contains a number of negation words;
the degree word dictionary contains a number of degree words and the degree word score of each;
the stop word dictionary contains a number of stop words.
As a further preference, in step B2), performing sentiment analysis on a review under an online product link $b_{ij}$ with the sentiment orientation analysis method based on the review analysis dictionary, to obtain the emotional score of that review, comprises the following steps:
B21) Clause segmentation: split the review text c of the review into clauses at punctuation marks:
Figure PCTCN2021074960-appb-000001
B22) Modification relationship analysis: using the review analysis dictionary, identify in each clause the sentiment words $(a_1, a_2, \dots)$, degree words $(d_1, d_2, \dots)$, negation words $(h_1, h_2, \dots)$, and stop words, and record their positions; use the stop words to determine the target sentiment word modified by each degree word and negation word; then, from the corresponding degree word scores and sentiment word scores in the review analysis dictionary and the number of negation words, determine the modification relationships between the degree words, the negation words, and the target sentiment words in the clause;
B23) Clause emotional score calculation: determine the emotional score of each clause from the modification relationships obtained, where the emotional score of clause $c_i$ is:
Figure PCTCN2021074960-appb-000002
where $|H|$ is the number of occurrences of negation words, D is the degree word score, the symbol in Figure PCTCN2021074960-appb-000003 is the sentiment word score of sentiment word $w_k$, and $n_w$ is the number of occurrences of sentiment words in clause $c_i$. The emotional score of a clause $c_i$ whose $s_i$ is positive is denoted as the clause positive emotional score (Figure PCTCN2021074960-appb-000004), and the emotional score of a clause $c_i$ whose $s_i$ is negative is denoted as the clause negative emotional score (Figure PCTCN2021074960-appb-000005);
B24) Review emotional score calculation: for the review text c of the review, sum the clause positive emotional scores over all its clauses to obtain the positive emotional score $s^+$ of the review, and sum the clause negative emotional scores over all its clauses to obtain the negative emotional score $s^-$ of the review:
Figure PCTCN2021074960-appb-000006
Figure PCTCN2021074960-appb-000007
where $m_c$ is the number of clauses in review c.
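Steps B21) to B24) can be illustrated with a minimal Python sketch. The clause score formula itself survives only as an image placeholder, so the sketch assumes the form $s_i = (-1)^{|H|} \cdot D \cdot \sum_k s(w_k)$, which matches the symbol definitions above but is not confirmed by the source. The tiny dictionaries, the whitespace tokenization, and all function names are hypothetical, and the stop-word-based scoping of modifiers in B22) is deliberately simplified to "every modifier in the clause applies to every sentiment word in it".

```python
import re

# Hypothetical toy dictionaries (the patent's real dictionaries are not reproduced here).
SENTIMENT = {"good": 1.0, "quiet": 1.0, "bad": -1.0, "noisy": -1.0}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not", "never"}


def split_clauses(text):
    """B21: split a review into clauses at (ASCII or full-width) punctuation."""
    return [c.strip() for c in re.split(r"[,.;!?,。;!?]+", text) if c.strip()]


def clause_score(clause):
    """B22/B23 under the ASSUMED form (-1)^|H| * D * sum of sentiment word scores."""
    words = clause.lower().split()
    negations = sum(1 for w in words if w in NEGATION)  # |H|
    degree = 1.0                                        # D, product of degree word scores
    for w in words:
        if w in DEGREE:
            degree *= DEGREE[w]
    total = sum(SENTIMENT[w] for w in words if w in SENTIMENT)
    return (-1) ** negations * degree * total


def review_scores(text):
    """B24: accumulate positive and negative clause scores separately."""
    scores = [clause_score(c) for c in split_clauses(text)]
    positive = sum(s for s in scores if s > 0)
    negative = sum(s for s in scores if s < 0)
    return positive, negative


print(review_scores("Very good cooling, not quiet, noisy fan."))
```

A real implementation for Chinese reviews would need a word segmenter instead of `str.split`, and would use the recorded word positions to decide which sentiment word each degree or negation word modifies.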
Preferably, step B3) is: calculating the product emotional score of each online product link from the emotional scores of the reviews under that link, combining standardization with an entropy-based weighting method.
The product emotional score of an online product link $b_{ij}$ is:
Figure PCTCN2021074960-appb-000008
where the quantities in Figure PCTCN2021074960-appb-000009 and Figure PCTCN2021074960-appb-000010 are the positive and negative emotional standard scores of the online product link $b_{ij}$, respectively:
Figure PCTCN2021074960-appb-000011
Figure PCTCN2021074960-appb-000012
where:
$n_c$ is the total number of reviews under the online product link $b_{ij}$;
the quantities in Figure PCTCN2021074960-appb-000013 and Figure PCTCN2021074960-appb-000014 are the positive and negative emotional scores of the k-th review $c_k$ of the online product link $b_{ij}$, respectively;
the quantities in Figure PCTCN2021074960-appb-000015 and Figure PCTCN2021074960-appb-000016 are the means of the positive and negative emotional scores of all reviews of the online product link $b_{ij}$, respectively;
the quantities in Figure PCTCN2021074960-appb-000017 and Figure PCTCN2021074960-appb-000018 are the standard deviations of the positive and negative emotional scores of all reviews of the online product link $b_{ij}$, respectively;
$\alpha^+$ and $\alpha^-$ are the positive weight and the negative weight, respectively.
Further preferably, for the online product link $b_{ij}$, the positive weight $\alpha^+$ and the negative weight $\alpha^-$ are obtained through the following steps:
K1) apply min-max normalization separately to the positive and negative emotional scores of the reviews under the online product link $b_{ij}$, mapping the results to the interval [0, 1], including:
converting the positive emotional scores of all reviews under the online product link $b_{ij}$ into positive indicators, where the positive indicator (Figure PCTCN2021074960-appb-000019) of the u-th review of $b_{ij}$ is:
Figure PCTCN2021074960-appb-000020
and converting the negative emotional scores of all reviews under the online product link $b_{ij}$ into negative indicators, where the negative indicator (Figure PCTCN2021074960-appb-000021) of the u-th review of $b_{ij}$ is:
Figure PCTCN2021074960-appb-000022
where $u = 1, 2, \dots, n_c$;
the quantity in Figure PCTCN2021074960-appb-000023 is the positive emotional score of the u-th review of the online product link $b_{ij}$; $\mathrm{Max}(S^+)$ and $\mathrm{Min}(S^+)$ are the maximum and minimum positive emotional scores over all reviews of $b_{ij}$;
the quantity in Figure PCTCN2021074960-appb-000024 is the negative emotional score of the u-th review of the online product link $b_{ij}$; $\mathrm{Max}(S^-)$ and $\mathrm{Min}(S^-)$ are the maximum and minimum negative emotional scores over all reviews of $b_{ij}$;
K2) calculate the proportions of the positive and negative indicators of each review of the online product link $b_{ij}$, where the proportions of the positive and negative indicators of the u-th review of $b_{ij}$ are, respectively:
Figure PCTCN2021074960-appb-000025
Figure PCTCN2021074960-appb-000026
K3) calculate the entropy $e^+$ of the positive indicators and the entropy $e^-$ of the negative indicators over all reviews under the online product link $b_{ij}$:
Figure PCTCN2021074960-appb-000027
Figure PCTCN2021074960-appb-000028
K4) calculate the difference coefficient $g^+$ of the positive indicators and the difference coefficient $g^-$ of the negative indicators over all reviews under the online product link $b_{ij}$:
$g^+ = 1 - e^+$
$g^- = 1 - e^-$
K5) calculate the positive weight $\alpha^+$ and the negative weight $\alpha^-$ of the online product link $b_{ij}$:
Figure PCTCN2021074960-appb-000029
Figure PCTCN2021074960-appb-000030
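Steps K1) to K5) follow the general shape of the entropy weight method, and a short Python sketch may make the flow concrete. Only the relation $g = 1 - e$ is given as text in the source; the remaining formulas are image placeholders, so the sketch fills them with the textbook entropy weight method as an assumption: min-max scaling applied directly to the scores, proportions $p_u = y_u / \sum y$, entropy $e = -\frac{1}{\ln n}\sum p_u \ln p_u$, and weights $\alpha^\pm = g^\pm / (g^+ + g^-)$. The function names and the handling of degenerate inputs are hypothetical.

```python
import math


def min_max(values):
    """K1: map scores to [0, 1]; a constant list maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def entropy(indicators):
    """K2 + K3: normalized Shannon entropy of the indicator proportions."""
    total = sum(indicators)
    n = len(indicators)
    if total == 0 or n < 2:
        return 1.0  # assumption: no spread means no discriminating information
    e = 0.0
    for y in indicators:
        p = y / total  # K2: proportion of this review's indicator
        if p > 0:
            e -= p * math.log(p)
    return e / math.log(n)


def entropy_weights(pos_scores, neg_scores):
    """K4 + K5: difference coefficients g = 1 - e, then normalized weights."""
    g_pos = 1.0 - entropy(min_max(pos_scores))
    g_neg = 1.0 - entropy(min_max(neg_scores))
    if g_pos + g_neg == 0:
        return 0.5, 0.5  # assumption: fall back to equal weights
    return g_pos / (g_pos + g_neg), g_neg / (g_pos + g_neg)


alpha_pos, alpha_neg = entropy_weights([1, 1, 1, 1], [0, -1, -2, -5])
print(alpha_pos, alpha_neg)
```

In this toy input every review has the same positive score, so the positive side carries no discriminating information and all of the weight goes to the negative side.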
Preferably, in step C1), in which the prior probability of each online product link of each brand in the category and the prior probability of each brand in the category are calculated from the product emotional score of each online product link combined with the brand data:
the prior probability of online product link $b_{ij}$ under brand $B_i$ is:
$P(b_{ij}) = \dfrac{\mathrm{Max}(x) - x_{ij}}{\mathrm{Max}(x) - \mathrm{Min}(x)} \times 100\%$
where $x_{ij}$ is the product emotional score of online product link $b_{ij}$, and $\mathrm{Max}(x)$ and $\mathrm{Min}(x)$ are the maximum and minimum product emotional scores over all online product links under brand $B_i$ in the category;
the prior probability of brand $B_i$ is:
Figure PCTCN2021074960-appb-000031
where $w_j$ is the sales proportion of online product link $b_{ij}$ within brand $B_i$ in the category (Figure PCTCN2021074960-appb-000032), and $n_i$ is the number of online product links of brand $B_i$ in the category.
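The link-level prior above is stated in text, so it can be computed directly; the brand-level formula is only an image placeholder. The Python sketch below assumes the brand prior is the sales-weighted sum $\sum_j w_j P(b_{ij})$ with $w_j$ equal to each link's share of the brand's sales, which fits the definition of $w_j$ but is not confirmed by the source. Function names are hypothetical, and probabilities are returned as fractions rather than percentages.

```python
def link_priors(scores):
    """P(b_ij) = (Max(x) - x_ij) / (Max(x) - Min(x)): lower score, higher prior."""
    hi, lo = max(scores), min(scores)
    if hi == lo:
        return [0.0] * len(scores)  # assumption for the degenerate single-value case
    return [(hi - x) / (hi - lo) for x in scores]


def brand_prior(scores, sales):
    """ASSUMED brand prior: sales-weighted sum of the link priors."""
    total_sales = sum(sales)
    weights = [s / total_sales for s in sales]  # w_j, summing to 1
    return sum(w * p for w, p in zip(weights, link_priors(scores)))


# Three links of one brand: product emotional scores and sales volumes.
print(link_priors([2.0, 1.0, 0.0]))
print(brand_prior([2.0, 1.0, 0.0], [50, 30, 20]))
```

The link with the lowest emotional score receives prior 1 and the best-reviewed link receives prior 0, which is the inversion the text describes.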
Preferably, in step C2), in which the sampling probability of each brand in the category is determined:
the sampling probability of brand $B_i$ under product category G is:
Figure PCTCN2021074960-appb-000033
where $P(G \mid B_i)$ is the sales proportion of brand $B_i$ under product category G, and $n_b$ is the number of brands under product category G;
in step C4), in which the number of products to sample for each brand in the category is determined in combination with the total number of products to be sampled for the category:
the number of products to sample for brand $B_i$ under the product category G to be inspected is:
Figure PCTCN2021074960-appb-000034
where M is the total number of products to be sampled for the product category G to be inspected, and the symbol in Figure PCTCN2021074960-appb-000035 indicates that the number calculated inside it is rounded down.
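Steps C2) and C4) can be sketched in Python as follows. The formula images are missing, so the sketch assumes a Bayes-style posterior $P(B_i \mid G) = P(B_i)\,P(G \mid B_i) / \sum_{k=1}^{n_b} P(B_k)\,P(G \mid B_k)$, consistent with the document describing the sampling probability as a posterior driven by both prior and sales, and a sample count of $\lfloor M \cdot P(B_i \mid G) \rfloor$. Both forms and the function names are assumptions for illustration.

```python
import math


def brand_sampling_probs(priors, sales_shares):
    """ASSUMED C2: posterior proportional to prior times sales share."""
    joint = [p * s for p, s in zip(priors, sales_shares)]
    total = sum(joint)
    return [j / total for j in joint]


def brand_sample_counts(probs, m_total):
    """ASSUMED C4: floor of M times the brand's sampling probability."""
    return [math.floor(m_total * p) for p in probs]


# Three brands: prior probabilities and sales shares within the category.
probs = brand_sampling_probs([0.2, 0.5, 0.8], [0.5, 0.3, 0.2])
print(probs)
print(brand_sample_counts(probs, 20))
```

Because each count is rounded down, the counts can sum to less than M; how any remainder is allocated is not stated in the source.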
Preferably, in step C3), in which the sampling probability of each brand in each online store is determined in combination with the online store data of the category and the sampling probability of each brand in the category:
the sampling probability of brand $B_i$ in online store $T_k$ under product category G is:
Figure PCTCN2021074960-appb-000036
where $P(B_i \mid T_k)$ is the sales proportion of online store $T_k$ within brand $B_i$ under product category G, and $n_t$ is the number of online stores that sell brand $B_i$ under product category G;
in step C5), in which the number of products to sample for each brand in each online store is determined from the total number of products to be sampled for the category:
the number of products to sample for brand $B_i$ in online store $T_j$ under the product category G to be inspected is:
Figure PCTCN2021074960-appb-000037
where the symbol in Figure PCTCN2021074960-appb-000038 indicates that the number calculated inside it is rounded down.
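A store-level counterpart of steps C3) and C5) might look like the Python sketch below. Since the formulas are image placeholders, it assumes that a brand's sampling probability is distributed over its stores in proportion to each store's sales share, and that the per-store count is the floor of M times that probability. These forms and all names are hypothetical.

```python
import math


def store_sampling_probs(brand_prob, store_shares):
    """ASSUMED C3: split the brand's sampling probability by store sales share."""
    total = sum(store_shares)
    return [brand_prob * s / total for s in store_shares]


def store_sample_counts(store_probs, m_total):
    """ASSUMED C5: floor of M times the brand-in-store sampling probability."""
    return [math.floor(m_total * p) for p in store_probs]


# One brand with sampling probability 0.4, sold in three online stores.
store_probs = store_sampling_probs(0.4, [0.5, 0.3, 0.2])
print(store_probs)
print(store_sample_counts(store_probs, 27))
```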
Further preferably, after the sampling probability of each brand in the category is determined through step C2), the method further comprises the following steps:
C2P1) in combination with the historical quality inspection data, tighten the sampling probabilities of the selected brands $B_i, B_{i+1}, \dots, B_{i+h}$ under product category G using the preferred number R5 as the ratio, and then normalize the sampling probabilities of all brands under product category G:
Figure PCTCN2021074960-appb-000039
where:
Figure PCTCN2021074960-appb-000040
Here, $n_b$ is the number of brands under product category G;
C2P2) update the sampling probabilities of all brands under product category G as:
$P(B_k \mid G) = P'(B_k \mid G)$
where $k \in [1, n_b]$.
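The tightening and renormalization in C2P1) and C2P2) can be sketched in Python. The exact formulas are image placeholders, so the sketch assumes that each selected brand's probability is multiplied by the ratio and that all probabilities are then divided by the new total. The default ratio is taken as the common ratio of the R5 preferred number series, $10^{1/5} \approx 1.585$; whether the patent uses this exact value or a rounded one is not visible in the text. Names are hypothetical.

```python
R5_RATIO = 10 ** (1 / 5)  # common ratio of the R5 preferred number series (about 1.585)


def tighten(probs, selected, ratio=R5_RATIO):
    """ASSUMED C2P1/C2P2: scale selected brands by the ratio, then renormalize."""
    scaled = [p * ratio if i in selected else p for i, p in enumerate(probs)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Brand at index 2 failed a past quality inspection, so its probability is tightened.
before = [0.5, 0.3, 0.2]
after = tighten(before, selected={2})
print(after)
```

After renormalization the selected brand's probability rises and the others fall proportionally, while the probabilities still sum to 1.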
Preferably, the multi-source review data includes review data from several online sales platforms.
Preferably, step B2) further includes an outlier removal step: after the emotional score of each review under an online product link has been calculated by the sentiment orientation analysis method based on the review analysis dictionary, the box plot method is used to remove outliers from the positive and negative emotional scores of the reviews under that link.
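The box plot method for outlier removal can be sketched in Python with the standard library. The source does not state the fence multiplier or the quartile convention, so the usual 1.5 × IQR fences and the `inclusive` quartile method of `statistics.quantiles` are used here as assumptions; the function name is hypothetical.

```python
import statistics


def remove_outliers(scores, k=1.5):
    """Drop scores outside [Q1 - k*IQR, Q3 + k*IQR] (box plot rule, k assumed 1.5)."""
    if len(scores) < 4:
        return list(scores)  # assumption: too few points to estimate quartiles
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if low <= s <= high]


# Positive emotional scores of the reviews under one link; 50 is an extreme value.
print(remove_outliers([1, 2, 2, 3, 3, 3, 4, 50]))
```

In the method described above, the same filter would be applied separately to the positive and the negative emotional scores of a link's reviews.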
Preferably, after step B3), the method further comprises step B4): calculating the emotional score of each brand in the category from the product emotional score of each online product link obtained in step B3) combined with the brand data.
The emotional score of a brand $B_i$ in the category is:
Figure PCTCN2021074960-appb-000041
where:
$x_{ij}$ is the product emotional score of online product link $b_{ij}$;
$w_j$ is the sales proportion of online product link $b_{ij}$ within brand $B_i$ in the category (Figure PCTCN2021074960-appb-000042), and $n_i$ is the number of online product links of brand $B_i$ in the category.
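As a small illustration of step B4), the Python sketch below assumes the brand emotional score is the sales-weighted average $\sum_{j=1}^{n_i} w_j x_{ij}$ of the link scores. The formula itself is an image placeholder, so this form is an assumption based on the definitions of $x_{ij}$ and $w_j$; the function name is hypothetical.

```python
def brand_emotional_score(link_scores, link_sales):
    """ASSUMED B4: sales-weighted average of the link product emotional scores."""
    total_sales = sum(link_sales)
    return sum(x * s / total_sales for x, s in zip(link_scores, link_sales))


# Two links of one brand: a well-reviewed best seller and a poorly reviewed minor listing.
print(brand_emotional_score([2.0, -1.0], [300, 100]))
```

Weighting by sales means that a high-volume listing dominates the brand score, which matches the document's point that scattered listings are aggregated at brand level.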
Preferably, the preferred number is:
Figure PCTCN2021074960-appb-000043
Beneficial effects: the big data-based inspection sampling method for products sold online provided by the invention converts users' qualitative reviews under online product links into emotional scores that indicate the quality of products and brands. These scores are then converted into the prior probability of each online product link in a product category and the prior probability of each brand in the category, from which the sampling probability and the number of products to sample are determined for each brand in the category, and for each brand in each online store. Compared with the prior art, the method has the following advantages:
1. Based on big data, user reviews of products sold online are converted into emotional scores indicating product and brand quality, and these serve as prior probabilities that provide the computational basis for subsequent inspection sampling. A reasonably suitable sampling probability and sample size can therefore be obtained even when the total number of products is uncertain, sales channels are not fixed, brands are numerous, and product classification is complex, and in particular when the total number of products is uncertain. This makes quality monitoring and spot-check work better targeted and substantially improves the efficiency of sampling inspection with limited resources.
2. When the invention is used for the sampling work that precedes inspection of products sold online, the emotional scores obtained from user reviews are first converted into probabilities that reflect the quality of the product or brand behind each online product link. The worse the user reviews under a link and the more dissatisfaction with quality they express, the lower the link's product emotional score, the higher its prior probability, and the larger the corresponding number of products to sample, so that inspection effort is concentrated there. Conversely, the better the user reviews under a link, the higher its product emotional score, the lower its prior probability, and the smaller the corresponding number of products to sample, so that inspection effort is reduced accordingly. The sampling data thus vary with user reviews and carry different emphases, which is more scientifically sound.
3. Converting users' qualitative reviews under online product links into emotional scores indicating product and brand quality amplifies the effect of negative reviews (that is, negative emotional scores) and highlights the more problematic products and brands.
4. Statistics on emotional scores are additionally provided at brand level, which avoids the interference caused by the complicated and scattered names of online product links.
5. Further, the method can substantially weaken the influence and interference of uninformative reviews (such as duplicate or formulaic reviews) produced by practices such as fake orders. Compared with the existing positive review rate or rating, it provides a more meaningful emotional score as a reference for subsequent inspection sampling.
6. Further, in combination with historical quality inspection data, a method is introduced that tightens the sampling probabilities of certain selected brands in a product category using a preferred number as the ratio, so that inspection sampling combines real-time data with historical data and is more reasonable and better targeted.
7. In real application scenarios user reviews keep accumulating. The method requires no prior model training and can adapt quickly, in real time, to changes in the number of reviews. It supports real-time collection with real-time calculation, rolling collection with cumulative calculation, and other modes, giving it strong real-time performance and high flexibility.
Description of the drawings
Figure 1 shows the hierarchical model for calculating sentiment scores for a category of similar online-sold products;
Figure 2 is a schematic diagram of the sentiment word dictionary in the comment analysis dictionary;
Figure 3 is a schematic diagram of the negation word dictionary in the comment analysis dictionary;
Figure 4 is a schematic diagram of the degree word dictionary in the comment analysis dictionary;
Figure 5 is a schematic diagram of the stop word dictionary in the comment analysis dictionary;
Figure 6 is a flowchart of the sentiment score calculation for each comment under an online product link;
Figure 7 is a schematic diagram of outlier removal using a box plot;
Figure 8 shows the stratified sampling model for a category of similar online-sold products;
Figure 9 compares the sentiment score and the prior probability of each brand within the same product category (air conditioners) in the example given in the embodiment;
Figure 10 compares the sentiment score, prior probability, and inclusion probability of each brand within the same product category (air conditioners) in the example given in the embodiment;
Figure 11 compares the inclusion probability of each brand within the same product category (air conditioners) before and after tightening and normalization in the example given in the embodiment;
Figure 12 compares the unstandardized raw positive sentiment score (ScorePositive) and raw negative sentiment score (ScoreNegative) of several online product links b_ij in the example of the embodiment;
Figure 13 compares the unstandardized raw positive sentiment score (ScorePositive) and the standardized positive sentiment standard score (z_ScorePositive) of several online product links b_ij in the example of the embodiment;
Figure 14 compares the unstandardized raw negative sentiment score (ScoreNegative) and the standardized negative sentiment standard score (z_ScoreNegative) of several online product links b_ij in the example of the embodiment;
Figure 15 compares the positive rating (Rate) of several online product links b_ij with the product sentiment score (z_Score) of each link after standardization and entropy-based weighting in the example of the embodiment.
Detailed description
The present invention is described in further detail below with reference to the embodiments and the drawings; the following embodiments do not limit the present invention.
The big-data-based sampling method for inspection of online-sold products provided in this embodiment comprises a sentiment score calculation step and a sampling data calculation step.
The comment analysis dictionary referred to herein comprises a sentiment word dictionary, a negation word dictionary, a degree word dictionary, and/or a stop word dictionary. It can be assembled directly from the sentiment word, negation word, degree word, and/or stop word dictionaries available in the prior art.
Of course, in some embodiments the comment analysis dictionary can also be constructed and/or updated from users' multi-source comment data on products on online platforms. That is, the method provided in this embodiment further comprises an initialization step, which includes: A0) constructing and/or updating the comment analysis dictionary for products based on users' multi-source comment data on products on online platforms.
The comment analysis dictionary may be updated either on the basis of the dictionary assembled from the prior-art dictionaries described above, or on the basis of a dictionary constructed from users' multi-source comment data on online platforms.
As shown in Figure 2, the sentiment word dictionary contains a number of sentiment words and the sentiment word score of each. As shown in Figure 3, the negation word dictionary contains a number of negation words. A negation word reverses the sentiment of the sentence, and the effect of multiple negation words is usually cumulative.
As shown in Figure 4, the degree word dictionary contains a number of degree words and the degree word score of each. The degree word score is a numerical value indicating the strength of the degree adverb. The dictionary has two columns: the first holds the degree word (also called the degree adverb) and the second holds the degree word score (also called the degree value). A value greater than 1 strengthens the sentiment, and a value less than 1 weakens it.
As shown in Figure 5, the stop word dictionary contains a number of stop words.
The multi-source comment data comprise comment data from several online sales platforms, such as Taobao, Tmall, JD, and Suning.
The sentiment score calculation step for a given category of similar products comprises the following (Figure 1 shows the hierarchical model for calculating sentiment scores for a category of similar online-sold products):
B1) Collect the online product links on online platforms that belong to the product category, together with the corresponding data, including product data, brand data, comment data, and sales volume data.
B2) Using the sentiment-orientation analysis method based on the comment analysis dictionary, perform sentiment analysis on each comment of each online product link collected in step B1), and calculate the sentiment score of each comment under each online product link:
Sentiment score calculation as described herein, also known as sentiment analysis, affective computing, sentiment-orientation analysis, or opinion mining, is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries emotional coloring. Because product reviews have relatively simple sentence structure and strong emotional coloring, the dictionary-based sentiment-orientation analysis method can compute the sentiment orientation of a review effectively.
As shown in Figure 6, calculating the sentiment score of a comment under an online product link b_ij with the dictionary-based sentiment-orientation analysis method comprises the following steps:
B21) Clause segmentation: using punctuation marks, split the comment text c corresponding to the comment into several clauses [Figure PCTCN2021074960-appb-000044]
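The clause segmentation of step B21) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the delimiter set and the function name `split_clauses` are assumptions, since the source only says that the text is split at punctuation marks.

```python
import re

# Assumed delimiter set: common Chinese and Western clause-ending punctuation.
# The source does not list the punctuation marks it uses.
CLAUSE_DELIMITERS = r"[,。!?;、,.!?;]+"


def split_clauses(comment: str) -> list[str]:
    """Split a comment into non-empty clauses at punctuation marks."""
    parts = re.split(CLAUSE_DELIMITERS, comment)
    return [part.strip() for part in parts if part.strip()]


print(split_clauses("质量很好,物流太慢!不推荐。"))
```

Splitting on the ASCII full stop would also break decimal numbers such as "4.8", so a production version would need a more careful rule.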
B22) Modification relationship analysis: using the comment analysis dictionary, identify in each clause the sentiment words (a_1, a_2, …), degree words (d_1, d_2, …), negation words (h_1, h_2, …), and stop words, and record their positions. Use the stop words to determine the target sentiment word modified by each degree word and negation word; then, using the corresponding degree word scores and sentiment word scores in the comment analysis dictionary and the number of negation words, determine the modification relationships between the degree words, negation words, and target sentiment words in the clause;
B23) Clause sentiment score calculation: determine the sentiment score of each clause from the modification relationships obtained, where the sentiment score of clause c_i is:
[Figure PCTCN2021074960-appb-000045]
where |H| is the number of occurrences of negation words, D is the degree word score, [Figure PCTCN2021074960-appb-000046] is the sentiment word score of sentiment word w_k, and n_w is the number of occurrences of sentiment words in clause c_i. The sentiment score of a clause c_i whose s_i is positive is denoted as the clause positive sentiment score [Figure PCTCN2021074960-appb-000047], and the sentiment score of a clause c_i whose s_i is negative is denoted as the clause negative sentiment score [Figure PCTCN2021074960-appb-000048];
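The clause score formula itself survives only as an image, so the sketch below rests on an assumed form built from the quantities the text defines: s_i = (-1)^|H| · D · Σ s_{w_k}, with D taken as the product of the degree word scores in the clause (1 if there are none). It also simplifies step B22) by applying every negation and degree word to the whole clause instead of to a specific target sentiment word. The dictionary entries in the usage lines are hypothetical.

```python
def clause_score(tokens, sentiment_scores, degree_scores, negation_words):
    """Score one tokenized clause under the assumed formula
    s_i = (-1)**|H| * D * sum(sentiment word scores).
    """
    # |H|: number of negation words in the clause.
    negation_count = sum(1 for token in tokens if token in negation_words)
    # D: product of degree word scores (assumption; 1.0 when none occur).
    degree = 1.0
    for token in tokens:
        degree *= degree_scores.get(token, 1.0)
    # Sum of sentiment word scores over the n_w sentiment words found.
    sentiment_total = sum(sentiment_scores.get(token, 0.0) for token in tokens)
    return (-1) ** negation_count * degree * sentiment_total


# Hypothetical dictionary entries, for illustration only.
sentiment = {"好": 1.0, "满意": 2.0}
degree = {"很": 1.5, "非常": 2.0}
negation = {"不"}

print(clause_score(["不", "是", "很", "好"], sentiment, degree, negation))
print(clause_score(["非常", "满意"], sentiment, degree, negation))
```

An odd number of negation words flips the sign and an even number restores it, which matches the statement that the effect of negation words is cumulative.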
B24) Comment sentiment score calculation (also called comment clause tendency aggregation): for the comment text c corresponding to the comment, sum the clause positive sentiment scores over all its clauses to obtain the comment's positive sentiment score s^+, and sum the clause negative sentiment scores over all its clauses to obtain the comment's negative sentiment score s^-:
[Figure PCTCN2021074960-appb-000049]
[Figure PCTCN2021074960-appb-000050]
where m_c is the number of clauses in comment c.
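The aggregation in step B24) can be sketched as below. The function name `comment_scores` is an assumption; the logic follows the description directly: positive clause scores are summed into s^+ and negative clause scores into s^-.

```python
def comment_scores(clause_scores):
    """Return (s_plus, s_minus) for one comment from its clause scores."""
    s_plus = sum(score for score in clause_scores if score > 0)
    s_minus = sum(score for score in clause_scores if score < 0)
    return s_plus, s_minus


# A comment with two positive clauses, one negative clause, one neutral clause.
print(comment_scores([1.5, -2.0, 0.5, 0.0]))
```

Keeping the two sums separate, instead of netting them, is what later lets the method weight negative sentiment independently of positive sentiment.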
In some preferred embodiments, step B24) is followed by an outlier removal step: after the sentiment score of each comment under an online product link has been calculated with the dictionary-based sentiment-orientation analysis method, the box plot method is used to remove the outliers from the positive and negative sentiment scores of the comments under that link. Specifically, in this embodiment, as shown in Figure 7, the following operations are performed on online product link b_ij:
1) Remove the outliers from the positive sentiment scores of the comments under online product link b_ij
Step 11. Sort the positive sentiment scores of all comments under online product link b_ij in descending order to form the set [Figure PCTCN2021074960-appb-000051], where n_c is the total number of comments under online product link b_ij and [Figure PCTCN2021074960-appb-000052]
Step 12. Calculate the median [Figure PCTCN2021074960-appb-000053] of S^+:
[Figure PCTCN2021074960-appb-000054]
Step 13. Calculate the upper quartile [Figure PCTCN2021074960-appb-000055], that is, the median of the set [Figure PCTCN2021074960-appb-000056], where k = m/2 when m is even and k = (m+1)/2 when m is odd;
Step 14. Calculate the lower quartile [Figure PCTCN2021074960-appb-000057], that is, the median of the set [Figure PCTCN2021074960-appb-000058];
Step 15. Calculate the interquartile range [Figure PCTCN2021074960-appb-000059]
Step 16. Calculate the upper edge value [Figure PCTCN2021074960-appb-000060] [Figure PCTCN2021074960-appb-000061]
Step 17. Calculate the lower edge value [Figure PCTCN2021074960-appb-000062] [Figure PCTCN2021074960-appb-000063]
Step 18. Identify the outliers in the positive sentiment scores [Figure PCTCN2021074960-appb-000064] [Figure PCTCN2021074960-appb-000065] and remove them.
2) Remove the outliers from the negative sentiment scores of the comments under online product link b_ij
Step 21. Sort the negative sentiment scores of all comments under online product link b_ij in descending order to form the set [Figure PCTCN2021074960-appb-000066], where [Figure PCTCN2021074960-appb-000067]
Step 22. Calculate the median [Figure PCTCN2021074960-appb-000068] of S^-:
[Figure PCTCN2021074960-appb-000069]
Step 23. Calculate the upper quartile [Figure PCTCN2021074960-appb-000070], that is, the median of the set [Figure PCTCN2021074960-appb-000071], where k = m/2 when m is even and k = (m+1)/2 when m is odd;
Step 24. Calculate the lower quartile [Figure PCTCN2021074960-appb-000072], that is, the median of the set [Figure PCTCN2021074960-appb-000073];
Step 25. Calculate the interquartile range [Figure PCTCN2021074960-appb-000074]
Step 26. Calculate the upper edge value [Figure PCTCN2021074960-appb-000075] [Figure PCTCN2021074960-appb-000076]
Step 27. Calculate the lower edge value [Figure PCTCN2021074960-appb-000077] [Figure PCTCN2021074960-appb-000078]
Step 28. Identify the outliers in the negative sentiment scores [Figure PCTCN2021074960-appb-000079] [Figure PCTCN2021074960-appb-000080] and remove them.
Of course, the outlier removal described above can also be carried out with other outlier removal methods from the prior art or from conventional techniques in the field.
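The box plot procedure can be sketched as follows, applied once to the positive scores and once to the negative scores of a link. Several details are assumptions because the edge-value formulas are only available as images: the whisker factor 1.5 is the conventional box plot choice, the lower half is taken symmetrically to the upper half, and m is taken to be the number of scores. The quartiles follow the described "median of the top k sorted values" rule with k = m/2 for even m and k = (m+1)/2 for odd m.

```python
from statistics import median


def remove_outliers(scores, whisker=1.5):
    """Drop scores outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    if len(scores) < 4:
        return list(scores)  # too few points for meaningful quartiles
    ordered = sorted(scores, reverse=True)  # descending, as in Step 11
    k = (len(ordered) + 1) // 2  # m/2 if even, (m+1)/2 if odd
    upper_quartile = median(ordered[:k])   # median of the k largest scores
    lower_quartile = median(ordered[-k:])  # median of the k smallest (assumed)
    iqr = upper_quartile - lower_quartile
    upper_edge = upper_quartile + whisker * iqr  # assumed conventional fence
    lower_edge = lower_quartile - whisker * iqr
    return [s for s in scores if lower_edge <= s <= upper_edge]


print(remove_outliers([1, 2, 3, 4, 5, 100]))
```

The original order of the scores is preserved in the result, so the filtered list can still be aligned with the comments it came from if the indices are tracked separately.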
B3) Based on the sentiment score of each comment under each online product link, calculate the product sentiment score of each online product link:
In this preferred embodiment, step B3) is: based on the sentiment score of each comment under each online product link, calculate the product sentiment score of each link by combining standardization with an entropy-based weighting method. Put differently, in some preferred embodiments step B3) is: based on the sentiment score of each comment under each online product link, calculate the positive and negative sentiment standard scores of each link with the z-score standardization method, and then calculate the product sentiment score of each link with the entropy-based weighting method. Specifically:
The product sentiment score of an online product link b_ij is:
[Figure PCTCN2021074960-appb-000081]
where [Figure PCTCN2021074960-appb-000082] and [Figure PCTCN2021074960-appb-000083] are the positive and negative sentiment standard scores of online product link b_ij, which in this embodiment are calculated with the z-score standardization method:
[Figure PCTCN2021074960-appb-000084]
[Figure PCTCN2021074960-appb-000085]
where:
n_c is the total number of comments under online product link b_ij;
[Figure PCTCN2021074960-appb-000086] and [Figure PCTCN2021074960-appb-000087] are the positive and negative sentiment scores of the k-th comment c_k of online product link b_ij;
[Figure PCTCN2021074960-appb-000088] and [Figure PCTCN2021074960-appb-000089] are the means of the positive and negative sentiment scores of all comments of online product link b_ij;
[Figure PCTCN2021074960-appb-000090] and [Figure PCTCN2021074960-appb-000091] are the standard deviations of the positive and negative sentiment scores of all comments of online product link b_ij;
α^+ and α^- are the positive weight and the negative weight, respectively. In this embodiment, α^+ and α^- are calculated with the entropy method (that is, the weights are determined by the entropy weight method).
For online product link b_ij, the positive weight α^+ and the negative weight α^- are obtained through the following steps:
K1) Apply min-max normalization separately to the positive and the negative sentiment scores of the comments under online product link b_ij, mapping the results to the interval [0, 1]:
Convert the positive sentiment scores of all comments under online product link b_ij into positive indicators, where the positive indicator [Figure PCTCN2021074960-appb-000092] of the u-th comment of online product link b_ij is:
[Figure PCTCN2021074960-appb-000093]
and convert the negative sentiment scores of all comments under online product link b_ij into negative indicators, where the negative indicator [Figure PCTCN2021074960-appb-000094] of the u-th comment of online product link b_ij is:
[Figure PCTCN2021074960-appb-000095]
where u = 1, 2, …, n_c;
[Figure PCTCN2021074960-appb-000096] is the positive sentiment score of the u-th comment of online product link b_ij; Max(S^+) and Min(S^+) are the maximum and minimum positive sentiment scores among all comments of online product link b_ij;
[Figure PCTCN2021074960-appb-000097] is the negative sentiment score of the u-th comment of online product link b_ij; Max(S^-) and Min(S^-) are the maximum and minimum negative sentiment scores among all comments of online product link b_ij;
K2) Calculate the proportions of the positive and negative indicators of each comment of online product link b_ij, where the proportions of the positive and negative indicators of the u-th comment of online product link b_ij are, respectively:
[Figure PCTCN2021074960-appb-000098]
[Figure PCTCN2021074960-appb-000099]
K3) Calculate the entropy e^+ of the positive indicators and the entropy e^- of the negative indicators of all comments under online product link b_ij:
[Figure PCTCN2021074960-appb-000100]
[Figure PCTCN2021074960-appb-000101]
K4) Calculate the difference coefficient g^+ of the positive indicators and the difference coefficient g^- of the negative indicators of all comments under online product link b_ij:
g^+ = 1 - e^+
g^- = 1 - e^-
K5) Calculate the positive weight α^+ and the negative weight α^- of online product link b_ij:
[Figure PCTCN2021074960-appb-000102]
[Figure PCTCN2021074960-appb-000103]
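Steps K1) to K5) can be sketched with the textbook entropy weight method. Only g = 1 - e is stated in text; the other formulas are images, so the following are assumptions: the negative indicator uses the reversed min-max form (max - s)/(max - min), the proportion is p_u = y_u / Σ y_u, the entropy is e = -(1/ln n_c) Σ p_u ln p_u with 0·ln 0 treated as 0, and each weight is its difference coefficient divided by the sum of both. All function names are hypothetical.

```python
import math


def min_max(values, reverse=False):
    """K1: map values to [0, 1]; reverse=True gives (max - v)/(max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: indicator carries no signal
    if reverse:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def entropy(indicators):
    """K2 + K3: normalized Shannon entropy of the indicator proportions."""
    n = len(indicators)
    total = sum(indicators)
    if n < 2 or total == 0:
        return 1.0  # treated as maximally uninformative (assumption)
    proportions = [y / total for y in indicators]
    return -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n)


def entropy_weights(positive_scores, negative_scores):
    """K4 + K5: difference coefficients g = 1 - e, normalized to weights."""
    g_pos = 1 - entropy(min_max(positive_scores))
    g_neg = 1 - entropy(min_max(negative_scores, reverse=True))
    if g_pos + g_neg == 0:
        return 0.5, 0.5  # fall back to equal weights (assumption)
    return g_pos / (g_pos + g_neg), g_neg / (g_pos + g_neg)


alpha_pos, alpha_neg = entropy_weights([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, -4.0])
print(alpha_pos, alpha_neg)
```

In the usage line, a single strongly negative comment among otherwise neutral ones concentrates the negative indicator, lowers its entropy, and so gives the negative side the larger weight, which is the amplification of negative reviews that the method aims for.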
In current review data for online-sold products, negative reviews mostly make up a small share, while positive or default-positive reviews make up a large share, and positive reviews generated by fake orders are also common. As a result, on the one hand, negative reviews, which often serve as a warning, are mostly buried among the numerous positive reviews. On the other hand, the positive-review rates that different sales links report from their shares of positive and negative reviews differ very little: it is hard to perceive differences in products or services from rates (or ratings) that are close or identical, such as 98% and 99%, which are separated by only 1 to 2 points. Likewise, in score-based review systems it is hard to distinguish products or services from close or identical scores such as 4.8 and 4.9.
Here, introducing entropy and standard scores into the calculation greatly reduces the data distortion and interference caused by useless reviews (duplicate reviews, boilerplate reviews, deliberate praise, malicious negative reviews, and so on) arising from practices such as fake orders. Positive and negative reviews are considered together, and their weights are adjusted dynamically according to the negative reviews. Compared with existing positive-review rates, positive ratings, and scores (such as numerical, star, or composite ratings), the method yields a sentiment-tendency score that is more discriminating and more informative and that better matches people's intuitive experience. It can serve as a reference for customers choosing products and, in the application of this embodiment, as the basic data for inspection and sampling work, providing a reference for quality-inspection sampling.
This is demonstrated with a simulation experiment: several online product links b_ij were selected at random and the method steps above were simulated on them. In this example the raw online sales data come from Tmall.
The horizontal axes of Figures 12, 13, 14, and 15 all show the selected online product links b_ij. In the figures, the raw positive sentiment score ScorePositive of each link is the mean of the positive sentiment scores of all comments under that link, and the raw negative sentiment score ScoreNegative of each link is the mean of the negative sentiment scores of all comments under that link.
Taking online product link b_ij as an example: the standardized positive sentiment standard score z_ScorePositive of each link in the figures corresponds to the positive sentiment standard score [Figure PCTCN2021074960-appb-000104] of online product link b_ij in the text; the standardized negative sentiment standard score z_ScoreNegative of each link in the figures corresponds to the negative sentiment standard score [Figure PCTCN2021074960-appb-000105] of online product link b_ij in the text; and the product sentiment score z_Score of each link after standardization and entropy-based weighting corresponds to the product sentiment score x_ij of online product link b_ij in the text.
图12为所选取的这些网销商品链接b ij未进行标准化处理的正向情感原始得分ScorePositive和负向情感原始得分ScoreNegative数据对比图,由图12可见:未进行标准化处理前,直接由用户评论计算出的正向情感原始得分ScorePositive和负向情感原始得分scorenegative差异较大,难以直接汇总。 Figure 12 is a comparison of the raw scores of positive emotions ScorePositive and ScoreNegative of these selected online merchandise links b ij that have not been standardized The calculated original score of positive emotion ScorePositive and the original score of negative emotion scorenegative are quite different, and it is difficult to aggregate directly.
图13为所选取的这些网销商品链接b ij未进行标准化处理的正向情感原始得分ScorePositive和进行标准化处理后的正向情感标准分z_ScorePositive的数据对比图。由图13可见:进行标准化处理之后的正向情感标准分z_ScorePositive,和未进行标准化处理的正向情感原始得分ScorePositive的趋势是一致的,保持差异和趋势性的同时,缩小了分值之间的跨度,同时也将正向得分与负向得分控制在相近数量级范围内,降低了正向情感得分的影响程度,便于正向和负向情感得分进行汇总,以便于在不同网销商品链接之间进行比较。 FIG. 13 is a data comparison diagram of the raw positive emotion scores ScorePositive of these selected online marketing product links b ij without normalization processing and the positive emotion standard score z_ScorePositive after normalization processing. It can be seen from Figure 13 that the standard score of positive emotion z_ScorePositive after normalization is consistent with the original score of positive emotion ScorePositive without normalization. While maintaining the difference and trend, the difference between the scores is reduced. At the same time, the positive score and the negative score are controlled within a similar range of magnitude, which reduces the degree of influence of the positive emotional score, and facilitates the aggregation of the positive and negative emotional scores, so as to facilitate the link between different online marketing products Compare.
图14为所选取的这些网销商品链接b ij未进行标准化处理的负向情感原始得分ScoreNegative和进行标准化处理后的负向情感标准分z_ScoreNegative的数据对比图。由图14可见:进行标准化处理之后的负向情感标准分z_ScoreNegative,和未进行标准化处理的负向情感原始得分ScoreNegative的趋势是一致的,但放大了不同网销商品链接得分的差异性,使负向效果更加突出,与正向情感标准分z_ScorePositive具有相同数量级,便于与正向情感得分进行汇总,以便于在不同网销商品链接之间进行比较。 FIG. 14 is a data comparison diagram of the negative emotion original scores ScoreNegative without standardization processing for these selected online marketing product links b ij and the negative emotion standard score z_ScoreNegative after standardization processing. It can be seen from Figure 14 that the negative sentiment standard score z_ScoreNegative after the standardization process has the same trend as the original negative sentiment score ScoreNegative that has not been standardized, but it magnifies the difference in the link scores of different online marketing products, making it negative The positive effect is more prominent, with the same order of magnitude as the positive emotion standard score z_ScorePositive, which is convenient for summarizing with the positive emotion score to facilitate comparison between different online sales product links.
图15为所选取的这些网销商品链接b ij的好评率Rate与经过标准化和基于熵的加权处理后(也可称为经过标准化和基于熵的方法加权后)的各网销商品链接b ij的商品情感得分z_Score的数据对比图。由图15可以明显看出:不同网销商品链接的标准分也即商品情感得分z_Score差异比较明显,比好评率Rate更有区分性,尤其是很多网销商品链接的好评率数值相同(如图中一些峰部平坦位置),但经过标准化和基于熵的加权处理后的商品情感得分的数值却明显不同,更好地反应了不同网销商品链接下的产品质量的差异。因此处仿真的原始数据来源于天猫,故此处好评率Rate为打分(如满分为5分的综合评分)所获得的分值。 FIG 15 is a positive feedback of the selected network link b ij of goods sold and Rate-based post-standardized and weighted entropy (and may also be referred to as a standardized method based on the weighted entropy) of each network link pin product b ij Data comparison chart of z_Score of product sentiment score. It can be clearly seen from Figure 15 that the standard scores of different online sales product links, that is, the product sentiment score z_Score, are significantly different, which is more distinguishing than the praise rate Rate, especially the value of the praise rate of many online sales product links is the same (as shown in the figure) Some of the peaks are flat), but the value of the product sentiment score after normalization and entropy-based weighting is significantly different, which better reflects the difference in product quality under different online product links. Therefore, the original data of the simulation comes from Tmall, so the favorable rate here is the score obtained by scoring (for example, a comprehensive score with a full score of 5).
In addition, websites and online sales platforms may use different scoring systems for the favorable rating: some use a 5-point scale (e.g. Tmall), others a percentage with a maximum of 100% (e.g. JD). Favorable ratings are therefore hard to compare directly across platforms. The present method computes the product sentiment score z_Score of each online product link from user reviews, so the z_Score values obtained on different websites have the same meaning and order of magnitude and can be compared directly across online sales platforms.
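The entropy-based weighting referred to above can be illustrated with the standard entropy weight method. The following is only a minimal Python sketch under stated assumptions: the function names are hypothetical, negative scores are passed as magnitudes, and the formulas follow the textbook method (min-max normalization, proportions, entropy, difference coefficient, weight), because the formula images of the application are not reproduced in this text.

```python
import math


def min_max(values):
    """Map values onto [0, 1]; a constant series maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def entropy(indicators):
    """Normalized Shannon entropy of the indicator proportions, in [0, 1]."""
    total = sum(indicators)
    n = len(indicators)
    if total == 0 or n < 2:
        return 1.0  # assumption: a series without spread carries no information
    e = 0.0
    for y in indicators:
        p = y / total
        if p > 0:  # 0 * ln(0) is taken as 0
            e -= p * math.log(p)
    return e / math.log(n)


def entropy_weights(pos_scores, neg_scores):
    """Return (positive weight, negative weight) for one product link."""
    g_pos = 1.0 - entropy(min_max(pos_scores))  # difference coefficient g+
    g_neg = 1.0 - entropy(min_max(neg_scores))  # difference coefficient g-
    if g_pos + g_neg == 0:
        return 0.5, 0.5  # assumption: fall back to equal weights
    return g_pos / (g_pos + g_neg), g_neg / (g_pos + g_neg)
```

Under these assumptions, the direction whose review scores are spread more unevenly receives the larger weight, and the two weights always sum to 1.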
B4) Combining the product sentiment scores of the online product links obtained in step B3) with the brand data, calculate the sentiment score of each brand under the product category.
The sentiment score of a brand B_i under the category is:
Figure PCTCN2021074960-appb-000106
where x_ij is the product sentiment score of online product link b_ij;
w_j is the sales share of link b_ij within brand B_i under the category (that is, the sales of link b_ij as a proportion of the sales of all online product links of brand B_i under the category),
Figure PCTCN2021074960-appb-000107
n_i is the number of online product links of brand B_i under the category.
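A minimal Python sketch of step B4) follows. It assumes that the brand score is the sales-weighted sum of the link scores x_ij, with weights w_j that sum to 1. This is consistent with the definitions above but remains an assumption, because the formula images are not reproduced in this text. The function and parameter names are hypothetical.

```python
def brand_sentiment_score(link_scores, link_sales):
    """Sales-weighted sentiment score of one brand.

    link_scores: product sentiment score x_ij of each link of the brand
    link_sales:  sales volume of each link, in the same order
    """
    total = sum(link_sales)
    if total <= 0:
        raise ValueError("brand has no recorded sales")
    weights = [s / total for s in link_sales]  # w_j, summing to 1
    return sum(w * x for w, x in zip(weights, link_scores))
```

For example, `brand_sentiment_score([1.0, -1.0], [3, 1])` weights the first link at 0.75 and the second at 0.25.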
The sentiment scores obtained from sentiment-orientation analysis of user reviews are converted into a probability that indicates the quality of the product or brand behind an online product link. Because this probability is based on historical review data from users who bought, chose, or used the product, it is called the prior probability. The prior probability is then used to calculate, for subsequent quality monitoring (such as spot checks of online products), the probability that a product is sampled for inspection (the inclusion probability) and the number of items to sample. The inclusion probability as used herein may also be called the posterior probability or the sampling probability, and the number of items to sample may also be called the sample size.
The sampling-data calculation steps for a given product category include:
C1) From the product sentiment scores of the online product links under the category, combined with the brand data, calculate the prior probability of each online product link of each brand under the category, and the prior probability of each brand under the category.
The prior probability of online product link b_ij under brand B_i is:
P(b_ij) = (Max(x) - x_ij) / (Max(x) - Min(x)) × 100%
where x_ij is the product sentiment score of link b_ij, and Max(x) and Min(x) are the maximum and minimum product sentiment scores over all online product links of brand B_i in the category.
The worse the user reviews of the product behind a link, the lower the link's product sentiment score and the higher its prior probability. Conversely, the better the user reviews, the higher the product sentiment score and the lower the prior probability.
The prior probability of brand B_i is:
Figure PCTCN2021074960-appb-000108
where w_j is the sales share of link b_ij within brand B_i under the category (that is, the sales of link b_ij as a proportion of the sales of all online product links of brand B_i under the category),
Figure PCTCN2021074960-appb-000109
n_i is the number of online product links of brand B_i under the category.
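Step C1) can be sketched in Python as follows. The link-level function implements the stated formula P(b_ij) = (Max(x) - x_ij) / (Max(x) - Min(x)), returned as a fraction rather than a percentage. The brand-level function assumes a sales-weighted sum of the link priors, which matches the definitions of w_j and n_i but is an assumption, because the formula image is not reproduced in this text. All names are hypothetical.

```python
def link_priors(scores):
    """Prior probability of each link of one brand: lowest score -> 1, highest -> 0."""
    hi, lo = max(scores), min(scores)
    if hi == lo:
        # Assumption: with no spread in the scores, no link is singled out.
        return [0.0] * len(scores)
    return [(hi - x) / (hi - lo) for x in scores]


def brand_prior(scores, sales):
    """Assumed brand prior: link priors weighted by each link's share of brand sales."""
    total = sum(sales)
    if total <= 0:
        raise ValueError("brand has no recorded sales")
    return sum(p * s / total for p, s in zip(link_priors(scores), sales))
```

Because the brand priors are computed independently for each brand, they need not sum to 1 across brands, which is why a further normalization is needed before sampling.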
C2) Combining the brand data under the category, determine the inclusion probability of each brand under the category.
As shown in FIG. 8, the scheme uses stratified sampling with two layers. The first layer determines which brands are drawn under the category and the inclusion probability of each brand. The second layer determines the online stores that sell each brand and the inclusion probability of each brand in each of those stores.
Because the prior probabilities of the brands under category G calculated in the preceding steps do not necessarily sum to 1, the probability assigned to each brand and each online store under category G at sampling time, i.e. the inclusion probability, must be calculated further.
The inclusion probability of brand B_i under category G is:
Figure PCTCN2021074960-appb-000110
where P(G|B_i) is the sales share of brand B_i under category G, and n_b is the number of brands under category G. The scope of category G can be broad or narrow according to the application: G may be air conditioners in general, or more specifically floor-standing or wall-mounted air conditioners. However, products of different categories, and the products behind online product links of different categories, must not be mixed in the calculation.
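One plausible reading of step C2), offered here only as an assumption, is a Bayes-style normalization: each brand's prior probability is multiplied by its sales share P(G|B_i), and the products are rescaled so that they sum to 1 over the n_b brands. The formula image is not reproduced in this text, so the sketch below illustrates that reading rather than the application's exact formula. The names are hypothetical.

```python
def brand_inclusion_probabilities(priors, sales_shares):
    """Assumed form: P(B_i|G) = P(B_i) * P(G|B_i) / sum_k P(B_k) * P(G|B_k).

    priors:       prior probability of each brand
    sales_shares: sales share of each brand within the category, same order
    """
    joint = [p * s for p, s in zip(priors, sales_shares)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all brands have zero prior or zero sales share")
    return [j / total for j in joint]
```

Under this reading, a brand with a low prior can still receive a high inclusion probability when its sales share is large.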
Quality spot checks are also ongoing work: the relevant national authorities carry out regular and ad hoc supervisory spot checks of product quality and publish the results. This historical inspection data is an important reference for later quality supervision or sampling inspection. Accordingly, in some preferred embodiments, the invention further uses historical inspection data and provides a tightening method that raises the inclusion probability of products and/or brands found nonconforming in that data.
The tightening strategy is as follows. If, according to historical inspection data, brands B_i, B_{i+1}, …, B_{i+k} under category G were found nonconforming in past inspections (for example, in the previous year's supervisory spot check), their inclusion probabilities P(B_i|G), P(B_{i+1}|G), …, P(B_{i+k}|G) are tightened using the preferred number R5 as the ratio. After tightening, the inclusion probabilities of the nonconforming brands selected from the historical data rise, i.e. they are more likely to be drawn, while those of the other brands fall correspondingly.
Accordingly, in some preferred embodiments, after the inclusion probability of each brand under the category has been determined in step C2), the method further includes a tightening and normalization step, comprising:
C2P1) Using historical inspection data, tighten the inclusion probabilities of the selected brands B_i, B_{i+1}, …, B_{i+h} under category G (the nonconforming brands selected from the historical data) with the preferred number R5 as the ratio, then normalize the inclusion probabilities of all brands under category G:
Figure PCTCN2021074960-appb-000111
where
Figure PCTCN2021074960-appb-000112
Here, n_b is the number of brands under category G;
C2P2) Update the inclusion probabilities of all brands under category G to:
P(B_k|G) = P′(B_k|G)
where k ∈ [1, n_b].
In this embodiment, the preferred number
Figure PCTCN2021074960-appb-000113
is used as the ratio to raise the inclusion probability of nonconforming brands in the historical inspection data. In practice the preferred number can be adjusted to the needs of the application.
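The tightening and normalization of steps C2P1) and C2P2) can be sketched as follows. The sketch assumes that the R5 ratio is the step of the R5 preferred number (Renard) series, 10^(1/5) ≈ 1.585, and that tightening means multiplying the selected brands' probabilities by this ratio before rescaling all probabilities to sum to 1. The formula images are not reproduced in this text, and the brand labels and function names are hypothetical.

```python
R5_RATIO = 10 ** (1 / 5)  # step of the R5 preferred number series, about 1.585


def tighten(probabilities, flagged, ratio=R5_RATIO):
    """Raise the inclusion probability of flagged brands, then renormalize.

    probabilities: dict mapping brand -> inclusion probability P(B|G)
    flagged:       set of brands found nonconforming in past inspections
    ratio:         tightening factor (assumed multiplicative)
    """
    scaled = {
        brand: p * ratio if brand in flagged else p
        for brand, p in probabilities.items()
    }
    total = sum(scaled.values())
    return {brand: p / total for brand, p in scaled.items()}
```

Because of the renormalization, raising the flagged brands automatically lowers every other brand by the same relative amount.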
C3) Combining the online store data under the category with the inclusion probability of each brand under the category, determine the inclusion probability of each brand in each online store.
The inclusion probability of brand B_i under category G in online store T_k is:
Figure PCTCN2021074960-appb-000114
where P(B_i|T_k) is the sales share of online store T_k within brand B_i under category G, and n_t is the number of online stores selling brand B_i under category G.
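As an illustration of step C3), the sketch below distributes a brand's inclusion probability across the n_t stores that sell it, in proportion to each store's share of the brand's sales. This proportional split is an assumption chosen to match the definitions above; the formula image is not reproduced in this text. The store labels and function name are hypothetical.

```python
def store_inclusion_probabilities(brand_probability, store_sales):
    """Split one brand's inclusion probability across its online stores.

    brand_probability: inclusion probability P(B_i|G) of the brand
    store_sales:       dict mapping store -> sales of this brand in that store
    """
    total = sum(store_sales.values())
    if total <= 0:
        raise ValueError("brand has no recorded store sales")
    return {
        store: brand_probability * sales / total
        for store, sales in store_sales.items()
    }
```

With this split, the store-level probabilities of a brand add up to the brand's own inclusion probability.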
C4) Combining the total number of items to be sampled for the category, determine the sample quantity for each brand under the category.
The sample quantity for brand B_i under the category G to be inspected is:
Figure PCTCN2021074960-appb-000115
where M is the total number of items to be sampled for category G, and the symbol
Figure PCTCN2021074960-appb-000116
denotes rounding the enclosed value down to the nearest integer.
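Step C4) can be sketched as below, assuming the sample quantity of a brand is the total M multiplied by the brand's inclusion probability and rounded down, as the floor symbol described above suggests. The exact formula image is not reproduced in this text, and the names are hypothetical.

```python
import math


def brand_sample_counts(total_samples, probabilities):
    """Allocate samples to brands as floor(M * P(B|G)).

    total_samples: total number of items M to sample for the category
    probabilities: dict mapping brand -> inclusion probability
    """
    return {
        brand: math.floor(total_samples * p)
        for brand, p in probabilities.items()
    }
```

Because every count is rounded down, the allocated counts can add up to fewer than M; how any remainder is assigned is not specified in this section.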
C5) From the total number of items to be sampled for the category, determine the sample quantity for each brand in each online store.
The sample quantity for brand B_i in online store T_j under the category G to be inspected is:
Figure PCTCN2021074960-appb-000117
where the symbol
Figure PCTCN2021074960-appb-000118
denotes rounding the enclosed value down to the nearest integer.
Taking air conditioners as category G, some steps of the above embodiment are illustrated with an experimental example. The online sales data in this example come mainly from Tmall and JD. Step B3) yields the product sentiment score of each online product link under the category (air conditioners); step B4) yields the sentiment score of each brand; step C1) yields the prior probability of each online product link of each brand and the prior probability of each brand; step C2) determines the inclusion probability of each brand; step C3) determines the inclusion probability of each brand in each online store; step C4) determines the sample quantity for each brand; and step C5) determines the sample quantity for each brand in each online store.
FIG. 9 plots the sentiment score of each brand under the category (labelled brand sentiment score or total sentiment score in the figure) against the prior probability of each brand, with brands on the horizontal axis. As FIG. 9 shows, the overall trend is that the lower a brand's sentiment score, the higher its prior probability.
FIG. 10 plots, for each brand under the category (brands on the horizontal axis), the sentiment score, the prior probability, and the inclusion probability without tightening and normalization. As FIG. 10 shows, the overall trend is again that the lower a brand's sentiment score, the higher its prior probability. At the same time, a brand's inclusion probability depends on both its prior probability and its sales volume: a brand with very good user reviews has a relatively low prior probability, but if its sales are high (e.g. AUX, Gree, and Midea in the figure), its inclusion probability still rises. In other words, brands with poor user reviews and brands that users buy in large numbers both need focused spot checks. Note also that the "sampling probability" shown in the figures is the inclusion probability, which may also be called the posterior probability.
To illustrate the tightening strategy for the category (air conditioners), this example uses the historical inspection data published in the 2018 Shanghai supervisory spot check of household air conditioner quality (source: official website of Shanghai Quality and Technical Supervision - Information Center - Bulletin Board - Spot Check Reports - "2018 Shanghai Household Air Conditioner Product Quality Supervision Spot Check Results", http://shzj.scjgj.sh.gov.cn/art/2018/9/4/art_358_1325245.html). The published data show that a product of brand MBO failed this spot check (cited here for illustration only). Note that the underlying online sales data in this example cover brand MBO but not the other two nonconforming brands in the published data. Brand MBO is therefore treated as the selected brand and its inclusion probability is tightened, after which the inclusion probabilities of all brands under the category are normalized. Steps C2P1) and C2P2) thus adjust the sampling data by tightening the treatment of nonconforming products and/or brands in the historical inspection data, making subsequent inspection sampling more scientific, reasonable, and consistent over time.
FIG. 11 compares the inclusion probability of each brand under the category (air conditioners) before and after tightening and normalization, with brands on the horizontal axis. As FIG. 11 shows, after tightening and normalization the inclusion probability of the selected brand MBO is higher than before. However, because MBO air conditioners have low sales, the change is neither abrupt nor pronounced; it is a reasonable adjustment that balances the various factors.
The above are only preferred embodiments of the invention. It should be noted that these embodiments do not limit the invention; variations and modifications made by those skilled in the art without departing from the technical idea of the invention all fall within its scope of protection.

Claims (10)

  1. A big data-based online sales commodity sampling and testing method, characterized in that the method includes a sentiment score calculation step and a sampling data calculation step;
    wherein the sentiment score calculation step for a given product category includes:
    B1) collecting, from online platforms, the online product links belonging to the category together with their corresponding data, including brand data, review data, and sales volume data;
    B2) performing sentiment analysis on each review of each online product link collected in step B1), using a sentiment-orientation analysis method based on a review analysis dictionary, to obtain the sentiment score of each review under each link;
    B3) calculating the product sentiment score of each online product link from the sentiment scores of the reviews under that link;
    wherein the sampling data calculation step for a given product category includes:
    C1) from the product sentiment scores of the online product links under the category, combined with the brand data, calculating the prior probability of each online product link of each brand under the category, and the prior probability of each brand under the category;
    C2) combining the brand data under the category, determining the inclusion probability of each brand under the category;
    C4) combining the total number of items to be sampled for the category, determining the sample quantity for each brand under the category.
  2. The big data-based online sales commodity sampling and testing method according to claim 1, characterized in that step B1) is: collecting, from online platforms, the online product links belonging to the category together with their corresponding data, including brand data, online store data, review data, and sales volume data;
    step C2) is followed by step C3): combining the online store data under the category with the inclusion probability of each brand under the category, determining the inclusion probability of each brand in each online store;
    step C4) is followed by step C5): from the total number of items to be sampled for the category, determining the sample quantity for each brand in each online store.
  3. The big data-based online sales commodity sampling and testing method according to claim 1, characterized in that the method further includes an initialization step, wherein the initialization step includes: A0) building and/or updating a review analysis dictionary for products, based on multi-source review data from users on online platforms;
    the review analysis dictionary includes a sentiment word dictionary, a negation word dictionary, a degree word dictionary, and/or a stop word dictionary;
    wherein the sentiment word dictionary includes a number of sentiment words and the sentiment word score of each;
    wherein the negation word dictionary includes a number of negation words;
    wherein the degree word dictionary includes a number of degree words and the degree word score of each;
    wherein the stop word dictionary includes a number of stop words.
  4. The big data-based online sales commodity sampling and testing method according to claim 1, characterized in that, in step B2) of performing sentiment analysis on each review of each online product link collected in step B1), using a sentiment-orientation analysis method based on a review analysis dictionary, to obtain the sentiment score of each review under each link, the sentiment analysis of a given review under an online product link b_ij to obtain the sentiment score of that review includes the following steps:
    B21) Clause segmentation: split the review text c of the review into clauses at punctuation marks:
    Figure PCTCN2021074960-appb-100001
    B22) Modification relationship analysis: using the review analysis dictionary, identify in each clause the sentiment words (a_1, a_2, …), degree words (d_1, d_2, …), negation words (h_1, h_2, …), and stop words, and record their positions; use the stop words to determine the target sentiment word modified by each degree word and negation word, and, together with the corresponding degree word scores and sentiment word scores in the review analysis dictionary and the number of negation words, determine the modification relationships between the degree words, negation words, and target sentiment words in the clause;
    B23) Clause sentiment score calculation: determine the sentiment score of each clause from the modification relationships obtained, the sentiment score of clause c_i being:
    Figure PCTCN2021074960-appb-100002
    where |H| is the number of occurrences of negation words, D is the degree word score,
    Figure PCTCN2021074960-appb-100003
    is the sentiment word score of sentiment word w_k, and n_w is the number of occurrences of sentiment words in clause c_i; the sentiment score of a clause c_i whose s_i is positive is denoted as the clause positive sentiment score
    Figure PCTCN2021074960-appb-100004
    and the sentiment score of a clause c_i whose s_i is negative is denoted as the clause negative sentiment score
    Figure PCTCN2021074960-appb-100005
    ;
    B24) Review sentiment score calculation: for the review text c of the review, sum the clause positive sentiment scores over all its clauses to obtain the review's positive sentiment score s^+, and sum the clause negative sentiment scores over all its clauses to obtain the review's negative sentiment score s^-:
    Figure PCTCN2021074960-appb-100006
    Figure PCTCN2021074960-appb-100007
    where m_c is the number of clauses in review c.
  5. The big data-based online sales commodity sampling and testing method according to claim 4, characterized in that step B3) is: calculating the product sentiment score of each online product link from the sentiment scores of the reviews under that link, using standardization and entropy-based weighting;
    wherein the product sentiment score of an online product link b_ij is:
    Figure PCTCN2021074960-appb-100008
    where
    Figure PCTCN2021074960-appb-100009
    and
    Figure PCTCN2021074960-appb-100010
    are respectively the positive and negative standardized sentiment scores of link b_ij:
    Figure PCTCN2021074960-appb-100011
    Figure PCTCN2021074960-appb-100012
    where
    n_c is the total number of reviews under link b_ij;
    Figure PCTCN2021074960-appb-100013
    and
    Figure PCTCN2021074960-appb-100014
    are respectively the positive and negative sentiment scores of the k-th review c_k of link b_ij;
    Figure PCTCN2021074960-appb-100015
    and
    Figure PCTCN2021074960-appb-100016
    are respectively the means of the positive and negative sentiment scores of all reviews of link b_ij;
    Figure PCTCN2021074960-appb-100017
    and
    Figure PCTCN2021074960-appb-100018
    are respectively the standard deviations of the positive and negative sentiment scores of all reviews of link b_ij;
    and α^+ and α^- are the positive weight and the negative weight, respectively.
  6. 根据权利要求5所述的基于大数据的网销商品检验抽样方法,其特征在于:针对网销商品链接b ij,所述正向权重α +和负向权重α -通过如下步骤获得: The big data-based inspection and sampling method for online sales of goods according to claim 5, characterized in that: for the online sales of goods link b ij , the positive weight α + and the negative weight α - are obtained by the following steps:
    K1)将网销商品链接b ij下的各条评论的正、负向情感得分分别进行min-max标准化处理,使结果映射到[0,1]区间,包括: K1) Perform min-max standardization on the positive and negative sentiment scores of each comment under the online merchandise link b ij to map the results to the [0,1] interval, including:
    将网销商品链接b ij下所有评论中的正向情感得分转化为正向指标,其中, 网销商品链接b ij第u条评论的正向指标
    Figure PCTCN2021074960-appb-100019
    为:
    Convert the positive sentiment scores of all comments under the online merchandise link b ij into a positive indicator, where the online merchandise link b ij is the positive indicator of the uth comment
    Figure PCTCN2021074960-appb-100019
    for:
    Figure PCTCN2021074960-appb-100020
    Figure PCTCN2021074960-appb-100020
    以及,将网销商品链接b ij下所有评论中的负向情感得分转化为负向指标,其中,网销商品链接b ij第u条评论的负向指标
    Figure PCTCN2021074960-appb-100021
    为:
    And, transform the negative sentiment scores of all comments under the online marketing product link b ij into a negative indicator, where the online marketing product link b ij is the negative indicator of the uth comment
    Figure PCTCN2021074960-appb-100021
    for:
    Figure PCTCN2021074960-appb-100022
    Figure PCTCN2021074960-appb-100022
    其中,u=1,2,…n cAmong them, u=1,2,...n c ;
    Figure PCTCN2021074960-appb-100023
    为网销商品链接b ij第u条评论的正向情感得分;Max(S +),Min(S +)分别为网销商品链接b ij所有评论中正向情感得分的最大值和最小值;
    Figure PCTCN2021074960-appb-100023
    Is the positive sentiment score of the uth comment of the online product link b ij ; Max(S + ) and Min(S + ) are the maximum and minimum values of the positive sentiment score of all reviews of the online product link b ij;
    Figure PCTCN2021074960-appb-100024
    为网销商品链接b ij第u条评论的负向情感得分;Max(S -),Min(S -)分别为网销商品链接b ij所有评论中负向情感得分的最大值和最小值;
    Figure PCTCN2021074960-appb-100024
    Score of net sales product link b ij u review article negative emotions; Max (S -), Min (S -) were linked network of goods sold b ij review all negative emotion score maximum and minimum;
    K2) Calculate the proportion of the positive and negative indicators of each comment under the online product link b_ij, where the proportions of the positive and negative indicators of the u-th comment under b_ij are, respectively:
    Figure PCTCN2021074960-appb-100025
    Figure PCTCN2021074960-appb-100026
    K3) Calculate the entropy e^+ of the positive indicators and the entropy e^- of the negative indicators over all comments under the online product link b_ij:
    Figure PCTCN2021074960-appb-100027
    Figure PCTCN2021074960-appb-100028
    K4) Calculate the difference coefficient g^+ of the positive indicators and the difference coefficient g^- of the negative indicators over all comments under the online product link b_ij:
    g^+ = 1 - e^+
    g^- = 1 - e^-
    K5) Calculate the positive weight α^+ and the negative weight α^- of the online product link b_ij:
    Figure PCTCN2021074960-appb-100029
    Figure PCTCN2021074960-appb-100030
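
Steps K1 to K5 follow the general shape of the entropy weight method. Because the formula images are not reproduced in this text, the Python sketch below fills them in with the textbook forms, all of which are assumptions: the same min-max form (s - Min) / (Max - Min) for both indicators, proportion p_u = r_u / Σ r, entropy e = -(1 / ln n_c) Σ p_u ln p_u with 0·ln 0 taken as 0, and weights α^+ = g^+ / (g^+ + g^-), α^- = g^- / (g^+ + g^-). The function names are hypothetical.

```python
import math


def min_max(scores):
    """K1 (assumed form): map raw sentiment scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all scores are equal")
    return [(s - lo) / (hi - lo) for s in scores]


def entropy(indicators):
    """K2 and K3 (assumed textbook forms): proportions, then normalized entropy."""
    total = sum(indicators)
    proportions = [r / total for r in indicators]
    # Terms with p == 0 contribute nothing (0 * ln 0 is treated as 0).
    raw = -sum(p * math.log(p) for p in proportions if p > 0)
    return raw / math.log(len(indicators))


def entropy_weights(pos_scores, neg_scores):
    """K4 and K5: difference coefficients g = 1 - e, then normalized weights."""
    g_pos = 1 - entropy(min_max(pos_scores))
    g_neg = 1 - entropy(min_max(neg_scores))
    return g_pos / (g_pos + g_neg), g_neg / (g_pos + g_neg)


if __name__ == "__main__":
    alpha_pos, alpha_neg = entropy_weights([0.0, 0.0, 1.0], [0.0, 0.5, 1.0])
    print(alpha_pos, alpha_neg)
```

Under these assumptions the two weights always sum to 1, and the indicator whose values are more unevenly distributed across comments (lower entropy) receives the larger weight.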
  7. The big data-based inspection sampling method for goods sold online according to claim 5, characterized in that, in step C1), in which the prior probability of each online product link of each brand under the product category and the prior probability of each brand under the product category are calculated from the product sentiment score of each online product link under that category combined with brand data,
    the prior probability of the online product link b_ij under brand B_i is:
    P(b_ij) = (Max(x) - x_ij) / (Max(x) - Min(x)) × 100%
    where x_ij is the product sentiment score of the online product link b_ij, and Max(x) and Min(x) are the maximum and minimum product sentiment scores over all online product links under brand B_i in the product category;
    the prior probability of brand B_i is:
    Figure PCTCN2021074960-appb-100031
    where w_j is the sales-volume share of the online product link b_ij within brand B_i under the product category,
    Figure PCTCN2021074960-appb-100032
    and n_i is the number of all online product links of brand B_i under the product category.
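
As a rough illustration of claim 7, the Python sketch below computes the link priors with the formula stated in the claim, returned as fractions rather than percentages. The brand prior formula is only available as an image, so the sketch assumes it is the sales-weighted sum P(B_i) = Σ_j w_j · P(b_ij) with w_j equal to each link's share of the brand's total sales. That form and the function names are assumptions.

```python
def link_priors(sentiment_scores):
    """P(b_ij) = (Max(x) - x_ij) / (Max(x) - Min(x)), as a fraction in [0, 1]."""
    lo, hi = min(sentiment_scores), max(sentiment_scores)
    if hi == lo:
        raise ValueError("prior is undefined when all sentiment scores are equal")
    return [(hi - x) / (hi - lo) for x in sentiment_scores]


def brand_prior(sentiment_scores, sales):
    """Assumed form: sales-weighted sum of the link priors of one brand."""
    priors = link_priors(sentiment_scores)
    total_sales = sum(sales)
    return sum((s / total_sales) * p for s, p in zip(sales, priors))


if __name__ == "__main__":
    # Three links of one brand: sentiment scores and sales volumes.
    print(link_priors([2, 6, 10]))
    print(brand_prior([2, 6, 10], [10, 30, 60]))
```

Note the direction of the scaling: the link with the lowest sentiment score gets the highest prior, so poorly reviewed links are more likely to be sampled.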
  8. The big data-based inspection sampling method for goods sold online according to claim 7, characterized in that, in step C2), in which the sampling probability of each brand under the product category is determined,
    the sampling probability of brand B_i under product category G is:
    Figure PCTCN2021074960-appb-100033
    where P(G|B_i) is the sales-volume share of brand B_i under product category G, and n_b is the number of all brands under product category G;
    in step C4), in which the number of products to be sampled for each brand under the product category is determined from the total number of products to be sampled for that category,
    the number of products to be sampled for brand B_i under the product category G to be inspected is:
    Figure PCTCN2021074960-appb-100034
    where M is the total number of products to be sampled for the product category G to be inspected, and the symbol
    Figure PCTCN2021074960-appb-100035
    indicates that the number calculated inside the symbol is rounded down.
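
The two formulas of claim 8 appear only as images here, so the following Python sketch is a guess at their shape rather than a transcription. It assumes a Bayes-style posterior P(B_i|G) = P(B_i) · P(G|B_i) / Σ_k P(B_k) · P(G|B_k) over the n_b brands, and a sample quantity of floor(M · P(B_i|G)). The function names are hypothetical.

```python
import math


def brand_sampling_probabilities(brand_priors, sales_shares):
    """Assumed Bayes-style form: prior times sales share, normalized over brands."""
    joint = [p * s for p, s in zip(brand_priors, sales_shares)]
    total = sum(joint)
    return [j / total for j in joint]


def brand_sample_counts(sampling_probabilities, total_samples):
    """Assumed form: floor(M * P(B_i|G)) for each brand."""
    return [math.floor(total_samples * p) for p in sampling_probabilities]


if __name__ == "__main__":
    probs = brand_sampling_probabilities([0.2, 0.6], [0.5, 0.5])
    print(probs)
    print(brand_sample_counts(probs, 10))
```

Because each count is rounded down, the counts can add up to fewer than M; how any remainder is handled is not stated in the claim.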
  9. The big data-based inspection sampling method for goods sold online according to claim 8, characterized in that, in step C3), in which the sampling probability of each brand in each online store under the product category is determined from the online store data under that category combined with the sampling probability of each brand under that category,
    the sampling probability of brand B_i under product category G in online store T_k is:
    Figure PCTCN2021074960-appb-100036
    where P(B_i|T_k) is the sales-volume share of online store T_k within brand B_i under product category G, and n_t is the number of online stores selling brand B_i under product category G;
    in step C5), in which the number of products to be sampled for each brand in each online store under the product category is determined from the total number of products to be sampled for that category,
    the number of products to be sampled for brand B_i in online store T_j under the product category G to be inspected is:
    Figure PCTCN2021074960-appb-100037
    where the symbol
    Figure PCTCN2021074960-appb-100038
    indicates that the number calculated inside the symbol is rounded down.
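
The store-level formulas of claim 9 are likewise only available as images. One plausible reading, sketched below in Python purely as an assumption, is that the brand's sampling probability is split across stores in proportion to each store's share of the brand's sales, and that the store-level quantity is the floor of M times that probability. The function name and the dictionary layout are hypothetical.

```python
import math


def store_sample_counts(brand_probability, store_sales, total_samples):
    """Assumed form: floor(M * P(B_i|G) * store share of the brand's sales)."""
    total_sales = sum(store_sales.values())
    return {
        store: math.floor(total_samples * brand_probability * sales / total_sales)
        for store, sales in store_sales.items()
    }


if __name__ == "__main__":
    # Brand with sampling probability 0.5, sold in two stores, M = 20.
    print(store_sample_counts(0.5, {"T1": 30, "T2": 10}, 20))
```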
  10. The big data-based inspection sampling method for goods sold online according to claim 8, characterized in that, after the sampling probability of each brand under the product category is determined in step C2), the method further comprises the following steps:
    C2P1) drawing on historical quality inspection data, tighten the sampling probabilities of selected brands B_i, B_{i+1}, …, B_{i+h} under product category G using the preferred number R5 as the ratio, and then normalize the sampling probabilities of all brands under product category G:
    Figure PCTCN2021074960-appb-100039
    where
    Figure PCTCN2021074960-appb-100040
    Here, n_b is the number of all brands under product category G;
    C2P2) update the sampling probabilities of all brands under product category G to:
    P(B_k|G) = P'(B_k|G)
    where k ∈ [1, n_b].
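
The tightening step C2P1 can be sketched in Python as follows. The exact formula is in an image that is not reproduced here, so this sketch assumes that each selected brand's probability is multiplied by the R5 ratio and that all probabilities are then divided by the new total. The R5 preferred number series has a step ratio of 10^(1/5), about 1.58 (commonly rounded to 1.6); whether the patent uses the exact or the rounded value is not stated, so the ratio is left as a parameter. The function name is hypothetical.

```python
R5_RATIO = 10 ** (1 / 5)  # step ratio of the R5 preferred number series, about 1.58


def tighten_and_normalize(probabilities, selected_brands, ratio=R5_RATIO):
    """Scale the selected brands' probabilities by `ratio`, then renormalize.

    `probabilities` maps brand name to P(B_k|G); `selected_brands` is the set
    of brands flagged by historical quality inspection data.
    """
    scaled = {
        brand: p * ratio if brand in selected_brands else p
        for brand, p in probabilities.items()
    }
    total = sum(scaled.values())
    return {brand: p / total for brand, p in scaled.items()}


if __name__ == "__main__":
    updated = tighten_and_normalize({"A": 0.5, "B": 0.3, "C": 0.2}, {"A"})
    print(updated)
```

After normalization the probabilities again sum to 1, so raising the selected brands' share necessarily lowers the share of every other brand.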
PCT/CN2021/074960 2020-05-21 2021-02-03 Big data-based online sales commodity sampling and testing method WO2021232856A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010437558.1A CN111612340B (en) 2020-05-21 2020-05-21 Big data-based network sales commodity inspection sampling method
CN202010437558.1 2020-05-21

Publications (1)

Publication Number Publication Date
WO2021232856A1 (en) 2021-11-25

Family

ID=72201759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074960 WO2021232856A1 (en) 2020-05-21 2021-02-03 Big data-based online sales commodity sampling and testing method

Country Status (2)

Country Link
CN (1) CN111612340B (en)
WO (1) WO2021232856A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612340B (en) * 2020-05-21 2023-10-17 中国标准化研究院 Big data-based network sales commodity inspection sampling method
CN114757587B (en) * 2022-06-13 2022-09-30 深圳市玄羽科技有限公司 Product quality control system and method based on big data
CN116304538B (en) * 2023-05-19 2023-07-21 中国标准化研究院 Method for evaluating uncertainty of detection result by using big data
CN116757560B (en) * 2023-08-22 2023-10-13 中国标准化研究院 Intelligent quality inspection method for large data set data
CN117634988B (en) * 2024-01-25 2024-04-12 中国标准化研究院 Commodity qualification sampling inspection method and system based on priori information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977798A (en) * 2017-12-21 2018-05-01 中国计量大学 A kind of risk evaluating method of e-commerce product quality
CN109345272A (en) * 2018-11-28 2019-02-15 中国计量大学 One kind is based on the markovian shop credit risk forecast method of improvement
US20190318407A1 (en) * 2015-07-17 2019-10-17 Devanathan GIRIDHARI Method for product search using the user-weighted, attribute-based, sort-ordering and system thereof
CN110515982A (en) * 2019-07-17 2019-11-29 金蝶软件(中国)有限公司 Inspect method, apparatus, computer equipment and storage medium by random samples
CN110555596A (en) * 2019-08-09 2019-12-10 国网陕西省电力公司电力科学研究院 sampling inspection strategy making method and system based on power distribution material quality evaluation
CN111612340A (en) * 2020-05-21 2020-09-01 中国标准化研究院 Network commodity inspection sampling method based on big data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069647A (en) * 2015-07-30 2015-11-18 齐鲁工业大学 Improved method for extracting evaluation object in Chinese commodity review
CN108491377B (en) * 2018-03-06 2021-10-08 中国计量大学 E-commerce product comprehensive scoring method based on multi-dimensional information fusion

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626899A (en) * 2022-05-13 2022-06-14 南京铋悠数据技术有限公司 Product sales data acquisition method and system based on big data
CN114626899B (en) * 2022-05-13 2022-11-18 南京铋悠数据技术有限公司 Product sales data acquisition method and system based on big data
CN115293861A (en) * 2022-10-09 2022-11-04 连连银通电子支付有限公司 Commodity identification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111612340A (en) 2020-09-01
CN111612340B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
WO2021232856A1 (en) Big data-based online sales commodity sampling and testing method
TWI257556B (en) Rapid valuation of portfolios of assets such as financial instruments
CN110246029A (en) Risk management method, terminal, device and readable storage medium storing program for executing after loan
JP2012256325A (en) High-risk procurement analysis and scoring system
CN111583012B (en) Method for evaluating default risk of credit, debt and debt main body by fusing text information
CN109740036B (en) Hotel ordering method and device for OTA platform
CN107609771A (en) A kind of supplier's value assessment method
CN112966962A (en) Electric business and enterprise evaluation method
CN107944761A (en) Early warning and monitoring analysis method is complained based on artificial intelligence protection of consumers' rights index enterprise
CN112884359A (en) Electric power spot market risk assessment method
Nasridinovna Methodological Foundations for Assessing Competitiveness
CN111612339B (en) Big data-based network sales commodity emotion tendency analysis method
CN107992613A (en) A kind of Text Mining Technology protection of consumers' rights index analysis method based on machine learning
CN114912739A (en) Construction and application method of environment and transformer substation operation and maintenance cost correlation model
CN113077165A (en) Method for judging market force abuse of generator set
CN111951105A (en) Intelligent credit wind control system based on multidimensional big data analysis
TWI769385B (en) Method and system for screening potential purchasers of financial products
CN110648173B (en) Unsupervised abnormal commodity data detection method based on good evaluation and poor evaluation rates of commodities
TWI629660B (en) Bus company operation management service evaluation method based on big data analysis
CN112418704A (en) Evaluation method for online commodity comment quality
CN117575809A (en) Method and device for classifying freight risk users based on behavior data
TWM646766U (en) Computing device for credit scoring model evaluation
CN117952472A (en) Safe construction index evaluation method and system
CN116911865A (en) Service management platform for jewelry
CN115271311A (en) Assessment method and system for monitoring electric power retail market risk

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21809043; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21809043; Country of ref document: EP; Kind code of ref document: A1)