CN108694647B - Method and device for mining merchant recommendation reason and electronic equipment - Google Patents

Method and device for mining merchant recommendation reason and electronic equipment

Info

Publication number
CN108694647B
Authority
CN
China
Prior art keywords
recommendation reason
word
recommendation
reason
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810447255.0A
Other languages
Chinese (zh)
Other versions
CN108694647A (en)
Inventor
虞金花
苏婧
兰田
侯培旭
华镇
陈翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201810447255.0A
Publication of CN108694647A
Application granted
Publication of CN108694647B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0217 Discounts or incentives, e.g. coupons or rebates involving input on products or services in exchange for incentives or rewards
    • G06Q30/0218 Discounts or incentives, e.g. coupons or rebates involving input on products or services in exchange for incentives or rewards based on score
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a method for mining merchant recommendation reasons, belongs to the technical field of computers, and solves the problem that the recommendation reasons mined in the prior art are inaccurate. The method for mining the merchant recommendation reason comprises the following steps: determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of a target merchant; determining a high-quality candidate recommendation reason and an evaluation score of the high-quality candidate recommendation reason according to the feature vector of the candidate recommendation reason through a preset recommendation reason classification model; and constructing a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason. The method for mining the merchant recommendation reason disclosed by the embodiment of the application effectively improves the accuracy of the mined recommendation reason.

Description

Method and device for mining merchant recommendation reason and electronic equipment
Technical Field
The application relates to the technical field of computers, in particular to a method and a device for mining merchant recommendation reasons and electronic equipment.
Background
In search and recommendation scenarios, the merchant recommendation reason is important information that helps a user make a decision. Prior-art methods for mining merchant recommendation reasons mainly rely on manual operation and rule matching. Manual operation requires a large amount of labor, and its cost is quite high. Rule matching can mine recommendation reasons automatically, but most rules are based on character matching, so a recommendation reason is evaluated on a single factor and the mined recommendation reasons are not accurate enough. Moreover, because the templates are fixed, the extracted recommendation reasons lack variety: every user sees the same recommendation reasons for a merchant, and the same user sees the same recommendation reasons in different search or recommendation scenarios, which lowers the user's decision efficiency.
In summary, the mining method for merchant recommendation reasons in the prior art at least has the defect that the mined recommendation reasons are inaccurate.
Disclosure of Invention
The application provides a method for mining merchant recommendation reasons, which solves the problem that the recommendation reasons mined by prior-art methods are inaccurate.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides a method for mining a merchant recommendation reason, including:
determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of a target merchant;
determining a high-quality candidate recommendation reason and an evaluation score of the high-quality candidate recommendation reason according to the feature vector of the candidate recommendation reason through a preset recommendation reason classification model;
and constructing a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason.
In a second aspect, an embodiment of the present application provides a mining apparatus for merchant recommendation reasons, including:
the candidate recommendation reason and feature vector determining module is used for determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of the target merchants;
the candidate recommendation reason set evaluation score determining module is used for determining high-quality candidate recommendation reasons and evaluation scores of the high-quality candidate recommendation reasons according to the feature vectors of the candidate recommendation reasons through a preset recommendation reason classification model;
and the recommendation reason pool building module is used for building the recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method for mining the merchant recommendation reason according to the embodiment of the present application when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method for mining merchant recommendation reasons disclosed in the present application.
The method for mining merchant recommendation reasons disclosed in the embodiments of the present application determines candidate recommendation reasons and their feature vectors based on user original data of a target merchant; determines, through a preset recommendation reason classification model and according to those feature vectors, high-quality candidate recommendation reasons and their evaluation scores; and constructs a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reasons and their evaluation scores, thereby solving the problem that the recommendation reasons mined in the prior art are inaccurate. The candidate recommendation reasons are mined from user original data, a pre-trained classification model computes an evaluation score for each candidate based on features of preset dimensions, and the high-quality candidates determined from those scores serve as the merchant's recommendation reasons. This avoids the subjective factors introduced by manual operation and the lack of variety imposed by rule matching, and effectively improves the accuracy of the mined recommendation reasons.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for mining reasons for merchant recommendations according to a first embodiment of the present application;
FIG. 2 is a flowchart of a method for mining merchant recommendation reasons according to a second embodiment of the present application;
FIG. 3 is a schematic structural diagram of a mining apparatus for merchant recommendation reasons according to a third embodiment of the present application;
FIG. 4 is a second schematic structural diagram of a mining apparatus for merchant recommendation reasons according to a third embodiment of the present application;
FIG. 5 is a third schematic structural diagram of a mining apparatus for merchant recommendation reasons according to a third embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without inventive effort fall within the scope of the present application.
Embodiment one
As shown in fig. 1, the method for mining the merchant recommendation reason disclosed in this embodiment includes: step 110 to step 130.
Step 110: determine candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on the user original data of the target merchant.
In specific implementation, candidate recommendation reasons are mined from user original data of target merchants. Firstly, data processing is carried out on each piece of user original data of a target merchant to obtain a plurality of clauses, wherein each clause corresponds to a candidate recommendation reason.
Further, based on the user original data of the target merchant, data cleaning is performed on a preset word bank by combining methods such as sentiment analysis and word frequency screening, to obtain a reference word bank. The reference word bank comprises positive idioms, degree words, high-level evaluation words and common evaluation words.
Then, based on the reference word bank, a feature vector is determined for each candidate recommendation reason. For example, the words in the clause corresponding to each candidate recommendation reason are matched against the words in the reference word bank, and each dimension of the candidate's feature vector is assigned a value according to the matching result, which finally gives the feature vector of each mined candidate recommendation reason.
Step 120: determine high-quality candidate recommendation reasons and their evaluation scores according to the feature vectors of the candidate recommendation reasons through a preset recommendation reason classification model.
In specific implementation, a recommendation reason classification model is first trained on recommendation reasons mined from user original data. Then, in application, the feature vector of a candidate recommendation reason is input into the pre-trained recommendation reason classification model, which yields the evaluation score of the candidate and whether it is a high-quality candidate recommendation reason. For example, a candidate whose evaluation score is 0.5 or more is determined to be a high-quality candidate recommendation reason, and a candidate whose evaluation score is less than 0.5 is determined not to be one.
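The thresholding in this example can be sketched as follows. This is a minimal sketch, not the patented implementation: the function names are illustrative, the scores are assumed to come from an already trained classification model, and 0.5 is simply the example value given in the text.

```python
def is_high_quality(score, threshold=0.5):
    """Return True when the evaluation score reaches the threshold.

    0.5 is the example threshold from the text; other values are possible.
    """
    return score >= threshold


def select_high_quality(scored_candidates, threshold=0.5):
    """Keep the (reason, score) pairs whose score is at or above the threshold."""
    return [
        (reason, score)
        for reason, score in scored_candidates
        if is_high_quality(score, threshold)
    ]
```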
Step 130: construct a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reasons and their evaluation scores.
The mined high-quality candidate recommendation reasons are then further screened according to information such as the word frequency of the entity words they contain and their evaluation scores, and those whose entity word frequency does not meet a preset condition are filtered out. Finally, among the retained high-quality candidate recommendation reasons, for each entity word or group of entity words, the candidate with the highest evaluation score among those corresponding to that entity word or group is selected as its high-quality recommendation reason and added to the recommendation reason pool of the target merchant.
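One way to picture the pool construction is the sketch below. It rests on assumptions the text does not fix: the "preset condition" is modelled as a minimum entity word frequency, a group of entity words is keyed by a `frozenset`, and the `Candidate` record and its field names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for a high-quality candidate recommendation reason.
Candidate = namedtuple("Candidate", ["text", "score", "entity_words", "entity_tf"])


def build_reason_pool(candidates, min_entity_tf=1):
    """Keep, per entity word group, the candidate with the highest score.

    min_entity_tf stands in for the unspecified "preset condition" on
    entity word frequency (an assumption of this sketch).
    """
    pool = {}
    for cand in candidates:
        if cand.entity_tf < min_entity_tf:
            continue  # entity word frequency does not meet the condition
        key = frozenset(cand.entity_words)
        best = pool.get(key)
        if best is None or cand.score > best.score:
            pool[key] = cand
    return pool
```

The returned dictionary maps each entity word group to its single best reason, which mirrors the "highest evaluation score per entity word or group" selection described above.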
The method for mining merchant recommendation reasons disclosed in the embodiments of the present application determines candidate recommendation reasons and their feature vectors based on user original data of a target merchant; determines, through a preset recommendation reason classification model and according to those feature vectors, high-quality candidate recommendation reasons and their evaluation scores; and constructs a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reasons and their evaluation scores, thereby solving the problem that the recommendation reasons mined in the prior art are inaccurate. The candidate recommendation reasons are mined from user original data, a pre-trained classification model computes an evaluation score for each candidate based on features of preset dimensions, and the high-quality candidates determined from those scores serve as the merchant's recommendation reasons. This avoids the subjective factors introduced by manual operation and the lack of variety imposed by rule matching, and effectively improves the accuracy of the mined recommendation reasons.
Embodiment two
As shown in fig. 2, the method for mining the merchant recommendation reason disclosed in this embodiment includes: step 210 to step 270.
Step 210: train a recommendation reason classification model.
In specific implementation, training samples of recommendation reasons are mined from user original data of the target merchant, and a recommendation reason classification model is then trained on those samples. Each training sample carries a label indicating whether it is a high-quality recommendation reason.
First, data processing is performed on the user original data of the target merchant to determine a plurality of clauses, each of which corresponds to a possible recommendation reason.
Performing data processing on the user original data of the target merchant to determine a plurality of clauses comprises: cleaning and segmenting the user original data of the target merchant according to preset rules to obtain a plurality of clauses, where the preset rules comprise at least one of the following: the clause length is greater than a preset number of characters; clauses containing conjunctions are deleted; preset symbols (such as emoticons) are deleted.
In specific implementation, the UGC (User Generated Content) user original data of all merchants on the platform over a recent period (such as one year) can be extracted and split into clauses at every punctuation mark except the enumeration comma (、); the clauses are then filtered by length, special characters, conjunctions and the like to obtain a plurality of pieces of source data. Take as an example the UGC "dishes, tastes and services absolutely meet the Michelin standard, but the storefront is too small, and the stairs are also narrow …… it would be perfect if it were spacious [emoticon]" (the commas inside the first clause stand for enumeration commas in the original). First, the text is split at punctuation marks other than the enumeration comma, giving 4 clauses: "dishes, tastes and services absolutely meet the Michelin standard", "but the storefront is too small", "the stairs are also narrow" and "it would be perfect if it were spacious [emoticon]". Then special characters such as emoticons are filtered out, clauses containing the conjunction "but" are removed, and clauses longer than 5 characters are kept, which finally yields two pieces of source data: "dishes, tastes and services absolutely meet the Michelin standard" and "it would be perfect if it were spacious".
In specific implementation, a plurality of clauses can be determined according to the user original data of each merchant, and each clause can be possibly used as a recommendation reason. The user original data are complex in format and various in content, dirty data are screened out by cleaning the user original data, accuracy of the trained recommendation reason classification model can be improved, and effectiveness of mined recommendation reasons can be improved.
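The cleaning and segmentation rules can be sketched roughly as follows. This is an illustration under stated assumptions: the punctuation set, the emoji ranges and the conjunction list are all hypothetical stand-ins, and the minimum length of 5 characters is the example value from the text.

```python
import re

# Split on common Chinese and Western punctuation, but NOT on the
# enumeration comma "、" (assumed punctuation set).
SPLIT_RE = re.compile(r"[，。！？；…,.!?;]+")
# Rough emoji/symbol ranges; a real system would likely use a fuller list.
SYMBOL_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical conjunction list; matching is by plain substring.
CONJUNCTIONS = ("但", "however", "but")


def split_clauses(ugc, min_len=5):
    """Strip preset symbols, split into clauses, then filter them.

    A clause is kept only if it is longer than min_len characters and
    contains none of the listed conjunctions.
    """
    text = SYMBOL_RE.sub("", ugc)
    clauses = [c.strip() for c in SPLIT_RE.split(text)]
    return [
        c for c in clauses
        if len(c) > min_len and not any(w in c for w in CONJUNCTIONS)
    ]
```

Substring matching on conjunctions is a simplification that suits Chinese text; for space-delimited languages a token-level check would avoid false matches inside longer words.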
Further, based on the user original data of the target merchant, data cleaning is performed on a preset word bank by combining an emotion analysis method and a word frequency screening method, and a reference word bank containing positive idioms, degree words, high-level evaluation words and common evaluation words is determined.
In specific implementation, extracting recommendation reason features from a clause requires operations such as syntactic analysis and text feature extraction, and the features of each clause are determined with reference to positive idioms, degree words, high-level evaluation words, common evaluation words and the like. Therefore, to improve the accuracy of the extracted features, a general word bank is first cleaned against the user original data to obtain the positive idioms, degree words and evaluation words suited to the application scenario, and the evaluation words are further divided into high-level evaluation words and common evaluation words.
The positive idiom in the embodiment of the application is an idiom which describes information such as the taste, environment, service and the like of the dish of the merchant from the good and positive side, such as 'just right' and 'high-quality and low-price' and the like. The positive degree words in the embodiment of the application are degree adverbs for describing information such as dish taste, environment, service and the like of the merchant from a good and positive side, such as 'especially', 'unavailable' and the like. The positive evaluation words in the embodiment of the application are adjectives for describing information such as the taste, the environment, the service and the like of the dish of the merchant from the good and positive side. Meanwhile, the evaluation words are divided into high-grade evaluation words and common evaluation words based on the emotion scores of the evaluation words, for example, the high-grade evaluation words comprise ' savoury and ' crisp ' and the like, and the common evaluation words comprise ' fresh ' and ' cost-effective ' and the like. The preset word bank is a general word bank comprising a front idiom, a degree word and an evaluation word.
In specific implementation, first, the idioms, degree words and evaluation words in the preset word bank are each matched against the full UGC data to obtain every idiom, degree word and evaluation word that appears in the full UGC data, together with its frequency. Second, an emotion analysis method is used to obtain the emotion score of each of those idioms, degree words and evaluation words. Finally, the frequency and emotion score of each word are considered together to obtain a word bank of positive idioms, degree words and evaluation words, and the evaluation words are divided into high-level evaluation words and common evaluation words. An emotion score threshold and a word frequency threshold can be preset, and words whose emotion score is above the emotion score threshold or whose frequency is above the word frequency threshold are retained in the reference word bank.
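The word bank cleaning can be sketched as below. The sketch assumes the UGC data has already been tokenized and that `sentiment_score` is a caller-supplied function returning an emotion score per word; both, along with all threshold values, are hypothetical. The retention rule uses "or", following the wording of the text.

```python
from collections import Counter


def clean_lexicon(preset_words, ugc_tokens, sentiment_score,
                  score_threshold=0.6, freq_threshold=100):
    """Retain preset words that occur in the UGC data and pass a threshold.

    preset_words is a set; thresholds are illustrative, not from the source.
    """
    freq = Counter(t for t in ugc_tokens if t in preset_words)
    return {
        word for word, count in freq.items()
        if sentiment_score(word) > score_threshold or count > freq_threshold
    }


def split_evaluation_words(words, sentiment_score, high_threshold=0.8):
    """Divide evaluation words into high-level and common by emotion score."""
    high = {w for w in words if sentiment_score(w) >= high_threshold}
    return high, set(words) - high
```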
At this point, a word bank comprising positive idioms, degree words, high-level evaluation words and common evaluation words is obtained; it serves as the reference word bank for extracting recommendation reason features from the source data.
In specific implementation, the order in which the preset word bank is cleaned to determine the reference word bank and the user original data is processed to determine the clauses is not limited.
Then, based on the reference word bank, the feature vector of each clause is determined as a recommendation reason feature vector sample, and each sample is given a label indicating whether it is a high-quality recommendation reason.
In specific implementation, the feature vector of a recommendation reason includes any one or more of the following dimensions: syntactic structure, whether a modal particle is contained, text score, whether a common evaluation word is contained, number of common evaluation words, whether a high-level evaluation word is contained, number of high-level evaluation words, whether a degree word is contained, number of degree words, whether an idiom is contained, number of idioms, emotion score, comment score, whether a merchant descriptor is contained, number of merchant descriptors, merchant descriptor weight, whether an entity is present, number of entities, entity word frequency, whether a viewpoint is present, number of viewpoints, and viewpoint score.
To obtain high-quality recommendation reasons that reflect the merchant's characteristics, this embodiment selects 22 sub-dimension features from three dimensions: syntax, sentence quality, and merchant association degree. Syntax reflects whether the components of the clause and their order are reasonable, and includes the features: syntactic structure, whether a modal particle is contained, and text score. Sentence quality reflects whether the language of the clause is vivid, specific and evocative, and includes the features: whether a common evaluation word is contained, number of common evaluation words, whether a high-level evaluation word is contained, number of high-level evaluation words, whether a degree word is contained, number of degree words, whether an idiom is contained, number of idioms, emotion score, and comment score. Merchant association degree reflects whether the content described by the clause is a characteristic of the merchant that can attract strong user interest in the merchant, and includes the features: whether a merchant descriptor is contained, number of merchant descriptors, merchant descriptor weight, whether an entity is present, number of entities, entity word frequency, whether a viewpoint is present, number of viewpoints, and viewpoint score.
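As an illustration of how the 22 features might be laid out, the sketch below groups them in a dataclass by the three dimensions named above. The class name, field names, types and defaults are assumptions made for the example; the text only names the features.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class ReasonFeatures:
    # Syntax (3 features)
    syntactic_structure: int = 5  # category code; 5 stands for "other part of speech"
    has_modal_particle: int = 0
    text_score: float = 0.0
    # Sentence quality (10 features)
    has_common_eval: int = 0
    num_common_eval: int = 0
    has_high_eval: int = 0
    num_high_eval: int = 0
    has_degree_word: int = 0
    num_degree_words: int = 0
    has_idiom: int = 0
    num_idioms: int = 0
    emotion_score: float = 0.0
    comment_score: float = 0.0
    # Merchant association degree (9 features)
    has_merchant_descriptor: int = 0
    num_merchant_descriptors: int = 0
    merchant_descriptor_weight: float = 0.0
    has_entity: int = 0
    num_entities: int = 0
    entity_word_frequency: float = 0.0
    has_viewpoint: int = 0
    num_viewpoints: int = 0
    viewpoint_score: float = 0.0

    def to_vector(self):
        """Flatten the features into a list of floats in declaration order."""
        return [float(v) for v in astuple(self)]
```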
The feature meaning and the acquisition method of each dimension are described in detail below.
The syntactic structure feature represents the part of speech of the beginning of the sentence: 0 represents a recommended dish, 1 a merchant category, 2 an adjective, 3 a noun, 4 a verb, and 5 any other part of speech. In specific implementation, its value can be determined by segmenting the sentence into words and examining the word at the beginning of the sentence.
The modal particle feature indicates whether the clause contains any of the following modal particles: 'wool', 'Yao', 'wo', 'pray'; it is 1 if one is contained and 0 otherwise.
The text score feature represents a sentence length score between 0 and 1; the larger the value, the longer the sentence.
The common evaluation word presence feature indicates whether the current clause contains a common evaluation word: 1 if it does, 0 if it does not.
The common evaluation word count feature represents the number of common evaluation words contained in the current clause.
The high-level evaluation word presence feature indicates whether the current clause contains a high-level evaluation word: 1 if it does, 0 if it does not.
The high-level evaluation word count feature represents the number of high-level evaluation words contained in the current clause.
The degree word presence feature indicates whether the current clause contains a degree word: 1 if it does, 0 if it does not.
The degree word count feature represents the number of degree words contained in the current clause.
The idiom presence feature indicates whether the current clause contains an idiom: 1 if it does, 0 if it does not.
The idiom count feature represents the number of idioms contained in the current clause.
In specific implementation, the presence and count features for common evaluation words, high-level evaluation words, degree words and idioms are determined by comparing the words in the current clause with the words in the reference word bank and counting the matches.
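The comparison-and-count step can be sketched as follows. The sketch assumes the clause has already been segmented into tokens and that the reference word bank is a dictionary mapping a category name to a set of words; both the layout and the feature names are hypothetical. It counts occurrences rather than distinct words, which is one possible reading of "number".

```python
def lexicon_features(tokens, lexicon):
    """Compute presence (0/1) and count features per word bank category."""
    features = {}
    for name, words in lexicon.items():
        count = sum(1 for token in tokens if token in words)
        features[f"has_{name}"] = 1 if count else 0
        features[f"num_{name}"] = count
    return features
```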
The emotion score feature is a quantitative representation of the emotional tendency of the current clause. It lies between 0 and 1 and is obtained by calling a preset emotion analysis service interface; the larger the value, the more positive the emotion of the current clause.
The comment score feature is a score that integrates the quality of the user original data in which the current clause appears, based on factors such as whether the comment is a high-quality comment, the number of clicks, the number of likes, the number of follow-up comments, the star rating, and the comment publication time; the higher the value, the higher the quality of the clause. In specific implementation, the comment score is calculated by the formula:
reviewScore = (log(follows + hits + voteGoods + 1) + isQuality × T + star × W) × timeScore,
where follows represents the number of follow-up comments, hits represents the number of clicks, voteGoods represents the number of likes, isQuality represents whether the comment is a high-quality comment, T represents a high-quality comment adjustment factor (for example, 30), star represents the comment star rating, and W represents a comment star rating adjustment factor (for example, 0.8). timeScore represents a time decay factor, obtained by the formula:
timeScore = (3650 - x) / 3650,
where x represents the number of days between the comment publication time and the current time. Whether the comment is a high-quality comment, the number of clicks, the number of likes, the number of follow-up comments, the comment star rating, and the number of days since publication can all be obtained using existing service interfaces.
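A direct transcription of the two formulas might look like the sketch below. The text does not state the base of the logarithm, so the natural logarithm is assumed here; the function and parameter names are illustrative, and the defaults for T and W are the example values from the text.

```python
import math


def time_score(days_since_publication):
    """Time decay factor: (3650 - x) / 3650."""
    return (3650 - days_since_publication) / 3650


def review_score(follows, hits, vote_goods, is_quality, star,
                 days_since_publication, T=30, W=0.8):
    """Comment score per the formula above (natural log is an assumption)."""
    base = math.log(follows + hits + vote_goods + 1) + is_quality * T + star * W
    return base * time_score(days_since_publication)
```

As written, the decay factor becomes negative for comments older than 3650 days; the text does not say how that case is handled, so this sketch leaves it untouched.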
The merchant descriptor presence feature indicates whether the clause contains a descriptor of the merchant to which the clause belongs: the value is 1 if it does and 0 otherwise. The merchant descriptors are entity words that come from the user original data under the merchant and are highly relevant to the merchant, and they can be obtained by calling an existing interface.
The merchant descriptor number feature indicates the number of merchant descriptors contained in the current clause.
The merchant descriptor weight feature represents the word frequency value of the merchant descriptor; if the current clause contains several merchant descriptors, the maximum word frequency value is taken.
In specific implementation, the merchant descriptor presence, merchant descriptor number and merchant descriptor weight features are obtained by calling the existing service interface.
The entity presence feature indicates whether the current clause contains an entity word: the value is 1 if it does and 0 otherwise.
The entity number feature represents the number of entity words contained in the current clause.
In specific implementation, the entity words are obtained by using a preset service.
The entity word frequency feature represents the frequency with which the entity words in the current clause appear in the user comments of the merchant to which they belong. In specific implementation, the entity word frequency is the maximum of the word frequencies of the entity words included in the current clause. For example, if a clause includes multiple entity words $e_i$, and the frequency with which each entity word appears in the user comments of the merchant is denoted $C_{e_i}$, then the entity word frequency feature entityTF of the clause is calculated by the formula
$\mathrm{entityTF} = \max_i C_{e_i}$,
where i is an integer greater than 1.
The viewpoint presence feature indicates whether the current clause includes a viewpoint: the value is 1 if it does and 0 otherwise.
The viewpoint number feature indicates the number of viewpoints included in the current clause.
The viewpoint score feature is a quantitative representation of the importance of the viewpoint. Its value lies between 0 and 1, and a larger value indicates a more important viewpoint.
In specific implementation, the viewpoint presence, viewpoint number and viewpoint score features are obtained by calling an existing service interface.
For example, taking the clause 'the roasted lamb leg is fresh', the feature vector is generated as follows:
Figure BDA0001657538340000101
Therefore, the 22-dimensional feature vector corresponding to the clause 'the roasted lamb leg is fresh' is: [1,0,1,1,1,1,1,0,0,1,1,0,0,0.92,31.77,1,1,0,375,1,1,0.48,0.07].
Then, a recommendation reason label is set for the feature vector of each clause. For example, a label of "1" indicates that the recommendation reason corresponding to the clause of the feature vector is a high-quality recommendation reason; a label of "0" indicates that it is a non-high-quality recommendation reason.
The feature vector of the clause, that is, the feature vector of the recommendation reason, is extracted from the sub-dimensions of three dimensions: syntax, sentence quality and merchant relevance. Because both the quality of the sentence itself and its association with the merchant are considered, the feature coverage is wide and the mined recommendation reasons are closely related to the merchant.
Finally, the recommendation reason classification model is trained with the training samples.
In specific implementation, each training sample comprises at least two fields, a sample label and a feature vector, wherein the sample label identifies whether the sample corresponds to a high-quality recommendation reason. Taking the training of a random forest classification model as an example, the training process of the model is in effect the process of learning the feature weight of each dimension of the samples; after the training of the recommendation reason classification model is finished, the optimal weight of each dimension of the recommendation reason feature vector is obtained.
Step 220, based on the user original data of the target merchant, determining a candidate recommendation reason and a feature vector of the candidate recommendation reason.
When a user performs operations such as searching or querying, the search or query application or service may recommend relevant merchants to the user according to the keyword or query word input by the user. For example, when the user inputs "roast lamb leg", the application or service may recommend merchants such as "seeshell oat village" and "ninety-nine yurt" to the user according to the keyword "roast lamb leg". To facilitate the user's decision-making, the application or service will typically also show a recommendation reason for each merchant. In specific implementation, a recommendation reason pool is constructed for each merchant according to information such as the category, location, services and comments of the merchant. The recommendation reason pool comprises at least one recommendation reason, where a recommendation reason is a sentence that vividly and specifically describes an entity word under the merchant, so that the recommendation reason attracts the user's interest in the merchant. The entity words may be names of merchant information such as products, services and environment.
In a specific implementation, the reason for recommending the merchant can be manually set by an application or a platform, or extracted according to click rate, purchase rate and heat information. Preferably, the candidate reason for recommendation is determined based on user-originating data of the target merchant.
In specific implementation, the user original data of the target merchant is processed to determine the plurality of clauses included in each piece of user original data, and each clause may correspond to one candidate recommendation reason. Then, the feature vector of each clause is determined based on a preset reference word bank and used as the feature vector of the candidate recommendation reason. The step of determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on the user original data of the target merchant comprises the following steps: carrying out data processing on the user original data of the target merchant to obtain a plurality of clauses, wherein each clause corresponds to a candidate recommendation reason; and determining a feature vector of each candidate recommendation reason based on a preset reference word bank. The reference word bank is obtained by cleaning the data of a preset word bank by combining an emotion analysis method and a word frequency screening method based on the user original data of the target merchant; the reference word bank comprises positive idioms, degree words, high-level evaluation words and common evaluation words.
Wherein the feature vector of the recommendation reason comprises any one or more of the following dimensions: syntax structure, whether a language word is contained, a sentence text score, whether a common evaluation word is contained, the number of common evaluation words, whether a high-level evaluation word is contained, the number of high-level evaluation words, whether a degree word is contained, the number of degree words, whether an idiom is contained, the number of idioms, an emotion score, a comment score, whether a merchant descriptor is contained, the number of merchant descriptors, the weight of merchant descriptors, whether an entity exists, the number of entities, the word frequency of the entity words, whether a viewpoint exists, the number of viewpoints and the viewpoint score. The comment score is calculated by the formula: reviewScore = (log(follows + hits + voteGoods + 1) + isQuality × T + star × W) × timeScore, wherein follows represents the number of follow-up comments, hits represents the number of clicks, voteGoods represents the number of likes, isQuality represents whether the comment is a high-quality comment, T represents a high-quality comment adjustment factor, star represents the comment star rating, W represents a comment star rating adjustment factor, and timeScore represents a time decay factor obtained by the formula
timeScore = (3650 - x)/3650, where x represents the number of days between the comment publication time and the current time.
Optionally, the word frequency of the entity word is the maximum value of the word frequency of each entity word included in the current clause.
For the specific implementation of determining candidate recommendation reasons and the feature vectors of the candidate recommendation reasons based on the user original data of the target merchant, refer to the specific implementation of mining training samples of recommendation reasons from user original data when training the recommendation reason classification model; the details are not repeated here.
In specific implementation, one part of the recommendation reason feature vector samples determined according to the user original data can be randomly selected as training samples, and the other part of the recommendation reasons can be used as candidate recommendation reasons.
Step 230, determining a high-quality candidate recommendation reason and an evaluation score of the high-quality candidate recommendation reason according to the feature vector of the candidate recommendation reason through a preset recommendation reason classification model.
The feature vector of the candidate recommendation reason is input into the recommendation reason classification model to obtain whether the candidate recommendation reason is a high-quality candidate recommendation reason, together with the corresponding evaluation score. For example, if the feature vector [1,0,1,1,1,1,1,0,0,1,1,0,0,0.92,31.77,1,1,0,375,1,1,0.48,0.07] of the candidate recommendation reason corresponding to the clause "roasted lamb leg is fresh" is input to the trained recommendation reason classification model, and the output result is 1 with a score of 0.89, this indicates that the input feature vector is the feature vector of a high-quality candidate recommendation reason; that is, "roasted lamb leg is fresh" is a high-quality candidate recommendation reason, and its evaluation score is 0.89.
In specific implementation, only high-quality candidate recommendation reasons may be used to construct the recommendation reason pool of the target merchant.
Step 240, constructing a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason.
In specific implementation, an entity word corresponding to each high-quality candidate recommendation reason can be determined by calling an interface of the existing service, then, for each target merchant, evaluation and screening operations are respectively performed on the high-quality candidate recommendation reasons mined from the user original data of the target merchant, the corresponding relation between the recommendation reason of each target merchant and the entity word is determined, and a recommendation reason pool of the target merchant is constructed based on the corresponding relation between the recommendation reason and the entity word.
In specific implementation, the constructing a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason includes: screening out high-quality candidate recommendation reasons with the entity word frequency larger than a preset word frequency threshold value according to the entity word frequency characteristic value in the characteristic vector of the high-quality candidate recommendation reasons, and taking the high-quality candidate recommendation reasons as the high-quality candidate recommendation reasons of the target merchant; taking one high-quality candidate recommendation reason with the highest evaluation score in the high-quality candidate recommendation reasons corresponding to the same group of entity words as the recommendation reason corresponding to the same group of entity words to determine the recommendation reason corresponding to each group of entity words; and constructing a recommendation reason pool of the target merchant according to the recommendation reason corresponding to each group of entity words.
First, for the high-quality candidate recommendation reasons mined from the clauses of a target merchant, those whose entity word frequency is greater than a preset word frequency threshold are screened out according to the value of the entity word frequency feature in the feature vector of the recommendation reason, and are taken as the high-quality candidate recommendation reasons of the target merchant. In specific implementation, the preset word frequency threshold FH is determined by the following formula: $FH = \max(\mathrm{entityTF}_1, \ldots, \mathrm{entityTF}_n) \times 0.1$, where $\mathrm{entityTF}_1, \ldots, \mathrm{entityTF}_n$ are the entity word frequency feature values of the feature vectors of the high-quality candidate recommendation reasons of the target merchant.
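The screening by word frequency threshold can be sketched as below. The dictionary layout, the field name `entity_tf` and the sample values are hypothetical, and the factor 0.1 is read as a multiplier on the maximum entity word frequency, which is how the formula is understood here.

```python
def filter_by_entity_tf(candidates, ratio=0.1):
    """Keep candidates whose entity word frequency exceeds FH = max(entityTF) * ratio.

    candidates -- list of dicts, each with a hypothetical "entity_tf" field
    """
    if not candidates:
        return []
    threshold = max(c["entity_tf"] for c in candidates) * ratio
    return [c for c in candidates if c["entity_tf"] > threshold]


# Hypothetical word frequencies: the threshold is 375 * 0.1 = 37.5.
sample_candidates = [
    {"reason": "A", "entity_tf": 375},
    {"reason": "B", "entity_tf": 40},
    {"reason": "C", "entity_tf": 30},
]
kept = [c["reason"] for c in filter_by_entity_tf(sample_candidates)]
```

Because the threshold is relative to the most frequent entity word of the same merchant, it adapts to merchants with very different comment volumes.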
Then, for all the high-quality candidate recommendation reasons that include the same entity word, the one with the highest evaluation score is selected as the recommendation reason corresponding to that entity word. Suppose that the UGC under a certain merchant A has 5 clauses: 'roasted lamb leg is very fresh', 'roasted lamb leg is crisp outside and tender inside', 'roasted lamb leg is well roasted on the whole table', 'shop with high cost performance on western-style road better served' and 'this shop service is careful and thoughtful'. After classification by the random forest classification model, four of them, 'roasted lamb leg is very fresh', 'roasted lamb leg is crisp outside and tender inside', 'shop with high cost performance on western-style road better served' and 'this shop service is careful and thoughtful', obtain the classification result 1 and are therefore high-quality candidate recommendation reasons, with evaluation scores of 0.4, 0.5, 0.35 and 0.32 respectively. Both 'roasted lamb leg is very fresh' and 'roasted lamb leg is crisp outside and tender inside' describe the entity word 'roasted lamb leg'; that is, the entity word 'roasted lamb leg' corresponds to two high-quality candidate recommendation reasons. To avoid redundancy, the high-quality candidate recommendation reason with the highest evaluation score is selected as the recommendation reason corresponding to the entity word, namely the final recommendation reason corresponding to 'roasted lamb leg' is 'roasted lamb leg is crisp outside and tender inside'.
Similarly, 'shop with high cost performance on western-style road better served' and 'this shop service is careful and thoughtful' both describe the entity words 'service' and 'shop'. For the same reason, the one with the highest evaluation score is selected as the recommendation reason corresponding to this group of entity words, namely 'shop with high cost performance on western-style road better served' corresponds to the entity words 'service' and 'shop'.
In specific implementation, constructing the recommendation reason pool of the target merchant according to the recommendation reason corresponding to each group of entity words may be: selecting, according to the evaluation scores of the high-quality candidate recommendation reasons, a preset number (for example, the top 20) of high-quality candidate recommendation reasons with the highest evaluation scores, together with the entity words corresponding to them, and constructing the recommendation reason pool of the target merchant. Further screening the high-quality candidate recommendation reasons by combining the entity word frequency and the evaluation score improves the effectiveness of the recommendation reasons of the target merchant.
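The merging and top-N selection can be sketched with the merchant A example above. The record layout, the field names and the function `build_reason_pool` are hypothetical, and grouping candidates by their exact set of entity words is an assumption about what "the same group of entity words" means.

```python
def build_reason_pool(candidates, pool_size=20):
    """Keep the best-scoring reason per group of entity words, then take the top N.

    candidates -- list of dicts with hypothetical fields "reason", "entities", "score"
    """
    best_per_group = {}
    for candidate in candidates:
        group = frozenset(candidate["entities"])
        current = best_per_group.get(group)
        if current is None or candidate["score"] > current["score"]:
            best_per_group[group] = candidate
    ranked = sorted(best_per_group.values(), key=lambda c: c["score"], reverse=True)
    return ranked[:pool_size]


# The four high-quality candidates of merchant A from the example in the text.
merchant_a = [
    {"reason": "roasted lamb leg is very fresh",
     "entities": ["roasted lamb leg"], "score": 0.4},
    {"reason": "roasted lamb leg is crisp outside and tender inside",
     "entities": ["roasted lamb leg"], "score": 0.5},
    {"reason": "shop with high cost performance on western-style road better served",
     "entities": ["service", "shop"], "score": 0.35},
    {"reason": "this shop service is careful and thoughtful",
     "entities": ["service", "shop"], "score": 0.32},
]
pool = build_reason_pool(merchant_a)
```

With these inputs the pool holds one reason per entity word group, ordered by evaluation score.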
In specific implementation, after the building of the recommendation reason pool of the target merchant based on the premium candidate recommendation reason and the evaluation score of the premium candidate recommendation reason, the method further includes: determining a user characteristic vector of a current user according to historical behavior information and real-time behavior information of the current user; determining a feature vector of each recommendation reason in a recommendation reason pool of a target merchant, wherein the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user; and determining the recommendation reason of the merchant according to the similarity between the user characteristic vector and the characteristic vector of the recommendation reason.
Step 250, determining the user characteristic vector of the current user according to the historical behavior information and the real-time behavior information of the current user.
The determining the user feature vector of the current user according to the historical behavior information and the real-time behavior information of the current user comprises the following steps: determining a historical behavior feature vector of the current user according to a word vector of a keyword of the historical behavior information of the current user, which is obtained through a preset word vector model; determining the real-time behavior feature vector of the current user according to the word vector of the keyword of the real-time behavior information of the current user, which is obtained through the preset word vector model; determining a user feature vector of the current user by performing weighted summation on the historical behavior feature vector and a real-time behavior feature vector, wherein the real-time behavior information comprises: behavioral intent and/or behavioral scenarios.
The user feature vector in the embodiment of the application is determined according to the keywords describing the historical behavior information and the real-time behavior information of the current user. Take as an example a current user who inputs the query word 'vegetable' and then performs a search operation. According to the query word 'vegetable', the historical behavior information related to this query word can be obtained from the historical behavior information of the user, such as information on clicking, purchasing, browsing, querying and commenting behaviors related to 'vegetable', and information such as keywords, time and word frequency is extracted from it. Then, the word vector of each keyword is determined through a pre-trained word vector model. Finally, the weight of the word vector of each keyword is determined according to information such as the time and frequency of the historical behaviors corresponding to the keyword, and the word vectors of the keywords are weighted and summed to obtain the user historical behavior feature vector. The keywords related to the query word 'vegetable' may be entity words, degree words, evaluation words and the like in the user original data. For example, the keywords 'vegetable' and 'fresh' may be determined from the piece of user historical behavior information 'vegetables were purchased in October 2017, the vegetables are very fresh'.
In specific implementation, the real-time behavior information includes the following information of the current user: behavioral intent and/or behavioral scenarios, etc. The behavior scene may be, for example, a search scene or a query scene, and may include information such as the place and time of the behavior. Keywords describing the real-time behavior information are extracted, and the word vectors of the keywords are then determined through a pre-trained word vector model. Still taking as an example the current user who inputs the query word "vegetable" and then performs a search operation, the keyword describing the behavioral intention of the user can be determined as "vegetable" according to the query word, and the feature vector of the behavioral intention keyword "vegetable" is determined by the pre-trained word vector model. Furthermore, when the current user is in a search scene, the behavior scene is used as an input together with the keyword "vegetable", and the feature vector of the behavior scene is determined through the pre-trained word vector model. In specific implementation, the behavior intention feature vector and the behavior scene feature vector are determined according to the keywords, the scene of the real-time behavior of the user and the pre-trained word vector model. For example, for the keyword "vegetable", the behavior scene feature vector and the behavior intent feature vector may be the same in the search scene and different in the recommendation scene.
Finally, the user feature vector of the current user is determined by performing weighted summation on the historical behavior feature vector and the real-time behavior feature vector. In specific implementation, the user feature vector $V_{user}$ can be determined by the formula $V_{user} = W_{his}V_{his} + W_{query}V_{query} + W_{scene}V_{scene}$, where $V_{his}$ is the user's historical behavior feature vector, $W_{his}$ is the weight of the historical behavior feature vector, $V_{query}$ is the behavior intention feature vector, $W_{query}$ is the weight of the behavior intention feature vector, $V_{scene}$ is the behavior scene feature vector, and $W_{scene}$ is the weight of the behavior scene feature vector. In this embodiment, the real-time behavior feature vector includes a behavior scene feature vector and a behavior intention feature vector. In specific implementation, the historical behavior feature vector, the behavior scene feature vector and the behavior intention feature vector are obtained through the same word vector model; they therefore lie in the same vector space and can be combined in one calculation.
The behavior intention feature vector $V_{query}$ is an N-dimensional vector abstracted from the keywords describing the user's real-time search behavior information; the behavior scene feature vector $V_{scene}$ is an N-dimensional vector abstracted from the scene (search or recommendation) information describing the user's current behavior; and the historical behavior feature vector is an N-dimensional vector abstracted from the user's historical behavior information related to the current behavior. The user feature vector therefore takes the user's historical behavior, real-time intention and scene information into account together, so the information contained in the resulting vector is more comprehensive.
In specific implementation, the weights of the historical behavior feature vector and the real-time behavior feature vector in the user feature vector are determined according to business requirements. For example, when the user is in a search scene, the scene information is the user's search behavior itself, so the real-time scene vector does not need to be counted again; that is, the weight $W_{scene}$ of the behavior scene feature vector is set to 0, and the weights of the historical behavior feature vector and the behavior intention feature vector are adjusted according to test results. When the user is in a recommendation scene, the user has no search behavior, so the weight $W_{query}$ of the behavior intention feature vector is set to 0, and the weights of the historical behavior feature vector and the behavior scene feature vector are adjusted according to test results.
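The weighted summation and the scene-dependent weights can be sketched as follows. The function names, the scene labels "search" and "recommendation" and the sample weights of 0.5 are hypothetical; the text only fixes which weight is set to 0 in each scene and leaves the remaining weights to be tuned by testing.

```python
def scene_weights(scene, w_his, w_realtime):
    """Return (W_his, W_query, W_scene) for the given behavior scene.

    w_his and w_realtime are tunable values; the text leaves them to testing.
    """
    if scene == "search":
        # The scene is the search itself, so the scene vector is not counted again.
        return (w_his, w_realtime, 0.0)
    if scene == "recommendation":
        # No search behavior exists, so the intention vector is dropped.
        return (w_his, 0.0, w_realtime)
    raise ValueError(f"unknown scene: {scene}")


def user_feature_vector(v_his, v_query, v_scene, weights):
    """Compute V_user = W_his*V_his + W_query*V_query + W_scene*V_scene element-wise."""
    if not (len(v_his) == len(v_query) == len(v_scene)):
        raise ValueError("all vectors must have the same dimension N")
    w_his, w_query, w_scene = weights
    return [w_his * h + w_query * q + w_scene * s
            for h, q, s in zip(v_his, v_query, v_scene)]


# Toy 2-dimensional vectors, for illustration only.
v_user = user_feature_vector([1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                             scene_weights("search", 0.5, 0.5))
```

Keeping the weight selection separate from the summation means new scenes can be added without touching the vector arithmetic.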
Step 260, determining the feature vector of each recommendation reason in the recommendation reason pool of the target merchant.
The target merchant is a merchant recommended to the current user according to the real-time behavior of the current user.
In specific implementation, the recommendation reason pool of the target merchant includes recommendation reasons and the entity words corresponding to the recommendation reasons, and the determining of the feature vector of each recommendation reason in the recommendation reason pool of the target merchant includes: acquiring, through a preset word vector model, the feature vector of the entity word corresponding to a recommendation reason in the recommendation reason pool of the target merchant, and using it as the feature vector of that recommendation reason. In specific implementation, the recommendation reason pool is composed of recommendation reasons and the entity words described by the recommendation reasons. By inputting the entity word corresponding to a certain recommendation reason into a pre-trained word vector model, the feature vector of the entity word, namely the feature vector of the recommendation reason corresponding to the entity word, can be obtained.
Step 270, determining the recommendation reason of the merchant according to the similarity between the user feature vector and the feature vector of the recommendation reason.
The user feature vector is determined according to the word vectors of the keywords describing the user's real-time behavior and historical behavior information, and the feature vector of a merchant recommendation reason is determined according to the word vector of the entity word describing that recommendation reason. Because all of these word vectors are obtained through the same word vector model, the user feature vector and the recommendation reason feature vector lie in the same vector space and their similarity can be compared. In specific implementation, the similarity between the user feature vector and the feature vector of the recommendation reason can be represented by the cosine similarity between the two vectors.
When the key words are the same or similar, the word vectors are also the same or similar. Therefore, when the keywords describing the user behavior are the same as or similar to the entity words describing the recommendation reason, the corresponding user feature vectors and the recommendation reason feature vectors of the merchants are also similar. Further, according to the similarity between the user feature vector and the recommendation reason feature vector, similar user behavior information and the recommendation reason of the business can be determined, so that the recommendation reason which is interested by the user is found.
For example, if user A frequently searches for and purchases "roasted lamb leg", the user feature vector determined when user A searches for "roasted lamb leg" again is generated from the keywords extracted from the description information related to "roasted lamb leg", and these keywords necessarily include "roasted lamb leg". The recommendation reason pool of the target merchant B, which is recommended to the user in response to the search operation of user A, contains a plurality of recommendation reasons, among them "roasted lamb leg is fresh", whose corresponding entity word is "roasted lamb leg". Because the user feature vector and the feature vector of this recommendation reason are both generated on the basis of the keyword "roasted lamb leg", the two vectors are necessarily similar.
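The similarity matching described above can be sketched as below. The function names and the toy vectors are hypothetical, and choosing the single recommendation reason with the highest cosine similarity is an assumption; the text only states that the recommendation reason is determined according to the similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for a zero vector, chosen for this sketch
    return dot / (norm_a * norm_b)


def pick_reason(user_vector, reason_pool):
    """Return the reason text whose vector is most similar to the user vector.

    reason_pool -- list of (reason_text, reason_vector) pairs
    """
    best = max(reason_pool, key=lambda item: cosine_similarity(user_vector, item[1]))
    return best[0]


# Toy vectors standing in for word vectors from a shared word vector model.
toy_pool = [
    ("roasted lamb leg is fresh", [0.9, 0.1]),
    ("this shop service is careful and thoughtful", [0.0, 1.0]),
]
chosen = pick_reason([1.0, 0.0], toy_pool)
```

Cosine similarity compares direction only, so the magnitude that the weighted summation gives to the user feature vector does not affect the ranking.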
The method for mining the merchant recommendation reason disclosed by the embodiment of the application mines the recommendation reasons of the target merchant based on the user original data of the target merchant and constructs a recommendation reason pool of the target merchant. Then, in a specific application process, it determines a user feature vector of the current user according to the historical behavior information and real-time behavior information of the current user; determines a feature vector of each recommendation reason in the recommendation reason pool of a target merchant, wherein the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user; and finally determines the recommendation reason of the merchant according to the similarity between the user feature vector and the feature vector of the recommendation reason, which solves the problem in the prior art that mined recommendation reasons are inaccurate. Because the user feature vector is determined by combining the historical behavior information and the real-time behavior information of the user, the merchant recommendation reason is matched against conditions that change in real time, so varied and rich recommendation reasons can be obtained. Meanwhile, because the matching condition is generated from the user's information, different recommendation reasons can be matched for different users. The recommendation reasons are thus more targeted, personalized display of recommendation reasons is realized, the decision cost of users is reduced, and the user experience is further improved.
EXAMPLE III
As shown in fig. 3, the apparatus for mining merchant recommendation reasons disclosed in this embodiment includes:
a candidate recommendation reason and feature vector determination module 310, configured to determine candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of a target merchant;
a candidate recommendation reason set evaluation score determining module 320, configured to determine, through a preset recommendation reason classification model and according to the feature vectors of the candidate recommendation reasons, high-quality candidate recommendation reasons and evaluation scores of the high-quality candidate recommendation reasons;
a recommendation reason pool construction module 330, configured to construct a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reasons and the evaluation scores of the high-quality candidate recommendation reasons.
Optionally, as shown in fig. 4, the recommendation reason pool construction module 330 further includes:
a screening submodule 3301, configured to screen out, according to the entity word frequency feature value in the feature vector of each high-quality candidate recommendation reason, the high-quality candidate recommendation reasons whose entity word frequency is greater than a preset word frequency threshold, as the high-quality candidate recommendation reasons of the target merchant;
a merging submodule 3302, configured to take, among the high-quality candidate recommendation reasons corresponding to the same group of entity words, the one with the highest evaluation score as the recommendation reason corresponding to that group of entity words, so as to determine the recommendation reason corresponding to each group of entity words;
a recommendation reason pool construction submodule 3303, configured to construct the recommendation reason pool of the target merchant according to the recommendation reason corresponding to each group of entity words.
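The screening and merging steps performed by the recommendation reason pool construction module can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the application: the `Candidate` record, its field names, and the function `build_reason_pool` are hypothetical, and a group of entity words is represented here as a `frozenset` purely for convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one high-quality candidate recommendation reason."""
    text: str                  # the clause used as the candidate reason
    entity_words: frozenset    # group of entity words found in the clause
    entity_word_freq: int      # entity word frequency feature value
    score: float               # evaluation score from the classification model


def build_reason_pool(candidates, freq_threshold):
    """Screen by entity word frequency, then keep the highest-scoring
    candidate for each group of entity words."""
    pool = {}
    for cand in candidates:
        # Screening: the entity word frequency must exceed the preset threshold.
        if cand.entity_word_freq <= freq_threshold:
            continue
        # Merging: one reason per group of entity words, the top-scoring one.
        current = pool.get(cand.entity_words)
        if current is None or cand.score > current.score:
            pool[cand.entity_words] = cand
    return pool
```

The returned dictionary maps each group of entity words to its single retained recommendation reason, which corresponds to the pool built from "the recommendation reason corresponding to each group of entity words".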
Optionally, as shown in fig. 4, the candidate recommendation reason and feature vector determination module 310 further includes:
a candidate recommendation reason mining submodule 3101, configured to perform data processing on the user original data of the target merchant to obtain a plurality of clauses, where each clause corresponds to one candidate recommendation reason;
a candidate recommendation reason feature vector determination submodule 3102, configured to determine the feature vector of each candidate recommendation reason based on a preset reference word bank;
where the reference word bank is obtained by cleaning a preset word bank, based on the user original data of the target merchant, using a combination of an emotion analysis method and a word frequency screening method; the reference word bank includes positive idioms, degree words, high-level evaluation words and common evaluation words.
User original data is complex in format and varied in content. Cleaning the user original data screens out dirty data, which can improve the accuracy of the trained recommendation reason classification model and the effectiveness of the mined recommendation reasons. Cleaning the preset word bank before the user original data is processed can likewise improve the accuracy of the extracted features. Optionally, the feature vector of the recommendation reason includes any one or more of the following dimensions: syntactic structure, whether a language word is contained, a sentence text score, whether a common evaluation word is contained, the number of common evaluation words, whether a high-level evaluation word is contained, the number of high-level evaluation words, whether a degree word is contained, the number of degree words, whether an idiom is contained, the number of idioms, an emotion score, a comment score, whether a merchant descriptor is contained, the number of merchant descriptors, the merchant descriptor weight, whether an entity is present, the number of entities, the entity word frequency, whether a viewpoint is present, the number of viewpoints, and the viewpoint score.
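Several of the dimensions listed above are "whether contained" and "number of" pairs computed against the reference word bank. A minimal sketch of that part is given below, assuming the clause has already been segmented into tokens and that the reference word bank is available as a mapping from category name to a set of words; the function name, the category names, and the output keys are hypothetical and chosen only for illustration.

```python
def lexicon_features(tokens, lexicon):
    """Compute 'whether contained' and 'number of' features for each
    word category of a reference word bank.

    tokens  -- list of words in one clause (segmentation is assumed done)
    lexicon -- dict mapping a category name to a set of words
    """
    features = {}
    for category, words in lexicon.items():
        count = sum(1 for token in tokens if token in words)
        features["has_" + category] = int(count > 0)
        features["num_" + category] = count
    return features
```

Other dimensions, such as the syntactic structure, emotion score, and viewpoint score, would need their own extractors and are not covered by this sketch.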
Optionally, the comment score is obtained by the formula:
reviewScore = (log(follows + hits + voteGoods + 1) + isQuality * T + star * W) * timeScore
where follows represents the number of scores, hits represents the number of clicks, voteGoods represents the number of likes, isQuality represents whether the comment is a high-quality comment, T represents a high-quality comment adjustment factor, star represents the comment star rating, W represents a comment star rating adjustment factor, and timeScore represents a time decay factor obtained by the formula:
timeScore = (3650 - x) / 3650
where x represents the number of days from the publication time of the comment to the current time.
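The comment score computation can be sketched as follows. Several points are assumptions rather than facts from the text: the logarithm base is not specified, so the natural logarithm is used; isQuality is treated as 0 or 1; and the default values of the adjustment factors `T` and `W` are placeholders, since the application does not give concrete values.

```python
import math


def time_score(days_since_published):
    """Time decay factor: (3650 - x) / 3650, with x in days."""
    return (3650 - days_since_published) / 3650


def review_score(follows, hits, vote_goods, is_quality, star,
                 days_since_published, T=1.0, W=0.1):
    """Comment score as described in the text.

    T and W are adjustment factors; the defaults here are illustrative
    placeholders, not values taken from the application.
    """
    engagement = math.log(follows + hits + vote_goods + 1)  # natural log assumed
    quality_bonus = T if is_quality else 0.0                # isQuality * T
    base = engagement + quality_bonus + star * W
    return base * time_score(days_since_published)
```

Note that the decay factor as written becomes negative for comments older than 3650 days; the text does not say how such comments are handled, so the sketch does not clamp the value.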
Optionally, the entity word frequency is the maximum of the word frequencies of the entity words contained in the current clause.
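As a small illustration of this rule, the sketch below takes the maximum frequency over the entity words of a clause. It assumes that per-word frequencies have already been counted (for example over the user original data of the merchant) and are held in a `collections.Counter`; that choice, and the function name, are assumptions made for the example.

```python
from collections import Counter


def entity_word_frequency(clause_entities, corpus_counts):
    """Return the largest frequency among the entity words of one clause.

    clause_entities -- entity words found in the current clause
    corpus_counts   -- Counter of entity word frequencies (assumed precomputed)
    """
    # Counter returns 0 for unseen words; default=0 covers clauses with no entity.
    return max((corpus_counts[word] for word in clause_entities), default=0)
```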
The apparatus for mining merchant recommendation reasons disclosed in this embodiment of the application determines candidate recommendation reasons and their feature vectors based on user original data of a target merchant; determines, through a preset recommendation reason classification model and according to the feature vectors of the candidate recommendation reasons, high-quality candidate recommendation reasons and their evaluation scores; and constructs a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reasons and their evaluation scores. This solves the prior-art problem that mined recommendation reasons are inaccurate. Candidate recommendation reasons are mined from user original data, a pre-trained classification model computes an evaluation score for each candidate from features of preset dimensions, and the high-quality candidates determined from those scores are used as merchant recommendation reasons. This avoids the subjective factors introduced by manual operation and the lack of variety imposed by rule matching, and effectively improves the accuracy of the mined recommendation reasons.
In another embodiment of the present application, optionally, as shown in fig. 5, the apparatus further includes:
the user feature vector determining module 340 is configured to determine a user feature vector of the current user according to historical behavior information and real-time behavior information of the current user;
a recommendation reason feature vector determining module 350, configured to determine a recommendation reason feature vector of each recommendation reason in a recommendation reason pool of a target merchant, where the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user;
and a recommendation reason mining module 360, configured to determine the recommendation reason of the merchant according to the similarity between the user feature vector determined by the user feature vector determination module and the recommendation reason feature vector determined by the recommendation reason feature vector determination module.
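The matching performed by the recommendation reason mining module can be sketched as below. The text only says "similarity"; cosine similarity is used here as one common choice and is an assumption, as are the function names and the representation of the pool as a dictionary from reason text to feature vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_reason(user_vector, reason_vectors):
    """Return the recommendation reason whose feature vector is most
    similar to the user feature vector.

    reason_vectors -- dict mapping reason text to its feature vector
    """
    return max(reason_vectors,
               key=lambda reason: cosine_similarity(user_vector, reason_vectors[reason]))
```

Because the user feature vector changes with real-time behavior, calling `pick_reason` again with an updated vector may select a different reason from the same pool.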
Optionally, the recommendation reason feature vector determining module 350 is further configured to:
acquire, through a preset word vector model, the feature vector of the entity word corresponding to each recommendation reason in the recommendation reason pool of the target merchant, and use it as the feature vector of the recommendation reason corresponding to that entity word.
Optionally, the user feature vector determination module 340 is further configured to:
determine a historical behavior feature vector of the current user according to word vectors, obtained through a preset word vector model, of keywords of the historical behavior information of the current user; determine a real-time behavior feature vector of the current user according to word vectors, obtained through the preset word vector model, of keywords of the real-time behavior information of the current user;
determine the user feature vector of the current user by performing weighted summation on the historical behavior feature vector and the real-time behavior feature vector;
wherein the real-time behavior information includes: behavioral intent and/or behavioral scenario.
The user feature vector is a multidimensional vector abstracted from information describing the user's historical behavior and current behavior. That is, the user feature vector jointly considers the user's historical behavior, real-time intent and scenario information, so the resulting vector carries more comprehensive information.
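The weighted summation of the historical and real-time behavior feature vectors can be sketched as follows. The text does not say how the word vectors of several keywords are combined into one behavior vector, so averaging is assumed here; the weights `w_hist` and `w_rt`, the function names, and the plain dictionary standing in for the preset word vector model are likewise illustrative assumptions.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def user_feature_vector(hist_keywords, realtime_keywords, word_vectors,
                        w_hist=0.4, w_rt=0.6):
    """Weighted sum of the historical and real-time behavior vectors.

    word_vectors -- dict mapping keyword to word vector; a stand-in for
                    the preset word vector model
    w_hist, w_rt -- weights; the defaults are placeholders
    """
    hist = mean_vector([word_vectors[k] for k in hist_keywords])
    realtime = mean_vector([word_vectors[k] for k in realtime_keywords])
    return [w_hist * h + w_rt * r for h, r in zip(hist, realtime)]
```

Giving the real-time vector a larger weight makes the result track the current intent and scenario more closely, while a larger historical weight favors long-term preferences.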
The apparatus for mining merchant recommendation reasons disclosed in this embodiment of the application mines the recommendation reasons of each target merchant from the user original data of that merchant and constructs a recommendation reason pool for each target merchant. Then, in a specific application process, it determines the user feature vector of the current user according to the historical behavior information and real-time behavior information of the current user; determines a recommendation reason feature vector of each recommendation reason in the recommendation reason pool of the target merchant, where the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user; and finally determines the recommendation reason of the merchant according to the similarity between the user feature vector and the recommendation reason feature vectors. This solves the prior-art problem that mined recommendation reasons are inaccurate. Because the apparatus determines the user feature vector by combining the user's historical behavior information and real-time behavior information, and matches merchant recommendation reasons under conditions that change in real time, varied and rich recommendation reasons can be obtained. Meanwhile, because the matching condition is generated from user information, different recommendation reasons can be matched for different users, which makes the recommendation reasons more targeted, achieves personalized display of recommendation reasons, reduces the user's decision cost, and further improves the user experience.
Correspondingly, the application also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to implement the mining method for the merchant recommendation reasons according to the first embodiment and the second embodiment of the application. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like.
The application also discloses a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method for mining merchant recommendation reasons described in the first and second embodiments of the application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.
The method and apparatus for mining merchant recommendation reasons provided by the application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is only intended to help in understanding the method and core idea of the application. Meanwhile, a person skilled in the art may, following the idea of the application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the application.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.

Claims (18)

1. A method for mining merchant recommendation reasons is characterized by comprising the following steps:
determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of a target merchant;
determining a high-quality candidate recommendation reason and an evaluation score of the high-quality candidate recommendation reason according to the feature vector of the candidate recommendation reason through a preset recommendation reason classification model;
constructing a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason;
wherein, after the step of constructing the recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason, the method further comprises:
determining a user characteristic vector of a current user according to historical behavior information and real-time behavior information of the current user;
determining a feature vector of each recommendation reason in a recommendation reason pool of a target merchant, wherein the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user;
and determining the recommendation reason of the merchant according to the similarity between the user feature vector and the feature vector of the recommendation reason.
2. The method according to claim 1, wherein the step of constructing the recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason comprises:
screening out high-quality candidate recommendation reasons with the entity word frequency larger than a preset word frequency threshold value according to the entity word frequency characteristic value in the characteristic vector of the high-quality candidate recommendation reasons, and taking the high-quality candidate recommendation reasons as the high-quality candidate recommendation reasons of the target merchant;
taking one high-quality candidate recommendation reason with the highest evaluation score in the high-quality candidate recommendation reasons corresponding to the same group of entity words as the recommendation reason corresponding to the same group of entity words to determine the recommendation reason corresponding to each group of entity words;
and constructing a recommendation reason pool of the target merchant according to the recommendation reason corresponding to each group of entity words.
3. The method of claim 1, wherein the step of determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of a target merchant comprises:
carrying out data processing on user original data of a target merchant to obtain a plurality of clauses, wherein each clause corresponds to a candidate recommendation reason;
determining a feature vector of each candidate recommendation reason based on a preset reference word bank;
the reference word bank is obtained by cleaning a preset word bank, based on the user original data of the target merchant, by combining an emotion analysis method and a word frequency screening method; the reference word bank comprises positive idioms, degree words, high-level evaluation words and common evaluation words.
4. The method of claim 1, wherein the feature vector of the recommendation reason comprises any one or more of the following dimensions: syntactic structure, whether a language word is contained, a sentence text score, whether a common evaluation word is contained, the number of common evaluation words, whether a high-level evaluation word is contained, the number of high-level evaluation words, whether a degree word is contained, the number of degree words, whether an idiom is contained, the number of idioms, an emotion score, a comment score, whether a merchant descriptor is contained, the number of merchant descriptors, the weight of merchant descriptors, whether an entity exists, the number of entities, the word frequency of the entity words, whether a viewpoint exists, the number of viewpoints and the viewpoint score.
5. The method of claim 4, wherein the comment score is obtained by the formula:
reviewScore = (log(follows + hits + voteGoods + 1) + isQuality * T + star * W) * timeScore, wherein follows represents the number of scores, hits represents the number of clicks, voteGoods represents the number of likes, isQuality represents whether the comment is a high-quality comment, T represents a high-quality comment adjustment factor, star represents the comment star rating, W represents a comment star rating adjustment factor, and timeScore represents a time decay factor obtained by the formula timeScore = (3650 - x) / 3650, wherein x represents the number of days from the publication time of the comment to the current time.
6. The method of claim 4, wherein the word frequency of the entity word is a maximum value of the word frequencies of the entity words included in the current clause.
7. The method of claim 1, wherein the step of determining the feature vector of each recommendation reason in the recommendation reason pool of the target merchant comprises:
and acquiring a feature vector of the entity word through a preset word vector model according to the entity word corresponding to the recommendation reason in the target merchant recommendation reason pool, wherein the feature vector is used as the feature vector of the recommendation reason corresponding to the entity word.
8. The method of claim 1, wherein the step of determining the user feature vector of the current user according to the historical behavior information and the real-time behavior information of the current user comprises:
determining a historical behavior feature vector of the current user according to a word vector of a keyword of the historical behavior information of the current user, which is obtained through a preset word vector model; determining the real-time behavior feature vector of the current user according to the word vector of the keyword of the real-time behavior information of the current user, which is obtained through the preset word vector model;
determining a user characteristic vector of the current user by performing weighted summation on the historical behavior characteristic vector and the real-time behavior characteristic vector;
wherein the real-time behavior information comprises: behavioral intent and/or behavioral scenarios.
9. A mining apparatus for a merchant recommendation reason, comprising:
the candidate recommendation reason and feature vector determining module is used for determining candidate recommendation reasons and feature vectors of the candidate recommendation reasons based on user original data of the target merchants;
the candidate recommendation reason set evaluation score determining module is used for determining a high-quality candidate recommendation reason and an evaluation score of the high-quality candidate recommendation reason according to a feature vector of the candidate recommendation reason through a preset recommendation reason classification model;
the recommendation reason pool building module is used for building a recommendation reason pool of the target merchant based on the high-quality candidate recommendation reason and the evaluation score of the high-quality candidate recommendation reason;
wherein the apparatus further comprises:
the user characteristic vector determining module is used for determining the user characteristic vector of the current user according to the historical behavior information and the real-time behavior information of the current user;
the recommendation reason feature vector determination module is used for determining a recommendation reason feature vector of each recommendation reason in a recommendation reason pool of a target merchant, wherein the target merchant is a merchant recommended to the current user according to the real-time behavior of the current user;
and the recommendation reason mining module is used for determining the recommendation reason of the merchant according to the similarity between the user feature vector determined by the user feature vector determining module and the recommendation reason feature vector determined by the recommendation reason feature vector determining module.
10. The apparatus of claim 9, wherein the recommendation reason pool construction module further comprises:
the screening submodule is used for screening out a high-quality candidate recommendation reason of which the entity word frequency is greater than a preset word frequency threshold value according to the entity word frequency characteristic value in the characteristic vector of the high-quality candidate recommendation reason, and the high-quality candidate recommendation reason is used as the high-quality candidate recommendation reason of the target merchant;
the merging submodule is used for taking one high-quality candidate recommendation reason with the highest evaluation score in the high-quality candidate recommendation reasons corresponding to the same group of entity words as the recommendation reason corresponding to the same group of entity words so as to determine the recommendation reason corresponding to each group of entity words;
and the recommendation reason pool constructing submodule is used for constructing the recommendation reason pool of the target merchant according to the recommendation reason corresponding to each group of entity words.
11. The apparatus of claim 9, wherein the candidate recommendation reason and feature vector determining module further comprises:
the candidate recommendation reason mining submodule is used for carrying out data processing on user original data of the target merchant to obtain a plurality of clauses, wherein each clause corresponds to one candidate recommendation reason;
the candidate recommendation reason feature vector determining submodule is used for determining the feature vector of each candidate recommendation reason based on a preset reference word bank;
the reference word bank is obtained by cleaning a preset word bank, based on the user original data of the target merchant, by combining an emotion analysis method and a word frequency screening method; the reference word bank comprises positive idioms, degree words, high-level evaluation words and common evaluation words.
12. The apparatus of claim 9, wherein the feature vector of the recommendation reason comprises any one or more of the following dimensions: syntactic structure, whether a language word is contained, a sentence text score, whether a common evaluation word is contained, the number of common evaluation words, whether a high-level evaluation word is contained, the number of high-level evaluation words, whether a degree word is contained, the number of degree words, whether an idiom is contained, the number of idioms, an emotion score, a comment score, whether a merchant descriptor is contained, the number of merchant descriptors, the weight of merchant descriptors, whether an entity exists, the number of entities, the word frequency of the entity words, whether a viewpoint exists, the number of viewpoints and the viewpoint score.
13. The apparatus of claim 12, wherein the comment score is obtained by the formula:
reviewScore = (log(follows + hits + voteGoods + 1) + isQuality * T + star * W) * timeScore, wherein follows represents the number of scores, hits represents the number of clicks, voteGoods represents the number of likes, isQuality represents whether the comment is a high-quality comment, T represents a high-quality comment adjustment factor, star represents the comment star rating, W represents a comment star rating adjustment factor, and timeScore represents a time decay factor obtained by the formula timeScore = (3650 - x) / 3650, wherein x represents the number of days from the publication time of the comment to the current time.
14. The apparatus of claim 12, wherein the word frequency of the entity word is a maximum value of the word frequency of each entity word included in the current clause.
15. The apparatus of claim 9, wherein the recommendation reason feature vector determination module is further configured to:
and acquiring a feature vector of the entity word through a preset word vector model according to the entity word corresponding to the recommendation reason in the target merchant recommendation reason pool, wherein the feature vector is used as the feature vector of the recommendation reason corresponding to the entity word.
16. The apparatus of claim 9, wherein the user feature vector determination module is further configured to:
determining a historical behavior feature vector of the current user according to a word vector of a keyword of the historical behavior information of the current user, which is obtained through a preset word vector model; determining the real-time behavior feature vector of the current user according to the word vector of the keyword of the real-time behavior information of the current user, which is obtained through the preset word vector model;
determining a user characteristic vector of the current user by performing weighted summation on the historical behavior characteristic vector and the real-time behavior characteristic vector;
wherein the real-time behavior information comprises: behavioral intent and/or behavioral scenarios.
17. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for mining merchant recommendation reasons according to any one of claims 1 to 8 when executing the computer program.
18. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the mining method for merchant recommendation reasons according to any one of claims 1 to 8.
CN201810447255.0A 2018-05-11 2018-05-11 Method and device for mining merchant recommendation reason and electronic equipment Active CN108694647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810447255.0A CN108694647B (en) 2018-05-11 2018-05-11 Method and device for mining merchant recommendation reason and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810447255.0A CN108694647B (en) 2018-05-11 2018-05-11 Method and device for mining merchant recommendation reason and electronic equipment

Publications (2)

Publication Number Publication Date
CN108694647A CN108694647A (en) 2018-10-23
CN108694647B true CN108694647B (en) 2021-04-23

Family

ID=63847372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810447255.0A Active CN108694647B (en) 2018-05-11 2018-05-11 Method and device for mining merchant recommendation reason and electronic equipment

Country Status (1)

Country Link
CN (1) CN108694647B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7176443B2 (en) * 2019-03-11 2022-11-22 トヨタ自動車株式会社 Recommendation statement generation device, recommendation statement generation method, and recommendation statement generation program
CN109961357B (en) * 2019-03-25 2021-09-03 上海拉扎斯信息科技有限公司 User data processing method and device, electronic equipment and storage medium
CN110147499B (en) * 2019-05-21 2021-09-14 智者四海(北京)技术有限公司 Labeling method, recommendation method and recording medium
CN110457460A (en) * 2019-06-20 2019-11-15 拉扎斯网络科技(上海)有限公司 Text recommended method, device, server and storage medium
CN110852846A (en) * 2019-11-11 2020-02-28 京东数字科技控股有限公司 Processing method and device for recommended object, electronic equipment and storage medium
CN111046138B (en) * 2019-11-15 2023-06-27 北京三快在线科技有限公司 Recommendation reason generation method and device, electronic equipment and storage medium
CN111125544A (en) * 2019-12-20 2020-05-08 腾讯数码(天津)有限公司 User recommendation method and device
CN113111264B (en) * 2021-06-15 2021-09-07 深圳追一科技有限公司 Interface content display method and device, electronic equipment and storage medium
CN113688335B (en) * 2021-07-23 2023-09-01 北京三快在线科技有限公司 Ranking reason generation method, device, electronic equipment and storage medium
CN116740525B (en) * 2023-08-16 2023-10-31 南京迅集科技有限公司 Intelligent manufacturing quality management method based on data fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776568A (en) * 2016-12-26 2017-05-31 成都康赛信息技术有限公司 Based on the rationale for the recommendation generation method that user evaluates
CN107944911A (en) * 2017-11-18 2018-04-20 电子科技大学 A kind of recommendation method of the commending system based on text analyzing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4465564B2 (en) * 2000-02-28 2010-05-19 ソニー株式会社 Voice recognition apparatus, voice recognition method, and recording medium
CN103246672B (en) * 2012-02-09 2016-06-08 中国科学技术大学 User is carried out method and the device of personalized recommendation
CN104572851B (en) * 2014-12-16 2018-09-07 北京百度网讯科技有限公司 The method and apparatus for obtaining recommendation information
CN107577759B (en) * 2017-09-01 2021-07-30 安徽广播电视大学 Automatic recommendation method for user comments

Also Published As

Publication number Publication date
CN108694647A (en) 2018-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant