CN111475719B - Information pushing method and device based on data mining and storage medium - Google Patents


Info

Publication number
CN111475719B
Authority
CN
China
Prior art keywords
user group
information
user
characteristic vector
value
Legal status
Active
Application number
CN202010239972.1A
Other languages
Chinese (zh)
Other versions
CN111475719A (en)
Inventor
韦炳田
陈健
李福宇
高宏
Current Assignee
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Application filed by China Merchants Finance Technology Co Ltd
Priority to CN202010239972.1A
Publication of CN111475719A
Application granted
Publication of CN111475719B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an information pushing method based on data mining, comprising the following steps: acquiring a product information set and a user group information set; performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set; acquiring a target feature vector of a target user, and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set; and, when the distance value is smaller than a preset distance, obtaining the information of the product corresponding to that user group identifier feature vector from the product information set and pushing it to the target user. The invention also provides an information pushing device based on data mining, an electronic device, and a computer-readable storage medium. The invention is intended to solve the problems of high cost and low accuracy in product recommendation.

Description

Information pushing method and device based on data mining and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an information pushing method and device based on data mining, an electronic device, and a computer-readable storage medium.
Background
With the rise of big data and artificial intelligence, the demand for recommending products to potential customers is growing. However, because both the number of products on the market and the customer base are large, it is difficult to identify potential customers accurately and recommend suitable products to them.
Most existing product recommendation methods are complex and consume considerable computing resources, and they lack a step that matches each product with the potential customers it suits. As a result, product recommendation is costly and inaccurate.
Disclosure of Invention
The invention provides an information pushing method and device based on data mining, an electronic device, and a computer-readable storage medium, with the main aim of solving the problems of high cost and low accuracy in product recommendation.
To achieve the above object, the information pushing method based on data mining provided by the present invention includes:
acquiring information of at least two products and information of at least two user groups corresponding to the at least two products, to obtain a product information set and a user group information set;
performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
acquiring a target feature vector of a target user, and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set; and
when the distance value is smaller than a preset distance, obtaining the information of the product corresponding to the user group identifier feature vector from the product information set, and pushing the information to the target user.
Optionally, performing feature extraction on the user group information set to obtain the user group feature vector set includes:
randomly grouping the user group information in the user group information set to obtain a grouping result, where the grouping result comprises at least two groups and the user group information contained in each of them;
calculating scores for the grouping result according to a preset grouping scoring template to obtain a score result, where the score result comprises the score of each of the at least two groups;
comparing the score of each group in the score result with a preset score threshold;
regrouping the groups whose score is less than or equal to the score threshold;
calculating a similarity value between the user group information contained in each group whose score is greater than the preset score threshold and the sampling features in a preset sampling user feature set; and
acquiring the sampling user features for which the similarity value is greater than a preset similarity threshold, and determining them as the user group feature vector set.
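The randomize, score, and regroup loop in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not define how the grouping scoring template scores a group, so the sketch assumes it can be wrapped in a scoring function (the hypothetical `score_group`). Members of groups that fail the threshold are reshuffled in the next round, and the hypothetical `max_rounds` bounds the loop, so any members still unaccepted after the last round are dropped.

```python
import random
from typing import Callable, Hashable, List, Sequence


def group_until_scores_pass(
    records: Sequence[Hashable],
    num_groups: int,
    score_group: Callable[[List[Hashable]], float],  # hypothetical stand-in for the scoring template
    score_threshold: float,
    max_rounds: int = 100,  # hypothetical safeguard; the text sets no limit
    seed: int = 0,
) -> List[List[Hashable]]:
    """Randomly group records, keep groups scoring above the threshold,
    and regroup the members of all other groups in the next round."""
    rng = random.Random(seed)
    accepted: List[List[Hashable]] = []
    pending = list(records)
    for _ in range(max_rounds):
        if not pending:
            break
        rng.shuffle(pending)
        k = min(num_groups, len(pending))
        # Deal the shuffled records round-robin into k groups.
        groups = [pending[i::k] for i in range(k)]
        pending = []
        for group in groups:
            if score_group(group) > score_threshold:
                accepted.append(group)  # score above threshold: keep the group
            else:
                pending.extend(group)  # score at or below threshold: regroup
    return accepted
```

For example, `group_until_scores_pass(list(range(6)), 2, lambda g: 1.0, 0.5)` accepts both groups in the first round because every group passes.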
Optionally, before feature extraction is performed on the user group information set, the method further includes:
performing anomaly removal on the user group information set to obtain an initial user group information set; and
normalizing the initial user group information set to obtain a processed user group information set.
Optionally, performing anomaly removal on the user group information set to obtain the initial user group information set includes:
converting the user group information set into numerical values to obtain a numerical value set;
screening the numerical value set with a threshold interval to obtain an abnormal value set and a normal value set;
calculating the mean of the normal value set, and replacing the data in the abnormal value set with that mean to obtain a corrected abnormal value set; and
determining the union of the normal value set and the corrected abnormal value set as the initial user group information set.
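The screening and mean-replacement steps above can be sketched in Python as follows. The sketch assumes the lower and upper bounds of the threshold interval have already been computed (their formulas appear only as images in the source, so they are passed in here as plain numbers) and that the interval is closed. The function name `replace_outliers` is illustrative.

```python
from statistics import mean
from typing import List


def replace_outliers(values: List[float], lower: float, upper: float) -> List[float]:
    """Split values by the closed interval [lower, upper] and replace every
    value outside it with the mean of the values inside it."""
    normal = [v for v in values if lower <= v <= upper]
    if not normal:
        raise ValueError("no values fall inside the threshold interval")
    fill = mean(normal)  # mean of the normal value set
    # Normal values are kept unchanged; abnormal values are replaced in place.
    return [v if lower <= v <= upper else fill for v in values]
```

For instance, `replace_outliers([1, 2, 3, 100], 0, 10)` treats 100 as abnormal and replaces it with 2, the mean of the normal values. Keeping corrected values in their original positions is a choice made by this sketch; the text only says the two sets are combined.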
Optionally, normalizing the initial user group information set to obtain the processed user group information set includes:
normalizing the user group information in the initial user group information set with the normalization formula $x' = (x - \mu)/\sigma$ to obtain the processed user group information set;
where $x$ is a value in the initial user group information set, $x'$ is the corresponding value in the processed user group information set, $\mu$ is the mean of all values in the initial user group information set, and $\sigma$ is their standard deviation.
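The normalization above is the standard z-score. A minimal sketch, assuming each attribute (for example, user age) is normalized separately and that $\sigma$ is the population standard deviation, i.e. the square root of the variance. The guard for a constant attribute is an addition of this sketch, since dividing by zero is otherwise undefined.

```python
from statistics import mean, pstdev
from typing import List


def z_score(values: List[float]) -> List[float]:
    """Normalize one attribute as x' = (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical; map them to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For the values `[2, 4, 4, 4, 5, 5, 7, 9]`, the mean is 5 and the population standard deviation is 2, so the first value maps to -1.5 and the last to 2.0.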
Optionally, the method further comprises:
acquiring the median q of the numerical value set;
taking one expression in the median (formula image not reproduced) as the lower bound and another expression in the median (formula image not reproduced) as the upper bound, and obtaining the threshold interval delimited by the lower and upper bounds (formula image not reproduced);
where n > m, and n and m are preset constants.
Optionally, calculating the distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set includes:
calculating the distance value with the following distance algorithm:
(distance formula image not reproduced)
where $L(X, Y)$ is the distance value, $X$ is the target feature vector, and $Y_i$ is a user group identifier feature vector in the user group identifier feature vector set.
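The distance formula itself appears only as an image in the source. Purely as an illustration, the sketch below assumes Euclidean distance, a common choice for comparing feature vectors; it is not necessarily the patent's $L(X, Y)$. The function names and group identifiers are hypothetical.

```python
import math
from typing import Dict, Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed stand-in for L(X, Y): the Euclidean distance between two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distances_to_groups(
    target: Sequence[float], group_vectors: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Distance from the target feature vector X to every identifier vector Y_i."""
    return {gid: euclidean_distance(target, vec) for gid, vec in group_vectors.items()}
```

For example, `distances_to_groups([0.0, 0.0], {"g1": [3.0, 4.0], "g2": [1.0, 0.0]})` returns `{"g1": 5.0, "g2": 1.0}`.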
To solve the above problems, the present invention also provides a product recommendation device, comprising:
an information acquisition module, configured to acquire information of at least two products and information of at least two user groups corresponding to the at least two products, to obtain a product information set and a user group information set;
a feature extraction module, configured to perform feature extraction on the user group information set to obtain a user group feature vector set, and to convert the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
a distance calculation module, configured to acquire a target feature vector of a target user and to calculate a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set; and
an information pushing module, configured to, when the distance value is smaller than a preset distance, acquire the information of the product corresponding to the user group identifier feature vector from the product information set and push the information to the target user.
To solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the above information pushing method based on data mining.
To solve the above problem, the present invention further provides a computer-readable storage medium storing at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the information pushing method based on data mining according to any one of the above.
In the embodiments of the invention, acquiring information of at least two products and of the at least two user groups corresponding to them yields a product information set and a user group information set, which supports subsequent accurate, user-group-based product recommendation. Feature extraction on the user group information set yields a user group feature vector set, which a preset conversion algorithm converts into a user group identifier feature vector set; processing extracted features instead of raw data reduces the data volume and the computing resources required. Finally, a target feature vector of a target user is acquired, its distance to the user group identifier feature vectors in the set is calculated, and the target user is matched with an applicable product according to that distance, which improves the precision of product information recommendation. The information pushing method, device, and computer-readable storage medium based on data mining can therefore achieve high-precision product information recommendation.
Drawings
Fig. 1 is a schematic flowchart of an information pushing method based on data mining according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of an information pushing device based on data mining according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the internal structure of an electronic device implementing the information pushing method based on data mining according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of obtaining the initial user group information set in the information pushing method based on data mining according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of obtaining the user group feature vector set in the information pushing method based on data mining according to an embodiment of the present invention.
The implementation, functional features, and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an information pushing method based on data mining. Fig. 1 is a schematic flow chart of an information pushing method based on data mining according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the information pushing method based on data mining includes:
S1, obtaining information of at least two products and information of the at least two user groups corresponding to them, to obtain a product information set and a user group information set.
In this embodiment of the invention, the product information of the products stored in a preset customer database, and the user group information of the users who purchased those products, such as product price, product quality, user age, and user gender, are retrieved from that database.
The customer database may be any database a company uses to store the product information and the user group information, such as a MySQL or Oracle database.
After the product information and the user group information are retrieved from the customer database, the retrieved product information is collected into the product information set, and the retrieved user group information is collected into the user group information set.
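As a rough illustration of step S1, the sketch below uses Python's built-in `sqlite3` module in place of the relational databases mentioned above. The table names (`product`, `purchase_user`), their columns, and the sample rows in `demo_connection` are all hypothetical; the user group information is grouped by the product its users purchased.

```python
import sqlite3
from typing import Dict, List, Tuple


def load_information_sets(
    conn: sqlite3.Connection,
) -> Tuple[List[dict], Dict[str, List[dict]]]:
    """Return (product information set, user group information set keyed by product)."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    product_info_set = [
        dict(row)
        for row in conn.execute(
            "SELECT product_id, price, quality FROM product ORDER BY product_id"
        )
    ]
    user_group_info_set: Dict[str, List[dict]] = {}
    for row in conn.execute(
        "SELECT product_id, age, gender FROM purchase_user ORDER BY rowid"
    ):
        # Users who bought the same product form that product's user group.
        user_group_info_set.setdefault(row["product_id"], []).append(
            {"age": row["age"], "gender": row["gender"]}
        )
    return product_info_set, user_group_info_set


def demo_connection() -> sqlite3.Connection:
    """Build a hypothetical in-memory customer database for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (product_id TEXT, price REAL, quality TEXT);
        CREATE TABLE purchase_user (product_id TEXT, age INTEGER, gender TEXT);
        INSERT INTO product VALUES ('P1', 99.0, 'A'), ('P2', 199.0, 'B');
        INSERT INTO purchase_user VALUES
            ('P1', 30, 'male'), ('P1', 25, 'female'), ('P2', 41, 'male');
        """
    )
    return conn
```

Calling `load_information_sets(demo_connection())` yields two product records and one user group per product.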
S2, performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set.
Because the acquired user group information may contain anomalies such as errors and missing values, it needs to be preprocessed first.
Specifically, before feature extraction is performed on the user group information set, the method further includes:
performing anomaly removal on the user group information set to obtain an initial user group information set; and
normalizing the initial user group information set to obtain a processed user group information set.
As shown in Fig. 4, performing anomaly removal on the user group information set to obtain the initial user group information set includes:
S20, converting the user group information set into numerical values to obtain a numerical value set;
S21, screening the numerical value set with a threshold interval to obtain an abnormal value set and a normal value set;
S22, calculating the mean of the normal value set, and replacing the data in the abnormal value set with that mean to obtain a corrected abnormal value set; and
S23, determining the union of the normal value set and the corrected abnormal value set as the initial user group information set.
In this embodiment, suppose the collected user group information includes user gender, which is non-numerical data. To simplify subsequent processing, non-numerical data is converted into numbers: for example, a male user's gender is encoded as 1 and a female user's gender as 0. Once all user group information in the user group information set has been converted into numerical information, the numerical value set is obtained.
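The gender encoding in the example above can be sketched as follows. The field names (`age`, `gender`) and the string labels `"male"` and `"female"` are assumptions; only the 1/0 codes come from the text. Fields that are already numeric are simply converted to `float`.

```python
from typing import Dict, Mapping, Union

# Codes taken from the example in the text: male -> 1, female -> 0.
GENDER_CODES: Dict[str, int] = {"male": 1, "female": 0}


def digitize_record(record: Mapping[str, Union[str, int, float]]) -> Dict[str, float]:
    """Convert one user's information into purely numerical values."""
    numeric: Dict[str, float] = {}
    for field, value in record.items():
        if field == "gender":
            numeric[field] = float(GENDER_CODES[str(value)])  # categorical -> code
        else:
            numeric[field] = float(value)  # already numeric, e.g. age
    return numeric
```

Applying `digitize_record` to every record of the user group information set gives the numerical value set; for example, `{"age": 30, "gender": "female"}` becomes `{"age": 30.0, "gender": 0.0}`.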
Further, the method comprises:
acquiring the median q of the numerical value set;
taking one expression in the median (formula image not reproduced) as the lower bound and another expression in the median (formula image not reproduced) as the upper bound, and taking the threshold interval delimited by the lower and upper bounds (formula image not reproduced);
where n > m, and n and m are preset constants.
After the threshold interval is obtained, it is used to screen the numerical information in the numerical value set: the numerical information inside the threshold interval is collected into the normal value set, and the numerical information outside it is collected into the abnormal value set.
Further, in this embodiment, the mean of the normal value set is calculated and the numerical information in the abnormal value set is replaced with that mean, which corrects the abnormal value set; the union of the normal value set and the corrected abnormal value set is then taken as the initial user group information set.
These steps correct erroneous information in the user group information set, which eases subsequent processing and yields more accurate results.
Further, to make the data in the initial user group information set more comparable, this embodiment normalizes it with the following formula to obtain the processed user group information set:
$$x' = \frac{x - \mu}{\sigma}$$
where $x$ is a value in the initial user group information set, $x'$ is the corresponding normalized value in the processed user group information set, $\mu$ is the mean of all values in the initial user group information set, and $\sigma$ is their standard deviation.
In this embodiment, after the initial user group information set has been normalized into the processed user group information set, feature extraction is performed on it to obtain the user group feature vector set, as shown in Fig. 5.
Specifically, performing feature extraction on the user group information set to obtain the user group feature vector set includes:
S200, randomly grouping the user group information in the user group information set to obtain a grouping result, where the grouping result comprises at least two groups and the user group information contained in each of them;
S201, calculating scores for the grouping result according to a preset grouping scoring template to obtain a score result, where the score result comprises the score of each of the at least two groups;
S202, judging whether the scores of all groups in the score result are greater than a preset score threshold;
for any group whose score is less than or equal to the score threshold, returning to S200 and regrouping;
S203, calculating a similarity value between the user group information contained in each group whose score is greater than the preset score threshold and the sampling features in a preset sampling user feature set; and
S204, acquiring the sampling user features for which the similarity value is greater than a preset similarity threshold, and determining them as the user group feature vector set.
Specifically, the grouping scoring template is built from the user information of past users of different products.
Calculating scores for the grouping result according to the preset grouping scoring template to obtain the score result includes:
calculating the score result F with the following score formula:
$$F = \mathrm{def}\{f_1, f_2\}$$
where $f_1$ is the standard grouping scoring template and $f_2$ is the grouping result.
Further, the sampling user feature set is a set of feature information, extracted in advance from the user information of past users of different products, that characterizes those users.
Specifically, calculating the similarity value between the user group information contained in a group whose score is greater than the preset score threshold and the sampling features in the preset sampling user feature set includes:
calculating the similarity value with the following similarity formula:
$$\mathrm{Sim} = a\left[\log(\alpha - \beta)\right]$$
where Sim is the similarity value, $a$ is a preset constant representing an error factor, $\alpha$ is a sampling feature in the sampling user feature set, and $\beta$ is the user group information contained in a group whose score is greater than the preset score threshold.
Further, the similarity value is compared with a preset similarity threshold. If the similarity value is less than or equal to the threshold, the user group information does not match the sampling feature, and the similarity is calculated again between the user group information and a new sampling feature;
if the similarity value is greater than the threshold, the user group information matches the sampling feature, and the sampling features corresponding to the user group information are collected to obtain the user group feature vector set.
Obtaining the user group feature vector set in this way matches users accurately to sampling features, which makes it easier to recommend suitable products to the target user.
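The similarity formula and the threshold check described above can be transcribed as follows, assuming $\alpha$ and $\beta$ are scalar feature values and that `log` is the natural logarithm (the text does not state the base). Because the logarithm is defined only for $\alpha - \beta > 0$, this sketch treats pairs with $\alpha \le \beta$ as non-matching; that handling is an assumption, as are the function names.

```python
import math
from typing import List, Sequence


def similarity(alpha: float, beta: float, a: float) -> float:
    """Sim = a * log(alpha - beta), transcribed literally from the formula."""
    diff = alpha - beta
    if diff <= 0:
        raise ValueError("log(alpha - beta) requires alpha > beta")
    return a * math.log(diff)


def matching_sampling_features(
    beta: float, sampling_features: Sequence[float], a: float, sim_threshold: float
) -> List[float]:
    """Collect the sampling features whose similarity to beta exceeds the threshold."""
    matched: List[float] = []
    for alpha in sampling_features:
        if alpha - beta <= 0:
            continue  # formula undefined here; treated as no match (assumption)
        if similarity(alpha, beta, a) > sim_threshold:
            matched.append(alpha)  # this sampling feature matches
        # otherwise move on to the next (new) sampling feature
    return matched
```

With `beta=0.0`, `a=1.0`, and a threshold of 0.5, the candidates `[0.5, math.e, 10.0]` give similarities of about -0.69, 1.0, and 2.30, so only `math.e` and `10.0` are collected.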
Further, in this embodiment, each user group feature vector in the user group feature vector set is converted into a user group identifier feature vector f(x) with the following conversion algorithm:
(conversion formula image not reproduced)
where i is the number of user group feature vectors in the user group feature vector set, and $b_i$ is a user group feature vector in the set.
After all user group feature vectors in the set have been converted, the resulting user group identifier feature vectors are collected into the user group identifier feature vector set.
Through these steps, each user group feature vector is converted into a representation of the overall features of its user group, which reduces the subsequent amount of computation and the memory it occupies.
S3, obtaining a target feature vector of a target user, and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set.
Specifically, the target feature vector of the target user is obtained in the same way as the user group identifier feature vectors.
The target user may belong to any of the at least two user groups, or to another user group.
Further, calculating the distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set includes:
calculating the distance value with the following distance algorithm:
(distance formula image not reproduced)
where $L(X, Y)$ is the distance value, $X$ is the target feature vector, and $Y_i$ is a user group identifier feature vector in the user group identifier feature vector set.
Calculating the distance value in this way shows more intuitively how similar the target feature vector is to each user group identifier feature vector, which supports the subsequent product recommendation based on the calculated distance.
S4, when the distance value is smaller than a preset distance, obtaining the information of the product corresponding to the user group identifier feature vector from the product information set, and pushing the information to the target user.
Further, when the distance value is greater than or equal to the preset distance, another user group identifier feature vector is selected from the user group identifier feature vector set and the distance value is calculated again;
when the distance value is smaller than the preset distance, the information of the product corresponding to the user group identifier feature vector is obtained from the product information set and pushed to the target user, for example by text message or telephone.
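Step S4 can be sketched as a scan over the user group identifier feature vectors. This sketch makes several assumptions that the text does not settle: each user group maps to exactly one product, `math.dist` (Euclidean distance) stands in for the patent's distance formula, and every product whose group lies within the preset distance is collected rather than stopping at the first match. The function and parameter names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def products_to_push(
    target: Vector,
    identifier_vectors: Dict[str, Vector],  # user group id -> identifier feature vector
    products_by_group: Dict[str, str],  # user group id -> product information
    max_distance: float,  # the preset distance
    distance: Callable[[Vector, Vector], float] = math.dist,  # assumed metric
) -> List[str]:
    """Return the product information to push to the target user."""
    pushed: List[str] = []
    for group_id, vec in identifier_vectors.items():
        if distance(target, vec) < max_distance:
            pushed.append(products_by_group[group_id])  # close enough: push
        # otherwise try the next identifier feature vector
    return pushed
```

For a target at `(0.0, 0.0)`, groups at `(3.0, 4.0)` and `(1.0, 0.0)`, and a preset distance of 2.0, only the product of the second group is pushed. Passing a different `distance` callable lets the patent's own formula be substituted once it is known.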
In the embodiments of the invention, acquiring information of at least two products and of the at least two user groups corresponding to them yields a product information set and a user group information set, which supports subsequent accurate, user-group-based product recommendation. Feature extraction on the user group information set yields a user group feature vector set, which a preset conversion algorithm converts into a user group identifier feature vector set; processing extracted features instead of raw data reduces the data volume and the computing resources required. Finally, a target feature vector of a target user is acquired, its distance to the user group identifier feature vectors in the set is calculated, and the target user is matched with an applicable product according to that distance, which improves the precision of product information recommendation.
Fig. 2 is a functional block diagram of the product recommendation device of the present invention.
The product recommendation device 100 of the present invention may be installed in an electronic device. According to the functions it implements, the device may include an information acquisition module 101, a feature extraction module 102, a distance calculation module 103, and an information pushing module 104. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by its processor to perform a fixed function.
In this embodiment, the functions of the modules/units are as follows:
the information acquisition module 101 is configured to acquire information on at least two products and information on at least two user groups corresponding to the at least two products, so as to obtain a product information set and a user group information set;
the feature extraction module 102 is configured to perform feature extraction on the user group information set to obtain a user group feature vector set, and to convert the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
the distance calculation module 103 is configured to obtain a target feature vector of a target user and to calculate a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set;
the information pushing module 104 is configured to, when the distance value is smaller than a preset distance, obtain the information of the product corresponding to that user group identifier feature vector from the product information set and push it to the target user.
The modules of the product recommendation device are implemented as follows.
The information acquisition module 101 acquires information on at least two products and on at least two user groups corresponding to the at least two products, to obtain a product information set and a user group information set.
In this embodiment, the product information of the products and the user group information of the users who purchased them, such as product price, product quality, user age, and user gender, are retrieved from a preset customer database.
The customer database may be any database a company uses to store product information and user group information, such as a MySQL or Oracle database.
After the product information and the user group information are retrieved from the customer database, the retrieved product information is aggregated into the product information set, and the retrieved user group information is aggregated into the user group information set.
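The retrieval and aggregation steps can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the table names `products` and `purchases`, their columns, and the function `load_info_sets` are hypothetical, and a real deployment would query its own customer database schema instead.

```python
import sqlite3

# Hypothetical schema (not from the source):
#   products(product_id, price, quality)
#   purchases(product_id, age, gender), one row per purchasing user


def load_info_sets(conn: sqlite3.Connection):
    """Build the product information set and the user group information set.

    Both returned dicts are keyed by product_id.
    """
    product_info = {
        pid: {"price": price, "quality": quality}
        for pid, price, quality in conn.execute(
            "SELECT product_id, price, quality FROM products"
        )
    }
    user_group_info: dict = {}
    for pid, age, gender in conn.execute(
        "SELECT product_id, age, gender FROM purchases ORDER BY rowid"
    ):
        # The purchasers of a product form that product's user group.
        user_group_info.setdefault(pid, []).append({"age": age, "gender": gender})
    return product_info, user_group_info


# Usage with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id TEXT, price REAL, quality TEXT);
    CREATE TABLE purchases (product_id TEXT, age INTEGER, gender TEXT);
    INSERT INTO products VALUES ('P1', 99.0, 'A'), ('P2', 199.0, 'B');
    INSERT INTO purchases VALUES
        ('P1', 25, 'male'), ('P1', 31, 'female'), ('P2', 45, 'male');
    """
)
products, groups = load_info_sets(conn)
```

Keying both sets by product identifier keeps the link between a user group and its product, which the pushing step later relies on.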
The feature extraction module 102 performs feature extraction on the user group information set to obtain a user group feature vector set, and converts the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set.
Because the acquired user group information may contain anomalies such as erroneous or missing values, the user group information set is preprocessed first.
In detail, before the feature extraction is performed on the user group information set, the method further includes:
carrying out exception removal processing on the user group information set to obtain a user group information initial set;
and carrying out normalization processing on the initial set of the user group information to obtain a processed user group information set.
Performing the exception removal processing on the user group information set to obtain the initial set of user group information includes:
digitizing the user group information set to obtain a value set;
screening the value set with a threshold interval to obtain an abnormal value set and a normal value set;
calculating the average of the normal value set and replacing the data in the abnormal value set with that average to obtain a corrected abnormal value set;
and taking the union of the normal value set and the corrected abnormal value set as the initial set of user group information.
In this embodiment, suppose the collected user group information includes the user's gender, which is non-numerical data. To make later processing easier, such non-numerical data is digitized: for example, a gender of male is encoded as 1 and a gender of female as 0. Once all user group information in the set has been converted to numerical information, the value set is obtained.
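A minimal sketch of the digitization step follows. The male/female encoding comes from the text; the function names and the assumption that every other field is already numeric (or a numeric string) are illustrative only.

```python
# Encoding from the text: male -> 1, female -> 0.
GENDER_CODES = {"male": 1, "female": 0}


def digitize_record(record: dict) -> dict:
    """Convert one user record into purely numeric values.

    Assumption: every field other than "gender" is already numeric or a
    numeric string; other categorical fields would need their own mapping.
    """
    numeric = {}
    for field, value in record.items():
        if field == "gender":
            numeric[field] = GENDER_CODES[str(value).strip().lower()]
        else:
            numeric[field] = float(value)
    return numeric


def digitize(records: list) -> list:
    """Digitize every record of the user group information set."""
    return [digitize_record(r) for r in records]


value_set = digitize([{"age": 25, "gender": "male"}, {"age": "31", "gender": "Female"}])
# value_set == [{"age": 25.0, "gender": 1}, {"age": 31.0, "gender": 0}]
```

An unknown gender label raises `KeyError`, which surfaces bad input early instead of silently encoding it.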
The method further comprises:
obtaining the median q of the value set;
taking a first expression in the median q (formula image not available) as the lower bound and a second expression in q (formula image not available) as the upper bound, and obtaining the threshold interval bounded by these lower and upper bounds (formula image not available);
wherein n > m, and n and m are preset constants.
After the threshold interval is obtained, it is used to screen the numerical information in the value set: values inside the threshold interval are collected into the normal value set, and values outside it are collected into the abnormal value set.
The average of the normal value set is then calculated and substituted for each value in the abnormal value set, which yields the corrected abnormal value set; the union of the normal value set and the corrected abnormal value set is taken as the initial set of user group information.
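The screening and correction steps can be sketched in Python for a single numeric column of the value set. Because the bound expressions for the threshold interval survive only as formula images, `screen_and_correct` takes the bounds as arguments. `hypothetical_interval` and its formula [q·m/n, q·n/m] are invented stand-ins that merely respect n > m; they are not the patent's expressions.

```python
from statistics import mean, median


def hypothetical_interval(q: float, n: float, m: float) -> tuple:
    """Stand-in for the threshold interval built from the median q.

    ASSUMPTION: the real bound expressions are not available, so
    [q*m/n, q*n/m] is used purely for illustration (requires n > m > 0).
    """
    if not n > m > 0:
        raise ValueError("expected n > m > 0")
    a, b = q * m / n, q * n / m
    return min(a, b), max(a, b)


def screen_and_correct(values: list, lower: float, upper: float) -> list:
    """Replace values outside [lower, upper] with the mean of those inside.

    Positions are preserved, so the result is the union of the normal values
    and the corrected abnormal values in their original order.
    """
    inside = [lower <= v <= upper for v in values]
    normal = [v for v, ok in zip(values, inside) if ok]
    if not normal:
        raise ValueError("no value falls inside the threshold interval")
    avg = mean(normal)
    return [v if ok else avg for v, ok in zip(values, inside)]


ages = [20, 22, 25, 24, 300]  # one numeric column of the value set
lower, upper = hypothetical_interval(median(ages), n=2, m=1)  # (12.0, 48.0)
cleaned = screen_and_correct(ages, lower, upper)  # 300 -> 22.75
```

Replacing outliers in place, rather than appending them after the normal values, keeps each corrected value aligned with the user record it came from.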
To make the data in the initial set of user group information more comparable, this embodiment normalizes it with the following normalization algorithm to obtain the processed user group information set:
x' = (x - μ) / σ
where x is a value in the initial set of user group information, x' is the corresponding value in the processed user group information set, μ is the mean of all values in the initial set, and σ is their standard deviation.
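A minimal sketch of this z-score normalization, applied to one numeric column. Taking σ as the population standard deviation is an assumption (the text does not distinguish population from sample deviation), and mapping a constant column to zeros is a design choice added here to avoid division by zero.

```python
from statistics import mean, pstdev


def zscore(values: list) -> list:
    """Normalize values with x' = (x - mu) / sigma.

    sigma is the population standard deviation (assumed); a constant column
    maps to zeros to avoid division by zero.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


normalized = zscore([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, sigma 2
# normalized == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```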
Normalizing the initial set of user group information yields the processed user group information set, on which feature extraction is then performed to obtain the user group feature vector set.
Performing feature extraction on the user group information set to obtain the user group feature vector set includes:
randomly grouping the information of the user groups in the user group information set to obtain a grouping result, wherein the grouping result comprises at least two groups and the information of the user groups contained in each of them;
calculating scores for the grouping result according to a preset grouping scoring template to obtain a score result, wherein the score result comprises the scores of the at least two groups;
determining whether the score of every group in the score result is greater than a preset score threshold;
regrouping the groups whose scores are less than or equal to the score threshold;
calculating the similarity between the information of the user group contained in each group whose score is greater than the preset score threshold and the sampling features in a preset sampling user feature set to obtain a similarity value;
and collecting the sampling features whose similarity value is greater than a preset similarity threshold, and determining them as the user group feature vector set.
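The random grouping and regrouping loop can be sketched as follows. This is a sketch under assumptions: the caller-supplied `score` function stands in for the preset grouping scoring template, whose concrete form the text does not give, and the rule of drawing one passing group into the reshuffle when only a single group fails is an invented detail (a lone group cannot be regrouped by itself).

```python
import random
from typing import Callable, List, Sequence


def group_until_scores_pass(
    items: Sequence,
    n_groups: int,
    score: Callable[[list], float],
    threshold: float,
    max_rounds: int = 100,
    seed: int = 0,
) -> List[list]:
    """Randomly group items, then regroup any group scoring <= threshold."""
    if not 2 <= n_groups <= len(items):
        raise ValueError("need at least two groups and one item per group")
    rng = random.Random(seed)
    pool = list(items)
    rng.shuffle(pool)
    groups = [pool[i::n_groups] for i in range(n_groups)]
    for _ in range(max_rounds):
        failing = [i for i, g in enumerate(groups) if score(g) <= threshold]
        if not failing:
            return groups
        if len(failing) == 1:
            # ASSUMPTION: also reshuffle one randomly chosen passing group.
            passing = [i for i in range(n_groups) if i != failing[0]]
            failing.append(rng.choice(passing))
        # Pool the members of the selected groups and deal them out again.
        members = [x for i in failing for x in groups[i]]
        rng.shuffle(members)
        for k, i in enumerate(failing):
            groups[i] = members[k::len(failing)]
    raise RuntimeError("no grouping passed within max_rounds")


users = [1, 2, 3, 4, 5, 6]
grouping = group_until_scores_pass(users, 2, score=sum, threshold=9)
```

Dealing members round-robin after each shuffle keeps every group non-empty, so the scoring function never receives an empty group.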
In detail, the grouping scoring template is built from the historical user information of users of different products.
Calculating the score of the grouping result according to the preset grouping scoring template to obtain the score result includes:
calculating the score result F with the following score calculation formula:
F = def{f_1, f_2}
where f_1 is the standard partitioning template and f_2 is the grouping result.
The sampling user feature set is a set of feature information, extracted in advance from the historical user information of users of different products, that characterizes those users.
In detail, calculating the similarity between the information of the user group contained in a group whose score is greater than the preset score threshold and a sampling feature in the preset sampling user feature set, to obtain the similarity value, includes:
calculating the similarity value using the following similarity calculation formula:
Sim=a[log(α-β)]
wherein Sim is the similarity value, a is a preset constant representing an error factor, α is a sampling feature in the sampling user feature set, and β is information of a user group included in a group whose score is greater than the preset score threshold.
The similarity value is then compared with a preset similarity threshold. If the similarity value is less than or equal to the threshold, the information of the user group does not match the sampling feature, and the similarity is recalculated between that information and a new sampling feature;
if the similarity value is greater than the threshold, the information of the user group matches the sampling feature, and the sampling features matched in this way are collected to obtain the user group feature vector set.
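The matching loop can be sketched in Python using the formula Sim = a[log(α - β)] exactly as written. Assumptions, not stated in the text: α and β are scalars, log is the natural logarithm, and a pair with α - β ≤ 0 (where the log is undefined) simply counts as a non-match. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def similarity(alpha: float, beta: float, a: float = 1.0) -> Optional[float]:
    """Sim = a * log(alpha - beta); None where the log is undefined."""
    diff = alpha - beta
    if diff <= 0:
        return None
    return a * math.log(diff)


def match_sampling_feature(beta: float, sampling_features: Sequence[float],
                           threshold: float, a: float = 1.0) -> Optional[float]:
    """Try sampling features in turn; return the first whose Sim > threshold."""
    for alpha in sampling_features:
        sim = similarity(alpha, beta, a)
        if sim is not None and sim > threshold:
            return alpha
    return None  # no sampling feature matched this user group information


def build_feature_vector_set(group_infos: Sequence[float],
                             sampling_features: Sequence[float],
                             threshold: float, a: float = 1.0) -> List[float]:
    """Collect the matched sampling features into the feature vector set."""
    matched = []
    for beta in group_infos:
        alpha = match_sampling_feature(beta, sampling_features, threshold, a)
        if alpha is not None:
            matched.append(alpha)
    return matched


features = build_feature_vector_set([0.0, 20.0], [0.5, 3.0, 10.0], threshold=0.5)
# 0.0 matches 3.0 (log 3 > 0.5); 20.0 matches nothing, so features == [3.0]
```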
Obtaining the user group feature vector set in this way matches the information of each user group accurately to sampling features, which makes it easier to recommend suitable products to the target user.
Further, in the embodiment of the present invention, the user group feature vectors in the user group feature vector set are converted into the user group identifier feature vector f (x) by using the following conversion algorithm:
(conversion formula for f(x); formula image not available)
where i is the index of a user group feature vector in the user group feature vector set, and b_i is the i-th user group feature vector in that set.
After all user group feature vectors in the user group feature vector set have been converted into user group identifier feature vectors, these are collected into the user group identifier feature vector set.
These steps convert each user group feature vector into a single vector representing the overall features of its user group, which reduces the amount of subsequent computation and the memory it occupies.
The distance calculation module 103 obtains a target feature vector of a target user and calculates a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set.
The target feature vector of the target user is obtained in the same way as the user group identifier feature vectors.
The target user may be a user in any one of at least two user groups, or may be a user in another user group.
Calculating the distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set includes:
calculating the distance value between the target feature vector and each user group identifier feature vector in the user group identifier feature vector set with the following distance algorithm:
(distance formula for L(X, Y); formula image not available)
where L(X, Y) is the distance value, X is the target feature vector, and Y_i is a user group identifier feature vector in the user group identifier feature vector set.
This distance calculation expresses more intuitively how similar the target feature vector is to each user group identifier feature vector in the set, which helps in recommending products based on the calculated distance values.
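Because the distance formula L(X, Y) survives only as a formula image, the sketch below substitutes Euclidean distance purely for illustration; the real formula may differ. It computes the distance from the target feature vector X to every user group identifier feature vector Y_i.

```python
import math
from typing import List, Sequence


def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L(X, Y) sketched as Euclidean distance (ASSUMED stand-in)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def distances_to_groups(target: Sequence[float],
                        group_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Distance from the target feature vector X to every Y_i in the set."""
    return [distance(target, y) for y in group_vectors]


d = distances_to_groups([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]])  # [5.0, 1.0]
```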
The information pushing module 104 obtains the information of the product corresponding to the user group identifier feature vector from the product information set when the distance value is smaller than a preset distance, and pushes it to the target user.
When the distance value is greater than or equal to the preset distance, another user group identifier feature vector is selected from the user group identifier feature vector set and the distance value is recalculated;
when the distance value is smaller than the preset distance, the information of the product corresponding to that user group identifier feature vector is obtained from the product information set and pushed to the target user, and the product is recommended to the user by, for example, SMS or telephone.
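The pushing rule can be sketched as a loop over the user group identifier feature vectors. Assumptions: Euclidean distance via `math.dist` (Python 3.8+) stands in for the unspecified distance formula, `send` is a hypothetical delivery callback (for example a wrapper around an SMS gateway), and every matching product is pushed rather than only the first one.

```python
import math
from typing import Callable, Dict, List, Sequence


def push_matching_products(
    target: Sequence[float],
    group_vectors: Dict[str, Sequence[float]],
    product_info: Dict[str, dict],
    preset_distance: float,
    send: Callable[[dict], None],
) -> List[str]:
    """Push the product of every user group whose identifier feature vector
    lies closer to the target than preset_distance."""
    pushed = []
    for product_id, y in group_vectors.items():
        if math.dist(target, y) < preset_distance:
            send(product_info[product_id])
            pushed.append(product_id)
        # Otherwise move on to the next user group identifier feature vector.
    return pushed


outbox = []
pushed = push_matching_products(
    [0.0, 0.0],
    {"P1": [0.3, 0.4], "P2": [3.0, 4.0]},
    {"P1": {"name": "product 1"}, "P2": {"name": "product 2"}},
    preset_distance=1.0,
    send=outbox.append,
)
# pushed == ["P1"]; outbox holds only product 1's information
```

Passing the delivery channel in as a callback keeps the matching logic independent of whether products are pushed by SMS, telephone, or another channel.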
Fig. 3 is a schematic structural diagram of an electronic device for implementing an information pushing method based on data mining according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. The memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as code of a resource scheduler, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 10 may consist of an integrated circuit, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes its data by running or executing programs or modules stored in the memory 11 (for example, the data-mining-based information push program) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with some of its components. Those skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may include fewer or more components than shown, combine some components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers each component. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, which implements charge management, discharge management, power consumption management, and similar functions. The power supply may also include any components such as one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
Further, the electronic device 1 may include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface) and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface, which may comprise a display and an input unit (such as a keyboard), and optionally a standard wired interface and a wireless interface. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the electronic device 1 and to display a visual user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The product recommendation program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
acquiring information of at least two products and information of at least two user groups corresponding to the at least two products to obtain a product information set and a user group information set;
performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
obtaining a target feature vector of a target user, and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set;
and when the distance value is smaller than a preset distance, obtaining the information of the product corresponding to that user group identifier feature vector from the product information set, and pushing the information to the target user.
For the specific way in which the processor 10 implements these instructions, refer to the description of the relevant steps in the embodiment corresponding to Fig. 1; the details are not repeated here.
Further, if the integrated modules/units of the electronic device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, or read-only memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, the word "comprising" clearly does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in a system claim may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An information pushing method based on data mining, characterized in that the method comprises:
acquiring information of at least two products and information of at least two user groups corresponding to the at least two products to obtain a product information set and a user group information set;
performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
obtaining a target feature vector of a target user, and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set;
and when the distance value is smaller than a preset distance, obtaining the information of the product corresponding to that user group identifier feature vector from the product information set, and pushing the information to the target user.
2. The information pushing method based on data mining as claimed in claim 1, wherein said performing feature extraction on said user group information set to obtain a user group feature vector set includes:
randomly grouping the information of the user groups in the user group information set to obtain a grouping result, wherein the grouping result comprises at least two groups and the information of the user groups contained in the at least two groups;
calculating scores of the grouping results according to a preset grouping scoring template to obtain score results, wherein the score results comprise scores of at least two groups;
comparing the scores of at least two groups in the score result with a preset score threshold respectively;
regrouping groups having a score less than or equal to the preset score threshold;
carrying out similarity calculation on the information of the user group contained in the group with the score larger than the preset score threshold value and the sampling features in the preset sampling user feature set to obtain a similarity value;
and acquiring a sampling user feature set when the similarity value is greater than a preset similarity threshold value, and determining the sampling user feature set as the user group feature vector set.
3. The data mining-based information pushing method of claim 1, wherein before the feature extraction of the user group information set, the method further comprises:
carrying out exception removal processing on the user group information set to obtain a user group information initial set;
and carrying out normalization processing on the initial set of the user group information to obtain a processed user group information set.
4. The data mining-based information pushing method according to claim 3, wherein performing the exception removal processing on the user group information set to obtain the initial set of user group information comprises:
carrying out numerical processing on the user group information set to obtain a numerical set;
screening the numerical value set through a threshold interval to obtain an abnormal numerical value set and a normal numerical value set;
calculating the average value of the normal numerical value set, and replacing the data in the abnormal numerical value set with the average value to obtain a corrected abnormal numerical value set;
and determining the union of the normal numerical value set and the corrected abnormal numerical value set as the initial set of user group information.
5. The information pushing method based on data mining as claimed in claim 3, wherein the normalizing the initial set of user group information to obtain a processed user group information set comprises:
normalizing the user information in the initial user group information set with the normalization algorithm x' = (x - μ) / σ to obtain a processed user group information set;
wherein x is a value in the initial set of user group information, x' is the corresponding value in the processed user group information set, μ is the mean of all values in the initial set, and σ is their standard deviation.
6. The data mining-based information pushing method of claim 4, wherein the method further comprises:
acquiring a median q in the value set;
taking a first expression in the median q (formula image not available) as the lower bound and a second expression in q (formula image not available) as the upper bound, and obtaining the threshold interval bounded by these lower and upper bounds (formula image not available);
wherein n > m, and n and m are preset constants.
7. The data mining-based information pushing method according to claim 1, wherein calculating the distance value between the target feature vector and the user group identifier feature vectors in the user group identifier feature vector set comprises:
calculating the distance value between the target feature vector and each user group identifier feature vector in the user group identifier feature vector set with the following distance algorithm:
(distance formula for L(X, Y); formula image not available)
wherein L(X, Y) is the distance value, X is the target feature vector, and Y_i is a user group identifier feature vector in the user group identifier feature vector set.
8. A product recommendation device, the device comprising:
the information acquisition module is used for acquiring information of at least two products and information of at least two user groups corresponding to the at least two products to obtain a product information set and a user group information set;
the feature extraction module is used for performing feature extraction on the user group information set to obtain a user group feature vector set, and converting the user group feature vector set with a preset conversion algorithm to obtain a user group identifier feature vector set;
the distance calculation module is used for obtaining a target feature vector of a target user and calculating a distance value between the target feature vector and a user group identifier feature vector in the user group identifier feature vector set;
and the information pushing module is used for obtaining the information of the product corresponding to that user group identifier feature vector from the product information set when the distance value is smaller than a preset distance, and pushing the information to the target user.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data mining based information push method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the data mining-based information pushing method according to any one of claims 1 to 7.
CN202010239972.1A 2020-03-30 2020-03-30 Information pushing method and device based on data mining and storage medium Active CN111475719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010239972.1A CN111475719B (en) 2020-03-30 2020-03-30 Information pushing method and device based on data mining and storage medium

Publications (2)

Publication Number Publication Date
CN111475719A 2020-07-31
CN111475719B 2023-04-07

Family

ID=71749419

Country Status (1)

Country Link
CN (1) CN111475719B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688984A (en) * 2017-07-27 2018-02-13 上海壹账通金融科技有限公司 Product information method for pushing, device, storage medium and computer equipment
WO2019061976A1 (en) * 2017-09-28 2019-04-04 平安科技(深圳)有限公司 Fund product recommendation method and apparatus, terminal device, and storage medium
CN109582857A (en) * 2018-10-15 2019-04-05 深圳壹账通智能科技有限公司 Based on big data information-pushing method, device, computer equipment and storage medium
CN110209928A (en) * 2019-04-28 2019-09-06 平安科技(深圳)有限公司 A kind of information recommendation method, device and storage medium


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant