CN113656694A - Information recommendation method, device and equipment based on machine learning and storage medium - Google Patents

Information recommendation method, device and equipment based on machine learning and storage medium

Info

Publication number
CN113656694A
Authority
CN
China
Prior art keywords
user data
field
value
field factor
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110947458.8A
Other languages
Chinese (zh)
Other versions
CN113656694B (en)
Inventor
殷子墨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202110947458.8A
Publication of CN113656694A
Application granted
Publication of CN113656694B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information recommendation method, device, equipment and storage medium based on machine learning, and relates to an artificial intelligence technology. The model parameters of the obtained prediction model are fully trained and adjusted, and the accuracy of the prediction result of the prediction model is improved.

Description

Information recommendation method, device and equipment based on machine learning and storage medium
Technical Field
The invention relates to the technical field of intelligent decision making in artificial intelligence, and in particular to an information recommendation method, device, equipment and storage medium based on machine learning.
Background
At present, when an enterprise launches a product (such as an insurance product, a financial product, or a digital product) and only trials and promotes it on a small scale, very little user data is obtained. If the currently obtained user data is used as the training set for a prediction model, the sample size is insufficient and the sample data covers few effective data dimensions, so the model parameters of the trained prediction model are not sufficiently trained and adjusted.
In addition, when this small amount of user data is used as a training set, the input vector for each piece of user data is generally obtained with a single conversion strategy that converts the value of every field into a vector value without distinguishing field types. As a result, the effective information in the user data is not fully mined and the finally trained prediction model predicts poorly.
Disclosure of Invention
The embodiments of the invention provide an information recommendation method, device, equipment and storage medium based on machine learning, aiming to solve the prior-art problem that, when a small amount of currently obtained user data is used as a training set to train a prediction model, the effective information in the user data is not sufficiently mined, so that the finally trained prediction model predicts poorly.
In a first aspect, an embodiment of the present invention provides an information recommendation method based on machine learning, including:
responding to a data set expansion instruction, and acquiring from a local database, according to the data set expansion instruction, the optimal target product attribute data having the maximum similarity to first product attribute data;
acquiring a second historical user data set corresponding to the optimal target product attribute data, and combining a first historical user data set corresponding to the first product attribute data and the second historical user data set to obtain a combined user data set;
classifying the field factors included in the combined user data set according to the field factor value types to obtain field factor classification results;
acquiring each piece of user data in the combined user data set, and carrying out vector value conversion on each piece of user data and field factor values corresponding to each field factor group by calling a corresponding field value conversion strategy to obtain a user input vector and a user output vector corresponding to each piece of user data;
forming training data by using a user input vector and a user output vector corresponding to each user data of the combined user data set, and performing model training on a prediction model to be trained by using the training set formed by the training data to obtain a prediction model;
if user data to be predicted uploaded by a user side is received, vector value conversion is carried out on the user data to be predicted and field factor values corresponding to field factor groups by calling corresponding field value conversion strategies, and user input vectors to be predicted are obtained; and
inputting the user input vector to be predicted into the prediction model for operation to obtain a prediction result for the user to be predicted, and sending the prediction result to the user side.
In a second aspect, an embodiment of the present invention provides an information recommendation apparatus based on machine learning, including:
the optimal attribute data acquisition unit is used for responding to a data set expansion instruction and acquiring from a local database, according to the data set expansion instruction, the optimal target product attribute data having the maximum similarity to first product attribute data;
the data set combination unit is used for acquiring a second historical user data set corresponding to the optimal target product attribute data, and combining a first historical user data set corresponding to the first product attribute data with the second historical user data set to obtain a combined user data set;
a field factor classifying unit, configured to classify the field factors included in the combined user data set according to field factor value types, so as to obtain a field factor classification result;
the vector value conversion unit is used for acquiring each piece of user data in the combined user data set, and carrying out vector value conversion on each piece of user data and field factor values corresponding to each field factor group by calling a corresponding field value conversion strategy to obtain a user input vector and a user output vector corresponding to each piece of user data;
the prediction model training unit is used for forming training data by using a user input vector and a user output vector corresponding to each user data of the combined user data set, and performing model training on a prediction model to be trained by using the training set formed by the training data to obtain a prediction model;
the user input vector acquisition unit is used for carrying out vector value conversion on the user data to be predicted and field factor values corresponding to each field factor group by calling a corresponding field value conversion strategy to obtain a user input vector to be predicted if the user data to be predicted uploaded by a user side is received; and
the result sending unit is used for inputting the user input vector to be predicted into the prediction model for operation to obtain a prediction result for the user to be predicted, and sending the prediction result to the user side.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the machine learning-based information recommendation method according to the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the machine learning-based information recommendation method according to the first aspect.
The embodiment of the invention provides an information recommendation method, device, equipment and storage medium based on machine learning, which can rapidly expand extension sample data with higher effectiveness based on a few sample data and participate in the training of a prediction model, so that model parameters of the obtained prediction model are fully trained and adjusted, and the accuracy of a prediction result of the prediction model is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic view of an application scenario of an information recommendation method based on machine learning according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an information recommendation method based on machine learning according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of an information recommendation apparatus based on machine learning according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic view of an application scenario of a machine learning-based information recommendation method according to an embodiment of the present invention, and FIG. 2 is a flowchart of that method. The machine learning-based information recommendation method is applied to a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S101 to S107.
S101, responding to a data set expansion instruction, and acquiring from a local database, according to the data set expansion instruction, the optimal target product attribute data having the maximum similarity to the first product attribute data.
In this embodiment, to make the technical solution of the present application easier to understand, the execution subjects involved are described in detail below. The technical solution is described with a server as the execution subject.
The server stores user data for a plurality of products (e.g., insurance products, financial products, digital products, etc.). Taking insurance products as an example in the present application, N insurance products (N being a positive integer) are stored in the server, and each of the N insurance products corresponds to one product attribute data and one historical user data set. More specifically, insurance product A1 corresponds to product attribute data A11 and historical user data set A12, insurance product A2 corresponds to product attribute data A21 and historical user data set A22, ..., and insurance product AN corresponds to product attribute data AN1 and historical user data set AN2. The server can screen out, as the optimal target product attribute data, the product attribute data that is most similar to the first product attribute data uploaded by the user side and whose field factor set is also the most similar. A second historical user data set corresponding to the optimal target product attribute data is then obtained and used to expand the first historical user data set corresponding to the first product attribute data. This increases the sample data volume, so that the prediction model to be trained is trained more fully and a prediction model with higher prediction accuracy is obtained. After user data to be predicted is uploaded to the server, the server can perform the calculation with the trained prediction model and output the prediction result for the user to be predicted.
The user side can upload a historical user data set to serve as training data with which the server trains the prediction model to be trained, and can also upload user data to be predicted in order to obtain a prediction result for the user to be predicted.
The service server stores user data for a plurality of products, and a user can select the product attribute data of one of the products as the first product attribute data and send it to the server to trigger the subsequent prediction model training process.
To carry out the model training process in the server, first product attribute data corresponding to a target product first needs to be selected at the user side or the service server; for example, product attribute data A11 corresponding to insurance product A1 is selected as the first product attribute data. When the server receives the first product attribute data uploaded by the user side or the service server, it retrieves the first historical user data set corresponding to the first product attribute data. (Because the historical user data set corresponding to each product attribute data is stored in a corresponding partition data table, the partition data table can be located by the name corresponding to the first product attribute data, and the corresponding historical user data set obtained from it.) The server then counts the total number of first historical user data in the first historical user data set. In this way, the user's own selection of any product attribute data can trigger the subsequent rapid user data screening and model training process.
In one embodiment, step S101 includes:
responding to a data set expansion instruction, acquiring first product attribute data, a first historical user data set corresponding to the first product attribute data and the total number of first historical user data in the first historical user data set according to the data set expansion instruction, and forming a first field factor set by field factors included in the first historical user data set;
when the total number of the first historical user data is judged not to exceed a preset data volume threshold, acquiring target product attribute data with the data similarity higher than a preset similarity threshold with the first product attribute data from a local database to form a target product attribute data set;
and acquiring a target product field factor set corresponding to each target product attribute data in the target product attribute data set, if the total similarity between the field factors of the target product field factor set and the first field factor set is determined to be the maximum value, acquiring the corresponding target product field factor set as an optimal target product field factor set, and acquiring the target product attribute data corresponding to the optimal target product field factor set as optimal target product attribute data.
In this embodiment, to ensure that there is enough data for the subsequent prediction model training, it is first determined whether the total number of first historical user data exceeds a preset data volume threshold (for example, 10000). If the total number does not exceed the preset data volume threshold, the data volume of the first historical user data set is insufficient and needs to be expanded. Specifically, target product attribute data whose data similarity with the first product attribute data is higher than a preset similarity threshold can be obtained from the local database to form a target product attribute data set. For example, with the similarity threshold set to 0.6, if the product attribute data whose data similarity with the first product attribute data (product attribute data A11) exceeds 0.6 are product attribute data A21, product attribute data A41, and product attribute data A71, then the target product attribute data set consists of product attribute data A21, A41, and A71.
When the data similarity of two product attribute data is calculated, the semantic vectors corresponding to the two product attribute data may be obtained first, and the similarity between the two semantic vectors is then calculated.
Each target product attribute data in the target product attribute data set corresponds to one historical user data set, and each historical user data set corresponds to one product field factor set; that is, each target product attribute data can be regarded as corresponding to one product field factor set, and the first product attribute data likewise corresponds to a first field factor set. With this information known, the similarity between the first field factor set and the target product field factor set corresponding to each target product attribute data in the target product attribute data set can be calculated.
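The threshold check and similarity screening described above can be sketched as follows. This is only an illustrative sketch: the text does not say which similarity measure is applied to the semantic vectors, so cosine similarity is an assumption, as are the function names `cosine_similarity` and `select_target_products` and the way the semantic vectors are supplied as plain lists of numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_target_products(first_vec, candidate_vecs, first_count,
                           volume_threshold=10000, similarity_threshold=0.6):
    """Return the names of candidate product attribute data to use for expansion.

    first_vec      -- semantic vector of the first product attribute data
    candidate_vecs -- dict: product attribute data name -> semantic vector
    first_count    -- total number of first historical user data
    """
    # Expansion is only needed when the first data set is not large enough.
    if first_count > volume_threshold:
        return []
    return [name for name, vec in candidate_vecs.items()
            if cosine_similarity(first_vec, vec) > similarity_threshold]


# Hypothetical two-dimensional vectors, purely for illustration.
candidates = {"A21": [1.0, 0.1], "A31": [0.0, 1.0], "A41": [0.8, 0.6]}
print(select_target_products([1.0, 0.0], candidates, first_count=500))
```

With these made-up vectors, A21 and A41 pass the 0.6 threshold and A31 does not; when `first_count` is above the volume threshold, no expansion candidates are returned.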
In an embodiment, the acquiring of a target product field factor set corresponding to each target product attribute data in the target product attribute data set and, if the total similarity between the field factors of a target product field factor set and the first field factor set is determined to be the maximum value, the acquiring of the corresponding target product field factor set as the optimal target product field factor set, includes:
acquiring a first semantic vector corresponding to a text formed by field factors included in the first field factor set;
acquiring target semantic vectors corresponding to each target product field factor set respectively;
calculating and obtaining the vector similarity between the target semantic vector corresponding to each target product field factor set and the first semantic vector, obtaining the maximum value of the vector similarity between each target semantic vector and the first semantic vector, and obtaining a final target semantic vector corresponding to the maximum value of the vector similarity between each target semantic vector and the first semantic vector;
and acquiring a target product field factor set corresponding to the final target semantic vector as an optimal target product field factor set.
When the similarity between the first field factor set and the target product field factor set corresponding to each target product attribute data in the target product attribute data set is calculated, the first semantic vector corresponding to the text composed of the field factors in the first field factor set can be calculated first. The target semantic vector corresponding to the text composed of each target product field factor set is then calculated, followed by the vector similarity between each target semantic vector and the first semantic vector, from which the maximum value is taken. Finally, the optimal target product field factor set is determined from the final target semantic vector that has the maximum vector similarity with the first semantic vector, and the target product attribute data corresponding to the optimal target product field factor set is taken as the optimal target product attribute data. In this way, the product attribute data most similar to the first product attribute data can be quickly screened out from the historical user data sets corresponding to the products, so that the historical user data set corresponding to the most similar product attribute data can be used to supplement the first historical user data set, achieving a reasonable expansion of the number of training samples in the training set.
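The selection of the optimal target product field factor set can be illustrated with a small sketch. The text does not specify how the semantic vector of a field factor text is produced, so the token-count vector used here (`toy_semantic_vector`) is only a stand-in assumption for a real semantic embedding; the function names and field factor names are likewise hypothetical.

```python
import math
from collections import Counter


def toy_semantic_vector(field_factors):
    """Stand-in for a semantic vector: token counts of the field factor text."""
    text = " ".join(field_factors).lower().replace("_", " ")
    return Counter(text.split())


def cosine(c1, c2):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(x * x for x in c1.values()))
    n2 = math.sqrt(sum(x * x for x in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def pick_optimal_field_factor_set(first_factors, target_factor_sets):
    """Return the name of the target set most similar to the first set.

    target_factor_sets -- non-empty dict: product attribute data name ->
                          list of field factor names
    """
    first_vec = toy_semantic_vector(first_factors)
    return max(
        target_factor_sets,
        key=lambda name: cosine(first_vec,
                                toy_semantic_vector(target_factor_sets[name])),
    )


first = ["age", "gender", "annual_income", "smoker"]
targets = {
    "A21": ["age", "gender", "annual_income", "occupation"],
    "A41": ["device_model", "purchase_count"],
    "A71": ["age", "city"],
}
print(pick_optimal_field_factor_set(first, targets))
```

In this toy data, the field factor set of A21 shares the most tokens with the first field factor set and is therefore selected.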
S102, a second historical user data set corresponding to the optimal target product attribute data is obtained, and a first historical user data set corresponding to the first product attribute data and the second historical user data set are combined to obtain a combined user data set.
In this embodiment, a second historical user data set corresponding to the optimal target product attribute data may be retrieved and obtained in a local database according to a product name corresponding to the optimal target product attribute data in a server, and at this time, in order to quickly implement data expansion, the first historical user data set and the second historical user data set may be directly combined to obtain a combined user data set.
In one embodiment, step S102 includes:
acquiring the union of the first field factor set and the optimal target product field factor set, in which field factors common to both sets are kept only once, to obtain a combined field factor set;
performing field factor expansion on each user data in the first historical user data set according to the combined field factor set, and supplementing a field factor value missing value in each user data according to a corresponding field factor value average value in the second historical user data set to obtain a supplemented first historical user data set;
performing field factor expansion on each user data in the second historical user data set according to the combined field factor set, and supplementing a field factor value missing value in each user data according to a corresponding field factor value average value in the first historical user data set to obtain a supplemented second historical user data set;
and combining the supplemented first historical user data set and the supplemented second historical user data set to obtain a combined user data set.
During combination, the field factors included in the first field factor set and in the optimal target product field factor set may not be exactly the same. For example, the first field factor set includes 10 field factors, denoted B1 to B10, and the optimal target product field factor set includes 12 field factors, denoted B3 to B14. The field factors common to both sets are B3 to B10 (8 identical field factors in total); these are retained once without further processing, and the field factors that differ between the two sets (B1, B2, and B11 to B14) are all retained as well, so that the combined user data set finally includes the 14 field factors B1 to B14.
More specifically, after the original first field factor set of 10 field factors is expanded to 14 field factors, the field factor values of the 4 field factors B11 to B14 are null values in the user data of the first historical user data set. Each of them can be filled with the average field factor value of the same field factor in the second historical user data set: the B11 values with the average B11 value, the B12 values with the average B12 value, the B13 values with the average B13 value, and the B14 values with the average B14 value.
Likewise, after the original optimal target product field factor set of 12 field factors is expanded to 14 field factors, the field factor values of the 2 field factors B1 and B2 are null values in the user data of the second historical user data set. The B1 values can be filled with the average field factor value of the B1 field factor in the first historical user data set, and the B2 values with the average field factor value of the B2 field factor in the first historical user data set.
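The field factor expansion and mean filling described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: each piece of user data is represented as a dict, all field factor values are numeric, a missing value is `None` or an absent key, and the function names are hypothetical.

```python
def column_means(rows, fields):
    """Average of each field factor over the rows that have a value for it."""
    means = {}
    for f in fields:
        values = [r[f] for r in rows if r.get(f) is not None]
        means[f] = sum(values) / len(values) if values else None
    return means


def expand_and_fill(rows, combined_fields, donor_means):
    """Expand rows to the combined field factors; fill gaps with donor means."""
    filled = []
    for row in rows:
        new_row = {}
        for f in combined_fields:
            value = row.get(f)
            new_row[f] = donor_means.get(f) if value is None else value
        filled.append(new_row)
    return filled


def combine_user_data(first_rows, first_fields, second_rows, second_fields):
    """Build the combined field factor set and the combined user data set."""
    # Union of both field factor sets; shared field factors appear once.
    combined_fields = list(first_fields) + [
        f for f in second_fields if f not in first_fields
    ]
    first_filled = expand_and_fill(
        first_rows, combined_fields, column_means(second_rows, second_fields))
    second_filled = expand_and_fill(
        second_rows, combined_fields, column_means(first_rows, first_fields))
    return combined_fields, first_filled + second_filled


first_rows = [{"B1": 1.0, "B2": 2.0}, {"B1": 3.0, "B2": 4.0}]
second_rows = [{"B2": 10.0, "B3": 100.0}, {"B2": 20.0, "B3": 300.0}]
fields, rows = combine_user_data(first_rows, ["B1", "B2"],
                                 second_rows, ["B2", "B3"])
print(fields)
print(rows)
```

In this small example, B3 is missing from the first set and is filled with the second set's B3 average, while B1 is missing from the second set and is filled with the first set's B1 average.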
In an embodiment, step S102 is followed by:
and acquiring the data saturation of each piece of user data in the combined user data set, and deleting the user data with the data saturation lower than a preset data saturation threshold from the combined user data set to update the combined user data.
In this embodiment, when calculating the data saturation of a certain piece of user data in the combined user data set, the total number of non-null values corresponding to field factor values that are not null values in the piece of user data is obtained, then the total number of non-null values is divided by the total number of factor values of the field factor values corresponding to the piece of user data, the data saturation of the piece of user data is obtained by dividing the total number of non-null values by the total number of factor values, and calculating the data saturation of other pieces of user data refers to the above calculation method. And when the data saturation of each piece of user data in the combined user data set is obtained, screening out the user data with the data saturation lower than a preset data saturation threshold value, and deleting the user data from the combined user data set to update the combined user data. By the screening method, the reserved user data are all effective user data with high data saturation.
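The data saturation calculation and screening can be sketched as follows, assuming each piece of user data is a dict in which a null field factor value is represented by `None`. The threshold of 0.8 and the function names are illustrative assumptions; the text only speaks of a preset data saturation threshold.

```python
def data_saturation(record):
    """Share of field factor values in one piece of user data that are not null."""
    if not record:
        return 0.0
    non_null = sum(1 for value in record.values() if value is not None)
    return non_null / len(record)


def filter_by_saturation(records, threshold=0.8):
    """Keep only user data whose saturation is not lower than the threshold."""
    return [r for r in records if data_saturation(r) >= threshold]


records = [
    {"B1": 30, "B2": 1, "B3": 5200.0, "B4": "Shenzhen"},   # saturation 1.0
    {"B1": 41, "B2": None, "B3": 800.0, "B4": "Beijing"},  # saturation 0.75
    {"B1": None, "B2": None, "B3": None, "B4": "Wuhan"},   # saturation 0.25
]
print([data_saturation(r) for r in records])
print(len(filter_by_saturation(records)))
```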
In one embodiment, as another embodiment of obtaining similar expansion data of the first historical user data set (i.e., the second historical user data set), the similar expansion data may be obtained by data cloning.
Specifically, the first historical user data set is cloned multiple times. For example, if the first historical user data set includes N1 pieces of original data, the N1 pieces of data can be cloned 4 times to obtain 5N1 pieces of data. To avoid simply repeating each piece of original data 5 times, the data values in the 4 clones of each piece of original data are erased, which yields 4N1 pieces of data whose values are null values. The data values in these 4 clones are then filled in according to the original value of each field factor and an adjustment step value calculated according to a preset rule.
Because each of the N1 pieces of original data corresponds to a plurality of field factors, the average value of each field factor can be obtained by averaging all field factor values of that field factor over the N1 pieces of original data. The difference between the average value of a field factor and the corresponding field factor value of a piece of original data is then divided by 4 to obtain the adjustment step value for that field factor value. For example, take the first piece of original data S1 among the N1 pieces of original data, which includes 5 field factor values denoted T1-T5, and let TA be the average value of the field factor corresponding to T1. The adjustment step value for T1 is (TA-T1)/4. Since S1 is cloned 4 times in the manner described above and the cloned T1 values are erased, the values are refilled by adding (TA-T1)/4, (TA-T1)/2, 3(TA-T1)/4 and (TA-T1) to T1, respectively, which yields 4 pieces of expansion data for the T1 value. By analogy, each of the field factor values T2-T5 is also expanded into 4 new values. In this way, data cloning expands the data volume of the original data.
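The cloning scheme can be sketched as follows, under two assumptions that the text does not state: every field factor value is numeric, and the k-th clone of a piece of original data applies k adjustment steps to every field factor at once. The function name `clone_expand` is hypothetical.

```python
from typing import Dict, List


def clone_expand(records: List[Dict[str, float]], clones: int = 4) -> List[Dict[str, float]]:
    """Return each original record followed by `clones` refilled clones.

    For a field value T with field average TA, the adjustment step is
    (TA - T) / clones and the k-th clone holds T + k * step.
    """
    if not records:
        return []
    fields = list(records[0].keys())
    means = {f: sum(r[f] for r in records) / len(records) for f in fields}
    expanded: List[Dict[str, float]] = []
    for record in records:
        expanded.append(dict(record))  # keep the original piece of data
        for k in range(1, clones + 1):
            expanded.append(
                {f: record[f] + k * (means[f] - record[f]) / clones for f in fields}
            )
    return expanded


original = [{"T1": 10.0, "T2": 100.0}, {"T1": 30.0, "T2": 300.0}]
expanded = clone_expand(original)  # 2 originals -> 10 pieces of data
```

Here the average of T1 is 20, so the clones of the first record take the T1 values 12.5, 15.0, 17.5 and 20.0; the last clone always coincides with the field averages.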
S103, classifying the field factors included in the combined user data set according to field factor value types to obtain a field factor classification result; the field factor value types comprise a binary value type, a numerical value type, a text value type and an output value type, the field factor classification result comprises a plurality of field factor groups, and each field factor group corresponds to one field factor value type.
In this embodiment, the total amount of user data included in the combined user data set generally exceeds the preset data amount threshold. To preprocess the user data in the combined user data set quickly, the field factors included in the combined user data set (that is, the field factors included in the combined field factor set) are first acquired, and these field factors are then classified according to field factor value types to obtain a field factor classification result.
For example, the binary value type is characterized in that the user status takes one of two values, such as marital status (married: yes/no) and gender (male/female).
The numerical value type is characterized in that the value is a number that falls within a rough value range but has too many possible values to enumerate conveniently, such as salary income, age and years of work experience. Such features are handled by segmentation; for age, for example, the following rule can be used:
0.1 for age 0-10, 0.2 for age 10-20, 0.3 for age 20-30, 0.4 for age 30-40, 0.5 for age 40-50, and 0.6 for age 50 and up. The above is merely an example, and the specific values may be divided according to a preset segmentation rule.
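The example age rule can be sketched as a lookup over segment boundaries. The ranges listed above share their end points (10, 20 and so on), so this sketch assumes that each segment includes its lower bound and excludes its upper bound; the names `AGE_UPPER_BOUNDS`, `AGE_FEATURE_VALUES` and `age_to_feature` are hypothetical.

```python
from bisect import bisect_right

# Assumption: segments are [0, 10), [10, 20), ..., [40, 50), [50, infinity).
AGE_UPPER_BOUNDS = [10, 20, 30, 40, 50]
AGE_FEATURE_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def age_to_feature(age: float) -> float:
    """Map an age to the feature value of the segment it falls in."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_FEATURE_VALUES[bisect_right(AGE_UPPER_BOUNDS, age)]
```

Other numerical field factors, such as income, could reuse the same lookup with their own preset boundaries and feature values.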
The text value type is characterized in that the value is a category description or a word, such as occupation or hobby. Such features can be vectorized using a pre-trained language model, an NLP technique in which a model trained on large-scale corpora gives a vector representation of a segment of text.
The output value type is characterized in that the field name of the corresponding field factor is generally the code or name of the product actually purchased by the user. Generally only one field factor is selected to provide the output value, so after the field factor most suitable as the output value has been screened out from the combined user data set, each remaining field factor can only be of the binary value type, the numerical value type or the text value type.
After the field factors included in the combined user data set have been classified according to field factor value types to obtain the field factor classification result, the field factor values of each piece of user data in the combined user data set can likewise be divided into 4 parts: a binary value type part, a numerical value type part, a text value type part and an output value type part.
S104, obtaining each piece of user data in the combined user data set, and carrying out vector value conversion on the field factor values of each piece of user data that correspond to each field factor group by calling a corresponding field value conversion strategy, to obtain a user input vector and a user output vector corresponding to each piece of user data.
In this embodiment, take user data C in the combined user data set as an example. The corresponding combined field factor set includes 14 field factors, where field factors B1-B3 are of the binary value type, field factors B4-B7 are of the numerical value type, field factors B8-B13 are of the text value type, and field factor B14 is of the output value type. The field factor values of B1-B3 are converted by invoking the first field value conversion strategy to obtain the vector values corresponding to B1-B3 (because these are of the binary value type, each converted vector value is 0 or 1). The field factor values of B4-B7 are converted by invoking the second field value conversion strategy to obtain the vector values corresponding to B4-B7 (because these are of the numerical value type, each value is converted to a feature value by segmentation, in the same way that age values are converted to feature values by segment as described above). The field factor values of B8-B13 are converted by invoking the third field value conversion strategy to obtain the vector values corresponding to B8-B13 (because these are of the text value type, each text value is vectorized with a pre-trained language model). The field factor value of B14 is used directly as the user output vector. Through this conversion, the user input vector and the user output vector corresponding to each piece of user data can be obtained accurately.
In one embodiment, step S104 includes:
acquiring a first field value conversion strategy corresponding to the binary value type, and converting the field factor values of the binary value type in each piece of user data of the combined user data set into a corresponding first type input vector value set;
acquiring a second field value conversion strategy corresponding to the numerical value type, and converting the field factor values of the numerical value type in each piece of user data of the combined user data set into a corresponding second type input vector value set;
acquiring a third field value conversion strategy corresponding to the text value type, and converting the field factor values of the text value type in each piece of user data of the combined user data set into a corresponding third type input vector value set;
splicing the first type input vector value set, the second type input vector value set and the third type input vector value set corresponding to each piece of user data in sequence to obtain a user input vector corresponding to each piece of user data;
and taking the field factor value of the output value type in each piece of user data as the user output vector of the corresponding user data.
In this embodiment, through the above conversion process, each piece of user data in the combined user data set may be preprocessed correspondingly and converted into user data that can be used as training data.
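The three conversion strategies and the splicing step can be sketched as below. Everything named here is hypothetical. In particular, `toy_embed` is only a deterministic stand-in so that the sketch runs without external dependencies; it is not a pre-trained language model, and a real system would substitute one through the `embed` parameter.

```python
import hashlib
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Placeholder for a pre-trained language model (NOT a real embedding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_training_pair(
    record: Dict[str, object],
    binary_fields: Dict[str, str],  # field factor -> value that maps to 1
    numeric_fields: Dict[str, Tuple[Sequence[float], Sequence[float]]],  # bounds, feature values
    text_fields: Sequence[str],
    output_field: str,
    embed: Callable[[str], List[float]] = toy_embed,
) -> Tuple[List[float], object]:
    # First strategy: binary values become 0 or 1.
    first = [1.0 if record[name] == positive else 0.0
             for name, positive in binary_fields.items()]
    # Second strategy: numerical values are segmented into feature values.
    second = [values[bisect_right(bounds, record[name])]
              for name, (bounds, values) in numeric_fields.items()]
    # Third strategy: text values are vectorized.
    third = [x for name in text_fields for x in embed(str(record[name]))]
    # Splice the three sets in sequence; the output field is used directly.
    return first + second + third, record[output_field]


record = {"married": "yes", "age": 35, "occupation": "teacher", "product": "a1"}
x, y = build_training_pair(
    record,
    binary_fields={"married": "yes"},
    numeric_fields={"age": ([10, 20, 30, 40, 50], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])},
    text_fields=["occupation"],
    output_field="product",
)
```

Applying the same function to user data to be predicted, with the output field ignored, would yield input vectors laid out identically to the training data.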
And S105, forming training data by using the user input vector and the user output vector corresponding to each user data in the combined user data set, and performing model training on a prediction model to be trained by using the training set formed by the training data to obtain the prediction model.
In this embodiment, after each piece of user data of the combined user data set is correspondingly converted into training data, each piece of training data can be used in the training process of the prediction model to be trained, and model parameters are adjusted and optimized continuously through training until the prediction model with a better final prediction effect is obtained through training.
The prediction model may be implemented by a logistic regression model, a linear regression model, a polynomial regression model, a neural network model, or the like. The trained prediction model is then used for prediction. Because the final prediction uses only the model and involves only numerical calculation, the original information of any user is not leaked.
And S106, if the user data to be predicted uploaded by the user side is received, vector value conversion is carried out on the field factor values of the user data to be predicted that correspond to each field factor group by calling the corresponding field value conversion strategy, and the user input vector to be predicted is obtained.
In this embodiment, after training of the prediction model is completed in the server, the prediction model can predict, from the user data to be predicted uploaded by the user side, the name or code of the product that the user finally desires to purchase. When the server receives the user data to be predicted, it converts that data into the user input vector to be predicted in the same way that the training data was converted, that is, by converting vector values separately for each field value type; after this conversion, the vector can be input into the prediction model for prediction more quickly.
In one embodiment, step S106 includes:
converting field factor values corresponding to binary value types in the user data to be predicted into corresponding first type input vector value sets according to the first field value conversion strategy, converting field factor values corresponding to numerical value types in the user data to be predicted into corresponding second type input vector value sets according to the second field value conversion strategy, converting field factor values corresponding to text value types in the user data to be predicted into corresponding third type input vector value sets according to the third field value conversion strategy, and splicing and combining the first type input vector value sets, the second type input vector value sets and the third type input vector value sets corresponding to the user data to be predicted in sequence to obtain user input vectors corresponding to the user data to be predicted.
In this embodiment, the user data to be predicted is converted into the user input vector to be predicted in the same way as the training data, namely by converting vector values separately for each field value type. Converting the field values separately by field value type mines the user characteristics of each type of field value more accurately, so that the mined user input features are more representative.
S107, inputting the user input vector to be predicted into the prediction model for operation to obtain a user prediction result to be predicted, and sending the user prediction result to be predicted to a user side.
In this embodiment, the user input vector to be predicted is input to the prediction model for operation, so that a user output vector to be predicted corresponding to the user input vector to be predicted can be obtained, and a specific vector value in the user output vector to be predicted is used as a prediction result of the user to be predicted and sent to the user side, so that rapid and accurate product recommendation is realized.
In an embodiment, step S107 is followed by:
and if the confirmation instruction uploaded by the user side is detected, acquiring the expense data corresponding to the prediction result of the user to be predicted, and sending the expense data to the user side.
In this embodiment, after the user side receives the prediction result of the user to be predicted, which concerns the recommended product, the user can review the result on the user side and decide whether to purchase the product. Once the user decides to purchase (for example, by clicking a virtual purchase button on the interface), a confirmation instruction is generated and sent to the server. When the server detects the confirmation instruction uploaded by the user side, it generates a bill including the expense data and sends the bill to the user side, where the user can view the expense data directly.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Based on a small amount of sample data, the method can quickly obtain highly effective expansion sample data that participates in the training of the prediction model, so that the model parameters of the resulting prediction model are fully trained and adjusted and the accuracy of the prediction results of the prediction model is improved.
The embodiment of the invention also provides an information recommendation device based on machine learning, which is used for executing any embodiment of the information recommendation method based on machine learning. Specifically, referring to fig. 3, fig. 3 is a schematic block diagram of an information recommendation device based on machine learning according to an embodiment of the present invention. The machine learning based information recommendation apparatus 100 may be configured in a server.
The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
As shown in fig. 3, the machine learning based information recommendation apparatus 100 includes: an optimal attribute data acquisition unit 101, a data set combination unit 102, a field factor classification unit 103, a vector value conversion unit 104, a prediction model training unit 105, a user input vector acquisition unit 106, and a result transmission unit 107.
The optimal attribute data acquiring unit 101 is configured to, in response to a data set extension instruction, acquire, in the local database, optimal target product attribute data with a maximum similarity to the first product attribute data according to the data set extension instruction.
In this embodiment, in order to carry out the model training process in the server, the user side or the service server first selects the first product attribute data corresponding to a target product; for example, the product attribute data a11 corresponding to the insurance product a1 is selected as the first product attribute data. Because the user data sets corresponding to all products are stored in the server, the user side or the service server only needs to send the first product attribute data to the server. When the server receives the first product attribute data uploaded by the user side or the service server, it retrieves the first historical user data set corresponding to the first product attribute data (the historical user data set corresponding to each product attribute data is stored in a corresponding partition data table, so the corresponding partition data table can be retrieved by the name corresponding to the first product attribute data to obtain the corresponding historical user data set), and then counts the total number of first historical user data in the first historical user data set. In this way, the user's own selection of any product attribute data triggers the subsequent rapid user data screening and model training process.
In an embodiment, the optimal attribute data obtaining unit 101 includes:
a first obtaining unit, configured to, in response to a data set expansion instruction, obtain first product attribute data, a first historical user data set corresponding to the first product attribute data, and a total number of first historical user data in the first historical user data set according to the data set expansion instruction, and form a first field factor set from field factors included in the first historical user data set;
a second obtaining unit, configured to, when it is determined that the total number of the first historical user data does not exceed a preset data amount threshold, obtain, in a local database, target product attribute data whose data similarity with the first product attribute data is higher than a preset similarity threshold, and form a target product attribute data set;
a third obtaining unit, configured to obtain a target product field factor set corresponding to each target product attribute data in the target product attribute data set, and if it is determined that total similarity between field factors of the target product field factor set and the first field factor set is a maximum value, obtain the corresponding target product field factor set as an optimal target product field factor set, and obtain target product attribute data corresponding to the optimal target product field factor set as optimal target product attribute data.
In this embodiment, to ensure that the data amount available for the subsequent prediction model training is sufficient, it is first determined whether the total number of first historical user data exceeds the preset data amount threshold (for example, 10000). If the total number does not exceed the preset data amount threshold, the data amount in the first historical user data set is insufficient and needs to be expanded. Specifically, the target product attribute data whose data similarity with the first product attribute data is higher than the preset similarity threshold can be obtained from the local database to form the target product attribute data set. For example, if the similarity threshold is set to 0.6 and the product attribute data whose data similarity with the first product attribute data (product attribute data a11) exceeds 0.6 are product attribute data a21, product attribute data a41 and product attribute data a71, then the target product attribute data set is composed of product attribute data a21, product attribute data a41 and product attribute data a71.
When the data similarity of the two product attribute data is calculated, semantic vectors corresponding to the two product attribute data may be obtained first.
In this embodiment, each target product attribute data included in the target product attribute data set corresponds to one historical user data set, and each historical user data set corresponds to one product field factor set; that is, each target product attribute data can be regarded as corresponding to one product field factor set. The first product attribute data likewise corresponds to the first field factor set. With this information, the similarity between the first field factor set and the target product field factor set corresponding to each target product attribute data in the target product attribute data set can be calculated.
In one embodiment, the third obtaining unit includes:
a first semantic vector obtaining unit, configured to obtain a first semantic vector corresponding to a text composed of field factors included in the first field factor set;
the target semantic vector acquisition unit is used for acquiring a target semantic vector corresponding to each target product field factor set;
a final semantic vector obtaining unit, configured to calculate and obtain vector similarities between target semantic vectors corresponding to each target product field factor set and the first semantic vector, obtain a maximum value of the vector similarities between each target semantic vector and the first semantic vector, and obtain a final target semantic vector corresponding to the maximum value of the vector similarities between the target semantic vectors and the first semantic vector;
and the optimal target product field factor set acquisition unit is used for acquiring a target product field factor set corresponding to the final target semantic vector as an optimal target product field factor set.
When calculating the similarity between the first field factor set and the target product field factor set corresponding to each target product attribute data in the target product attribute data set, the first semantic vector corresponding to the text composed of the field factors in the first field factor set is calculated first. The target semantic vectors corresponding to the texts composed of the field factors in each target product field factor set are then calculated. Next, the vector similarity between each target semantic vector and the first semantic vector is calculated, and the maximum of these similarities is obtained. Finally, the target product field factor set corresponding to the final target semantic vector, that is, the target semantic vector with the maximum vector similarity to the first semantic vector, is taken as the optimal target product field factor set, and the target product attribute data corresponding to the optimal target product field factor set is taken as the optimal target product attribute data. In this way, the product attribute data most similar to the first product attribute data can be screened out quickly, so that the historical user data set corresponding to the most similar product attribute data can be used to supplement the first historical user data set, which reasonably expands the number of training samples in the training set.
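The selection of the optimal target product field factor set can be sketched as a maximum over vector similarities. The text does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the made-up three-dimensional semantic vectors.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_set(
    first_vector: List[float], target_vectors: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the name and similarity of the target closest to first_vector."""
    scores = {name: cosine_similarity(first_vector, vec)
              for name, vec in target_vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative vectors only; real semantic vectors would come from a language model.
first_semantic_vector = [1.0, 0.0, 1.0]
target_semantic_vectors = {
    "a21": [1.0, 0.0, 0.9],
    "a41": [0.0, 1.0, 0.0],
    "a71": [0.5, 0.5, 0.5],
}
best_name, best_score = most_similar_set(first_semantic_vector, target_semantic_vectors)
```

In this toy data the vector for a21 points in nearly the same direction as the first semantic vector, so a21 is selected.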
And the data set combining unit 102 is configured to obtain a second historical user data set corresponding to the optimal target product attribute data, and combine the first historical user data set corresponding to the first product attribute data and the second historical user data set to obtain a combined user data set.
In this embodiment, a second historical user data set corresponding to the optimal target product attribute data may be retrieved and obtained in a local database according to a product name corresponding to the optimal target product attribute data in a server, and at this time, in order to quickly implement data expansion, the first historical user data set and the second historical user data set may be directly combined to obtain a combined user data set.
In an embodiment, the data set combining unit 102 comprises:
a combined field factor set obtaining unit, configured to obtain the union of the first field factor set and the optimal target product field factor set, in which field factors common to both sets are kept only once, so as to obtain a combined field factor set;
a first supplementing unit, configured to perform field factor expansion on each user data in the first historical user data set according to the combined field factor set, and supplement a field factor value missing value in each user data according to a corresponding field factor value average value in the second historical user data set, so as to obtain a supplemented first historical user data set;
a second supplementing unit, configured to perform field factor expansion on each user data in the second historical user data set according to the combined field factor set, and supplement a field factor value missing value in each user data according to a corresponding field factor value average value in the first historical user data set, so as to obtain a supplemented second historical user data set;
and the first combination unit is used for combining the supplemented first historical user data set and the supplemented second historical user data set to obtain a combined user data set.
During the combination process, the field factors included in the first field factor set and in the optimal target product field factor set may not be exactly the same. For example, the first field factor set includes 10 field factors, denoted B1 to B10, and the optimal target product field factor set includes 12 field factors, denoted B3-B14. The field factors common to both sets are B3-B10 (8 field factors in total); these common field factors are retained without any processing, and the field factors that differ between the first field factor set and the optimal target product field factor set are all retained as well, so that the combined user data set finally includes the 14 field factors B1-B14.
More specifically, for example, after the original first field factor set includes 10 field factors and is expanded to 14 field factors, the field factor values of the 4 field factors B11 to B14 are null values, at this time, the values of the B11 field factor expanded by the first field factor set may be padded with reference to the average field factor value of the B11 field factor in the second historical user data set, and the values of the B12 field factor expanded by the first field factor set may be padded with reference to the average field factor value of the B12 field factor in the second historical user data set, the values of the B13 field factors extended by the first set of field factors may be padded with reference to the average field factor value of the B13 field factors in the second set of historical user data, the values of the B14 field factors extended by the first set of field factors may be padded with reference to the average field factor value of the B14 field factors in the second set of historical user data.
More specifically, for example, after the original optimal target product field factor set, which includes 12 field factors, is expanded to 14 field factors, the field factor values of the 2 field factors B1 to B2 are null values. The values of the B1 field factor added to the optimal target product field factor set can then be filled with reference to the average field factor value of the B1 field factor in the first historical user data set, and the values of the B2 field factor added to the optimal target product field factor set can be filled with reference to the average field factor value of the B2 field factor in the first historical user data set.
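The expansion and mean filling of the two historical user data sets can be sketched as below, assuming numeric field factor values so that an average is defined. The names `field_mean` and `combine_data_sets` are hypothetical, and the tiny data sets stand in for the B1-B14 example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def field_mean(records: List[Record], field: str) -> Optional[float]:
    """Average of the non-null values of one field factor, or None if absent."""
    values = [r[field] for r in records if r.get(field) is not None]
    return sum(values) / len(values) if values else None


def combine_data_sets(first: List[Record], second: List[Record]) -> List[Record]:
    """Expand both sets to the union of field factors and mean-fill the new fields."""
    all_fields = sorted({f for r in first for f in r} | {f for r in second for f in r})

    def expand(records: List[Record], donor: List[Record]) -> List[Record]:
        # A field factor missing from a record is filled with the donor set's average.
        means = {f: field_mean(donor, f) for f in all_fields}
        return [{f: (r[f] if f in r else means[f]) for f in all_fields} for r in records]

    return expand(first, second) + expand(second, first)


first_set = [{"B1": 1.0, "B3": 10.0}, {"B1": 3.0, "B3": 20.0}]
second_set = [{"B3": 30.0, "B11": 5.0}, {"B3": 50.0, "B11": 7.0}]
combined = combine_data_sets(first_set, second_set)
```

In this example B11 is filled with 6.0 in the records from the first set and B1 is filled with 2.0 in the records from the second set, while the shared field factor B3 is left untouched.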
In one embodiment, the machine learning based information recommendation apparatus 100 further includes:
and the data screening unit is used for acquiring the data saturation of each piece of user data in the combined user data set, and deleting the user data whose data saturation is lower than a preset data saturation threshold from the combined user data set, so as to update the combined user data set.
In this embodiment, the data saturation of a piece of user data in the combined user data set is calculated as follows: the total number of field factor values in that piece of user data that are not null values is counted, and this non-null total is divided by the total number of field factor values corresponding to that piece of user data; the quotient is the data saturation of that piece of user data. The data saturation of every other piece of user data is calculated in the same way. Once the data saturation of each piece of user data in the combined user data set has been obtained, the user data whose data saturation is lower than the preset data saturation threshold is screened out and deleted from the combined user data set, so as to update the combined user data set. After this screening, all retained user data is valid user data with high data saturation.
In one embodiment, as another embodiment of obtaining similar expansion data of the first historical user data set (i.e., the second historical user data set), the similar expansion data may be obtained by data cloning.
Specifically, the first historical user data set is cloned multiple times. For example, if the first historical user data set includes N1 pieces of original data, the N1 pieces of data can be cloned 4 times to obtain 5N1 pieces of data. To avoid simply repeating each piece of original data 5 times, the data values in the 4 clones of each piece of original data are erased, which yields 4N1 pieces of data whose values are null values. The data values in these 4 clones are then filled in according to the original value of each field factor and an adjustment step value calculated according to a preset rule.
Because each of the N1 pieces of original data corresponds to a plurality of field factors, the average value of each field factor can be obtained by averaging all field factor values of that field factor over the N1 pieces of original data. The difference between the average value of a field factor and the corresponding field factor value of a piece of original data is then divided by 4 to obtain the adjustment step value for that field factor value. For example, take the first piece of original data S1 among the N1 pieces of original data, which includes 5 field factor values denoted T1-T5, and let TA be the average value of the field factor corresponding to T1. The adjustment step value for T1 is (TA-T1)/4. Since S1 is cloned 4 times in the manner described above and the cloned T1 values are erased, the values are refilled by adding (TA-T1)/4, (TA-T1)/2, 3(TA-T1)/4 and (TA-T1) to T1, respectively, which yields 4 pieces of expansion data for the T1 value. By analogy, each of the field factor values T2-T5 is also expanded into 4 new values. In this way, data cloning expands the data volume of the original data.
A field factor classifying unit 103, configured to classify the field factors included in the combined user data set according to field factor value types, so as to obtain a field factor classification result; the field factor value types comprise a binary value type, a numerical value type, a text value type and an output value type, the field factor classification result comprises a plurality of field factor groups, and each field factor group corresponds to one field factor value type.
In this embodiment, the total amount of user data included in the combined user data set generally exceeds the preset data amount threshold. To preprocess the user data in the combined user data set quickly, the field factors included in the combined user data set (that is, the field factors in the combined field factor set) are first obtained and then classified according to field factor value types, so as to obtain a field factor classification result.
For example, the binary value type is characterized by a user status that takes one of two values, such as married (yes/no) and gender (male/female).
The numerical value type is characterized by a numeric value that has a rough value range but too many possible values to enumerate conveniently, such as salary income, age and years of work experience. Such features are handled by segmentation; age, for example, can follow the rule below:
0.1 for age 0-10, 0.2 for age 10-20, 0.3 for age 20-30, 0.4 for age 30-40, 0.5 for age 40-50, and 0.6 for age 50 and up. The above is merely an example, and the specific values may be divided according to a preset segmentation rule.
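The segmentation rule above can be sketched with a sorted list of segment bounds. This is only an illustration: the listed ranges share their endpoints (age 10 appears in both 0-10 and 10-20), so the sketch assumes that a boundary age belongs to the upper segment. The names `AGE_UPPER_BOUNDS`, `AGE_FEATURE_VALUES` and `age_to_feature` are hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first five segments; ages of 50 and above fall in the last one.
AGE_UPPER_BOUNDS = [10, 20, 30, 40, 50]
AGE_FEATURE_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def age_to_feature(age):
    """Map an age to its segment's characteristic value.

    Assumption: segments are half-open, so age 10 maps to 0.2 rather than 0.1.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_FEATURE_VALUES[bisect_right(AGE_UPPER_BOUNDS, age)]
```

Other numerical field factors, such as income, could reuse the same lookup with their own preset bounds.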
The text value type is characterized by a category description or a word, such as occupation or hobby. Such features can be vectorized using a pre-trained language model, an NLP technique: a model trained on large-scale corpora that can produce a vector representation of a piece of text.
For the output value type, the field name of the corresponding field factor is generally the code or name of a product actually purchased by a user, and generally only one field factor is selected to supply the output value. Therefore, once the field factor most suitable as the output value has been screened out of the combined user data set, each remaining field factor can only be of the binary value type, the numerical value type or the text value type.
After the field factors included in the combined user data set are classified according to the field factor value types to obtain the field factor classification result, the field factor values of each piece of user data in the combined user data set can likewise be divided into 4 parts: a binary value type part, a numerical value type part, a text value type part and an output value type part.
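One possible way to form the four field factor groups is sketched below. The embodiment does not state how a value type is detected, so the rules used here are assumptions: the output field is named explicitly, a field with at most two distinct values is treated as binary, a field whose values are all numbers is treated as numerical, and everything else is treated as text. The function name and sample fields are hypothetical.

```python
def classify_field_factors(records, output_field):
    """Group field names by assumed value type: binary, numeric, text, output."""
    groups = {"binary": [], "numeric": [], "text": [], "output": [output_field]}
    for field in records[0]:
        if field == output_field:
            continue
        distinct_values = {r[field] for r in records}
        if len(distinct_values) <= 2:
            groups["binary"].append(field)      # one of two states, e.g. yes/no
        elif all(isinstance(v, (int, float)) for v in distinct_values):
            groups["numeric"].append(field)     # many numeric values, e.g. age
        else:
            groups["text"].append(field)        # category description or word
    return groups


# Hypothetical sample data.
sample = [
    {"married": "yes", "age": 25, "occupation": "teacher", "product": "P01"},
    {"married": "no", "age": 37, "occupation": "nurse", "product": "P02"},
    {"married": "yes", "age": 52, "occupation": "driver", "product": "P01"},
]
groups = classify_field_factors(sample, output_field="product")
```

In practice a numerical field that happens to contain only two distinct values would be misclassified by this heuristic, so a preset mapping from field name to value type may be preferable.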
The vector value conversion unit 104 is configured to obtain each piece of user data in the combined user data set, and to perform vector value conversion on the field factor values of each piece of user data corresponding to each field factor group by invoking the corresponding field value conversion policy, so as to obtain a user input vector and a user output vector corresponding to each piece of user data.
In this embodiment, take user data C in the combined user data set as an example. The corresponding combined field factor set includes 14 field factors, where field factors B1-B3 are of the binary value type, field factors B4-B7 are of the numerical value type, field factors B8-B13 are of the text value type, and field factor B14 is of the output value type. The values of field factors B1-B3 are converted by invoking a first field value conversion policy to obtain the vector values corresponding to B1-B3 respectively (because they are of the binary value type, each converted vector value is 0 or 1). The values of field factors B4-B7 are converted by invoking a second field value conversion policy to obtain the vector values corresponding to B4-B7 respectively (because they are of the numerical value type, each value is assigned a characteristic value by segmentation, in the same way that age values are converted into characteristic values above). The values of field factors B8-B13 are converted by invoking a third field value conversion policy to obtain the vector values corresponding to B8-B13 respectively (because they are of the text value type, each text value is vectorized using a pre-trained language model). The field factor value corresponding to field factor B14 is used directly as the user output vector. Through this conversion, the user input vector and the user output vector corresponding to each piece of user data can be obtained accurately.
In one embodiment, the vector value conversion unit 104 includes:
a first type vector value obtaining unit, configured to obtain a first field value conversion policy corresponding to the binary value type, and convert the field factor values of the binary value type in each piece of user data in the combined user data set into a corresponding first type input vector value set;
a second type vector value obtaining unit, configured to obtain a second field value conversion policy corresponding to the numerical value type, and convert the field factor values of the numerical value type in each piece of user data in the combined user data set into a corresponding second type input vector value set;
a third type vector value obtaining unit, configured to obtain a third field value conversion policy corresponding to the text value type, and convert the field factor values of the text value type in each piece of user data in the combined user data set into a corresponding third type input vector value set;
a second combination unit, configured to splice the first type input vector value set, the second type input vector value set and the third type input vector value set corresponding to each piece of user data in sequence to obtain the user input vector corresponding to that piece of user data;
and a user output vector obtaining unit, configured to take the field factor value of the output value type in each piece of user data as the user output vector of that piece of user data.
In this embodiment, through the above conversion process, each piece of user data in the combined user data set may be preprocessed correspondingly and converted into user data that can be used as training data.
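The three conversion policies and the splicing step can be sketched as below. This is an illustration under stated assumptions, not the embodiment's implementation: the mapping tables and field names are hypothetical, and `toy_text_embedding` is a deterministic stand-in for a pre-trained language model, which a real system would call instead.

```python
from bisect import bisect_right

BINARY_POSITIVE = {"married": "yes", "gender": "male"}  # hypothetical encodings
NUMERIC_BOUNDS = {"age": [10, 20, 30, 40, 50]}          # hypothetical segment bounds


def convert_binary(field, value):
    """First policy: map a two-state value to 1.0 or 0.0."""
    return 1.0 if value == BINARY_POSITIVE[field] else 0.0


def convert_numeric(field, value):
    """Second policy: segment index k maps to characteristic value (k + 1) / 10."""
    return (bisect_right(NUMERIC_BOUNDS[field], value) + 1) / 10


def toy_text_embedding(text, dim=4):
    """Third policy stand-in: NOT a language model, only a fixed-size placeholder."""
    vector = [0.0] * dim
    for i, ch in enumerate(text):
        vector[i % dim] += ord(ch) / 1000.0
    return vector


def to_vectors(record, groups):
    """Splice the three per-type vector value sets in order; return (input, output)."""
    first = [convert_binary(f, record[f]) for f in groups["binary"]]
    second = [convert_numeric(f, record[f]) for f in groups["numeric"]]
    third = [x for f in groups["text"] for x in toy_text_embedding(record[f])]
    return first + second + third, record[groups["output"][0]]


groups = {"binary": ["married", "gender"], "numeric": ["age"],
          "text": ["occupation"], "output": ["product"]}
record = {"married": "yes", "gender": "female", "age": 25,
          "occupation": "nurse", "product": "P02"}
user_input, user_output = to_vectors(record, groups)
```

Keeping the per-type converters as separate functions mirrors the separate conversion policies, so the same functions can be reused unchanged when converting user data to be predicted.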
The prediction model training unit 105 is configured to form training data from the user input vector and the user output vector corresponding to each piece of user data of the combined user data set, and to train the prediction model to be trained using the training set formed by the training data, so as to obtain the prediction model.
In this embodiment, after each piece of user data of the combined user data set has been converted into training data, each piece of training data can be used to train the prediction model to be trained, and the model parameters are adjusted and optimized continuously during training until a prediction model with a satisfactory prediction effect is obtained.
The prediction model may be implemented by a logistic regression model, a linear regression model, a polynomial regression model, a neural network model, or the like. The trained prediction model is then used for prediction. Because prediction uses only the model and involves only numerical calculation, the original information of any user is not leaked.
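As one illustration of the model options listed above, the following is a minimal logistic regression trained by batch gradient descent. It assumes the output has been reduced to a binary label (for example, whether a hypothetical product was purchased); the learning rate, epoch count and sample data are illustrative choices, not values from the embodiment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(inputs, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(inputs), len(inputs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(inputs, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical training set: [binary feature, age characteristic value] -> label.
train_inputs = [[1.0, 0.2], [1.0, 0.3], [0.0, 0.5], [0.0, 0.6]]
train_labels = [1, 1, 0, 0]
weights, bias = train_logistic_regression(train_inputs, train_labels)
```

At prediction time only `weights` and `bias` are needed, which is consistent with the statement that prediction involves only numerical calculation on the model.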
The user input vector obtaining unit 106 is configured to, if user data to be predicted uploaded by a user side is received, perform vector value conversion on the user data to be predicted and field factor values corresponding to each field factor group by invoking a corresponding field value conversion policy, so as to obtain a user input vector to be predicted.
In this embodiment, after the prediction model has been trained in the server, it can predict, from the user data to be predicted uploaded by the user side, the name or code of the product that the user ultimately wishes to purchase. When the server receives the user data to be predicted, it converts that data into the user input vector to be predicted using the same per-type vector value conversion that was applied to the training data, so that the data can be input to the prediction model for prediction quickly.
In an embodiment, the user input vector obtaining unit 106 is further configured to:
converting field factor values corresponding to binary value types in the user data to be predicted into corresponding first type input vector value sets according to the first field value conversion strategy, converting field factor values corresponding to numerical value types in the user data to be predicted into corresponding second type input vector value sets according to the second field value conversion strategy, converting field factor values corresponding to text value types in the user data to be predicted into corresponding third type input vector value sets according to the third field value conversion strategy, and splicing and combining the first type input vector value sets, the second type input vector value sets and the third type input vector value sets corresponding to the user data to be predicted in sequence to obtain user input vectors corresponding to the user data to be predicted.
In this embodiment, the user data to be predicted is converted into the user input vector to be predicted in the same way as the training data, that is, by converting the field values separately according to field value type. Converting field values by type mines the user characteristics carried by each type of field value more accurately, so the resulting user input features are more representative.
The result sending unit 107 is configured to input the user input vector to be predicted to the prediction model for operation, obtain a user prediction result to be predicted, and send the user prediction result to be predicted to the user side.
In this embodiment, the user input vector to be predicted is input to the prediction model for operation, so that a user output vector to be predicted corresponding to the user input vector to be predicted can be obtained, and a specific vector value in the user output vector to be predicted is used as a prediction result of the user to be predicted and sent to the user side, so that rapid and accurate product recommendation is realized.
In an embodiment, the machine learning based information recommendation apparatus 100 further comprises:
and the expense data sending unit is used for acquiring expense data corresponding to the prediction result of the user to be predicted if the confirmation instruction uploaded by the user side is detected, and sending the expense data to the user side.
In this embodiment, after the user side receives the user prediction result to be predicted for the recommended product, the user can review the result on the user side and decide whether to purchase the product. Once the purchase is confirmed (for example, by clicking a virtual purchase button on the interface), a confirmation instruction is generated and sent to the server. When the server detects the confirmation instruction uploaded by the user side, it generates a bill including the expense data and sends the bill to the user side, where the user can view the expense data directly.
The device can rapidly generate highly effective expanded sample data from a small amount of sample data and use it in training the prediction model, so that the model parameters of the resulting prediction model are fully trained and adjusted and the accuracy of the prediction results is improved.
The above-mentioned machine learning based information recommendation apparatus may be implemented in the form of a computer program, which may be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a machine learning-based information recommendation method.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute the information recommendation method based on machine learning.
The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only the part of the configuration relevant to aspects of the present invention and does not limit the computer device 500 to which aspects of the present invention may be applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the information recommendation method based on machine learning disclosed in the embodiment of the present invention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 4 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 4, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the Processor 502 may be a Central Processing Unit (CPU), and the Processor 502 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a nonvolatile computer-readable storage medium or a volatile computer-readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the machine learning-based information recommendation method disclosed by the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to illustrate clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A machine learning-based information recommendation method is characterized by comprising the following steps:
responding to a data set expansion instruction, and acquiring optimal target product attribute data with the maximum similarity between the optimal target product attribute data and first product attribute data in a local database according to the data set expansion instruction;
acquiring a second historical user data set corresponding to the optimal target product attribute data, and combining a first historical user data set corresponding to the first product attribute data and the second historical user data set to obtain a combined user data set;
classifying the field factors included in the combined user data set according to the field factor value types to obtain field factor classification results;
acquiring each piece of user data in the combined user data set, and carrying out vector value conversion on each piece of user data and field factor values corresponding to each field factor group by calling a corresponding field value conversion strategy to obtain a user input vector and a user output vector corresponding to each piece of user data;
forming training data by using a user input vector and a user output vector corresponding to each user data of the combined user data set, and performing model training on a prediction model to be trained by using the training set formed by the training data to obtain a prediction model;
if user data to be predicted uploaded by a user side is received, vector value conversion is carried out on the user data to be predicted and field factor values corresponding to field factor groups by calling corresponding field value conversion strategies, and user input vectors to be predicted are obtained; and
inputting the user input vector to be predicted into the prediction model for operation to obtain a user prediction result to be predicted, and sending the user prediction result to be predicted to a user side.
2. The machine-learning-based information recommendation method according to claim 1, wherein the obtaining, in response to a data set extension instruction, optimal target product attribute data with a maximum similarity to the first product attribute data in the local database according to the data set extension instruction comprises:
responding to a data set expansion instruction, acquiring first product attribute data, a first historical user data set corresponding to the first product attribute data and the total number of first historical user data in the first historical user data set according to the data set expansion instruction, and forming a first field factor set by field factors included in the first historical user data set;
when the total number of the first historical user data is judged not to exceed a preset data volume threshold, acquiring target product attribute data with the data similarity higher than a preset similarity threshold with the first product attribute data from a local database to form a target product attribute data set;
and acquiring a target product field factor set corresponding to each target product attribute data in the target product attribute data set, if the total similarity between the field factors of the target product field factor set and the first field factor set is determined to be the maximum value, acquiring the corresponding target product field factor set as an optimal target product field factor set, and acquiring the target product attribute data corresponding to the optimal target product field factor set as optimal target product attribute data.
3. The machine-learning-based information recommendation method according to claim 2, wherein the obtaining a target product field factor set corresponding to each target product attribute data in the target product attribute data set, and if it is determined that the total similarity between the field factors of the target product field factor set and the first field factor set is the maximum value, obtaining the corresponding target product field factor set as an optimal target product field factor set comprises:
acquiring a first semantic vector corresponding to a text formed by field factors included in the first field factor set;
acquiring target semantic vectors corresponding to each target product field factor set respectively;
calculating and obtaining the vector similarity between the target semantic vector corresponding to each target product field factor set and the first semantic vector, obtaining the maximum value of the vector similarity between each target semantic vector and the first semantic vector, and obtaining a final target semantic vector corresponding to the maximum value of the vector similarity between each target semantic vector and the first semantic vector;
and acquiring a target product field factor set corresponding to the final target semantic vector as an optimal target product field factor set.
4. The machine learning-based information recommendation method according to claim 3, wherein the obtaining a second historical user data set corresponding to the optimal target product attribute data, and combining a first historical user data set corresponding to the first product attribute data with the second historical user data set to obtain a combined user data set comprises:
acquiring a union set of the first field factor set and the same field factor in the optimal target product field factor set to obtain a combined field factor set;
performing field factor expansion on each user data in the first historical user data set according to the combined field factor set, and supplementing a field factor value missing value in each user data according to a corresponding field factor value average value in the second historical user data set to obtain a supplemented first historical user data set;
performing field factor expansion on each user data in the second historical user data set according to the combined field factor set, and supplementing a field factor value missing value in each user data according to a corresponding field factor value average value in the first historical user data set to obtain a supplemented second historical user data set;
and combining the supplemented first historical user data set and the supplemented second historical user data set to obtain a combined user data set.
5. The machine learning-based information recommendation method according to claim 1, wherein after obtaining a second historical user data set corresponding to the optimal target product attribute data, and combining a first historical user data set corresponding to the first product attribute data with the second historical user data set, obtaining a combined user data set, the method further comprises:
and acquiring the data saturation of each piece of user data in the combined user data set, and deleting the user data with the data saturation lower than a preset data saturation threshold from the combined user data set to update the combined user data.
6. The machine learning-based information recommendation method according to claim 1, wherein the obtaining of each piece of user data in the combined user data set, performing vector value conversion on each piece of user data and a field factor value corresponding to each field factor group by invoking a corresponding field value conversion policy, and obtaining a user input vector and a user output vector corresponding to each piece of user data comprises:
acquiring a first field value conversion strategy corresponding to a binary value type, and converting field factor values corresponding to each user data of the combined user data set as the binary value type into a corresponding first type input vector value set;
acquiring a second field value conversion strategy corresponding to the numerical value type, and converting field factor values corresponding to each user data of the combined user data set as the numerical value type into a corresponding second type input vector value set;
acquiring a third field value conversion strategy corresponding to the text type value type, and converting field factor values, corresponding to the text type value type, of each piece of user data in the combined user data set into a corresponding third type input vector value set;
splicing and combining a first type input vector value set, a second type input vector value set and a third type input vector value set corresponding to each piece of user data in sequence to obtain a user input vector corresponding to each piece of user data;
and taking the field factor value of each piece of user data as the output value type as a user output vector of the corresponding user data.
7. The machine learning-based information recommendation method according to claim 6, wherein the vector value conversion is performed on the user data to be predicted and the field factor values corresponding to the field factor groups by invoking corresponding field value conversion strategies to obtain the user input vector to be predicted, and the method comprises:
converting field factor values corresponding to binary value types in the user data to be predicted into corresponding first type input vector value sets according to the first field value conversion strategy, converting field factor values corresponding to numerical value types in the user data to be predicted into corresponding second type input vector value sets according to the second field value conversion strategy, converting field factor values corresponding to text value types in the user data to be predicted into corresponding third type input vector value sets according to the third field value conversion strategy, and splicing and combining the first type input vector value sets, the second type input vector value sets and the third type input vector value sets corresponding to the user data to be predicted in sequence to obtain user input vectors corresponding to the user data to be predicted.
8. An information recommendation device based on machine learning, comprising:
an optimal attribute data acquisition unit, configured to respond to a data set expansion instruction by acquiring, from a local database and according to the data set expansion instruction, the optimal target product attribute data having the maximum similarity to first product attribute data;
a data set combination unit, configured to acquire a second historical user data set corresponding to the optimal target product attribute data, and to combine a first historical user data set corresponding to the first product attribute data with the second historical user data set to obtain a combined user data set;
a field factor classification unit, configured to classify the field factors included in the combined user data set according to field factor value type, so as to obtain a field factor classification result;
a vector value conversion unit, configured to acquire each piece of user data in the combined user data set, and to perform vector value conversion on the field factor values in each piece of user data that correspond to each field factor group by calling the corresponding field value conversion strategy, so as to obtain a user input vector and a user output vector corresponding to each piece of user data;
a prediction model training unit, configured to form training data from the user input vector and the user output vector corresponding to each piece of user data in the combined user data set, and to train a prediction model to be trained on the training set formed by the training data, so as to obtain a prediction model;
a user input vector acquisition unit, configured to, if user data to be predicted uploaded by a user side is received, perform vector value conversion on the field factor values in the user data to be predicted that correspond to each field factor group by calling the corresponding field value conversion strategy, so as to obtain a user input vector to be predicted; and
a result sending unit, configured to input the user input vector to be predicted into the prediction model for operation to obtain a prediction result for the user data to be predicted, and to send the prediction result to the user side.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the machine learning-based information recommendation method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the machine learning-based information recommendation method according to any one of claims 1 to 7.
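The data preparation units recited in claim 8 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not fix a similarity measure, a set of value types, or the conversion strategies, so the sketch assumes cosine similarity over numeric product attribute vectors, two value types (numeric and categorical), pass-through conversion for numeric fields, and one-hot encoding for categorical fields. All function names, field names (`age`, `city`, `purchased`) and toy records are hypothetical. The training and prediction units are left out, since any supervised model could consume the resulting vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar_product(first_attrs, candidates):
    """Return the key of the candidate whose attribute vector is most similar."""
    return max(candidates, key=lambda k: cosine(first_attrs, candidates[k]))

def classify_fields(records, label_field):
    """Group input fields by value type: numeric vs. categorical (assumed types)."""
    groups = {"numeric": [], "categorical": []}
    for field in sorted(records[0]):
        if field == label_field:
            continue
        values = [r[field] for r in records]
        is_num = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                     for v in values)
        groups["numeric" if is_num else "categorical"].append(field)
    return groups

def build_vocab(records, groups):
    """Collect the sorted distinct values of each categorical field."""
    return {f: sorted({r[f] for r in records}) for f in groups["categorical"]}

def to_input_vector(record, groups, vocab):
    """Numeric fields pass through; categorical fields are one-hot encoded.
    A category not seen in the combined data set encodes as all zeros."""
    vec = [float(record[f]) for f in groups["numeric"]]
    for f in groups["categorical"]:
        vec.extend(1.0 if record[f] == v else 0.0 for v in vocab[f])
    return vec

# Hypothetical toy data: attribute vectors in the local database and
# historical user data sets keyed by product.
first_attrs = [1.0, 0.0, 1.0]
local_db = {"product_B": [1.0, 0.1, 0.9], "product_C": [0.0, 1.0, 0.0]}
history = {
    "product_A": [{"age": 30, "city": "Shenzhen", "purchased": 1}],
    "product_B": [{"age": 45, "city": "Shanghai", "purchased": 0}],
    "product_C": [{"age": 22, "city": "Beijing", "purchased": 1}],
}

# Expand the first product's data set with that of its most similar product.
best = most_similar_product(first_attrs, local_db)
combined = history["product_A"] + history[best]

# Classify fields, then convert every record to input and output vectors.
groups = classify_fields(combined, label_field="purchased")
vocab = build_vocab(combined, groups)
X = [to_input_vector(r, groups, vocab) for r in combined]
y = [[float(r["purchased"])] for r in combined]
```

User data to be predicted would pass through the same `to_input_vector` call with the same `groups` and `vocab`, so that training and prediction vectors share one layout.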
CN202110947458.8A 2021-08-18 2021-08-18 Information recommendation method, device, equipment and storage medium based on machine learning Active CN113656694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110947458.8A CN113656694B (en) 2021-08-18 2021-08-18 Information recommendation method, device, equipment and storage medium based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110947458.8A CN113656694B (en) 2021-08-18 2021-08-18 Information recommendation method, device, equipment and storage medium based on machine learning

Publications (2)

Publication Number Publication Date
CN113656694A (en) 2021-11-16
CN113656694B (en) 2023-07-25

Family

ID=78480801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110947458.8A Active CN113656694B (en) 2021-08-18 2021-08-18 Information recommendation method, device, equipment and storage medium based on machine learning

Country Status (1)

Country Link
CN (1) CN113656694B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108776673A (en) * 2018-05-23 2018-11-09 哈尔滨工业大学 Automatic switching method, device and the storage medium of relation schema
CN110457581A (en) * 2019-08-02 2019-11-15 达而观信息科技(上海)有限公司 A kind of information recommended method, device, electronic equipment and storage medium
CN112989007A (en) * 2021-04-20 2021-06-18 平安科技(深圳)有限公司 Knowledge base expansion method and device based on countermeasure network and computer equipment

Also Published As

Publication number Publication date
CN113656694B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN110598206B (en) Text semantic recognition method and device, computer equipment and storage medium
CN109271521B (en) Text classification method and device
US11636341B2 (en) Processing sequential interaction data
US11704500B2 (en) Techniques to add smart device information to machine learning for increased context
US11641330B2 (en) Communication content tailoring
CN111859960A (en) Semantic matching method and device based on knowledge distillation, computer equipment and medium
US20090228233A1 (en) Rank-based evaluation
CN111737546B (en) Method and device for determining entity service attribute
CN112384938A (en) Text prediction based on recipient's electronic messages
US20190228297A1 (en) Artificial Intelligence Modelling Engine
US11599666B2 (en) Smart document migration and entity detection
CN111859988A (en) Semantic similarity evaluation method and device and computer-readable storage medium
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN113407677A (en) Method, apparatus, device and storage medium for evaluating quality of consultation session
Hamsagayathri et al. Machine learning algorithms to empower Indian women entrepreneur in E-commerce clothing
CN111782782B (en) Consultation reply method and device for intelligent customer service, computer equipment and storage medium
CN110717537B (en) Method and device for training user classification model and executing user classification prediction
CN113656694B (en) Information recommendation method, device, equipment and storage medium based on machine learning
CN110766465A (en) Financial product evaluation method and verification method and device thereof
CN110852094B (en) Method, apparatus and computer readable storage medium for searching target
KR20210074246A (en) Method for recommending object, neural network and training method thereof, device, and medium
CN113657496A (en) Information matching method, device, equipment and medium based on similarity matching model
CN112749530A (en) Text encoding method, device, equipment and computer readable storage medium
US20240013057A1 (en) Information processing method, information processing apparatus, and non-transitory computer-readable storage medium
KR102520414B1 (en) A technique for generating a knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant