CN109685574A - Data determination method, device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN109685574A
Authority
CN
China
Prior art keywords
data
feature
user
feature vector
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811593719.5A
Other languages
Chinese (zh)
Inventor
周小又
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rajax Network Technology Co Ltd
Lazhasi Network Technology Shanghai Co Ltd
Original Assignee
Lazhasi Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazhasi Network Technology Shanghai Co Ltd
Priority to CN201811593719.5A
Publication of CN109685574A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present disclosure disclose a data determination method, device, electronic equipment, and computer-readable storage medium. The data determination method includes: obtaining historical user data within a preset time period and extracting a feature vector from the historical data; preprocessing the discrete features and the continuous features in the feature vector separately, and obtaining a training feature vector from the preprocessed feature vector; and training a data determination model with the training feature vector and performing data determination with that model. This technical solution can improve the accuracy of determining data such as a user's order probability, giving platforms and merchants reliable data support.

Description

Data determination method, device, electronic equipment and computer readable storage medium
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a data determination method, device, electronic equipment, and computer-readable storage medium.
Background
With the development of Internet technology, more and more merchants and service providers serve users through Internet platforms. To improve service quality and the user experience, and to raise users' order rate, many platforms determine the probability that a current user will place an order from the feature information of users who have ordered in the past. However, when determining order probability, the prior art usually considers only a user's platform attribute features, so the features are limited. In addition, the prior art applies the same processing to all features without considering the characteristics of each feature. As a result, the prior art determines a current user's order probability with low accuracy and cannot give platforms or merchants reliable data support.
Summary of the invention
Embodiments of the present disclosure provide a data determination method, device, electronic equipment, and computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data determination method.
Specifically, the data determination method includes:
obtaining historical user data within a preset time period, and extracting a feature vector from the historical data;
preprocessing the discrete features and the continuous features in the feature vector separately;
obtaining a training feature vector from the preprocessed feature vector; and
training a data determination model with the training feature vector, and performing data determination with the data determination model.
With reference to the first aspect, in a first implementation of the first aspect, the feature vector includes one or more of the following features: identification features, attribute features, location features, preference features, behavior features, and label features.
With reference to the first aspect and the first implementation of the first aspect, in a second implementation of the first aspect, obtaining the historical user data within the preset time period and extracting the feature vector from the historical data includes:
obtaining the historical user data within the preset time period;
extracting the user's identification features, attribute features, location features, preference features, behavior features, and label features from the historical data; and
combining the user's identification features, attribute features, location features, preference features, behavior features, and label features to obtain the feature vector of the user.
With reference to the first aspect and the first and second implementations of the first aspect, in a third implementation of the first aspect, preprocessing the discrete features and the continuous features in the feature vector separately includes:
obtaining the discrete features in the feature vector, encoding the discrete features, and generating first index values and first feature values of the discrete features; and
obtaining the continuous features in the feature vector, normalizing the continuous features, and generating second index values and second feature values of the continuous features.
With reference to the first aspect and the first to third implementations of the first aspect, in a fourth implementation of the first aspect, obtaining the training feature vector from the preprocessed feature vector includes:
combining the first index values and the second index values to obtain a feature index set, and combining the first feature values and the second feature values to obtain a feature value set; and
forming the training feature vector from the user's feature index set, feature value set, and corresponding label features.
With reference to the first aspect and the first to fourth implementations of the first aspect, in a fifth implementation of the first aspect, training the data determination model with the training feature vector and performing data determination with the data determination model includes:
training the data determination model with the training feature vector;
obtaining the feature indices and feature values of a user to be determined; and
inputting the feature indices and feature values of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
With reference to the first aspect and the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, the data determination model is a DeepFM model.
With reference to the first aspect and the first to sixth implementations of the first aspect, in a seventh implementation of the first aspect, the training feature vector and the feature indices and feature values of the user to be determined all come from the preprocessed feature vector.
In a second aspect, an embodiment of the present disclosure provides a data determination device.
Specifically, the data determination device includes:
an extraction module configured to obtain historical user data within a preset time period and extract a feature vector from the historical data;
a preprocessing module configured to preprocess the discrete features and the continuous features in the feature vector separately;
an obtaining module configured to obtain a training feature vector from the preprocessed feature vector; and
a determination module configured to train a data determination model with the training feature vector and perform data determination with the data determination model.
With reference to the second aspect, in a first implementation of the second aspect, the feature vector includes one or more of the following features: identification features, attribute features, location features, preference features, behavior features, and label features.
With reference to the second aspect and the first implementation of the second aspect, in a second implementation of the second aspect, the extraction module includes:
a first obtaining submodule configured to obtain the historical user data within the preset time period;
an extraction submodule configured to extract the user's identification features, attribute features, location features, preference features, behavior features, and label features from the historical data; and
a first combination submodule configured to combine the user's identification features, attribute features, location features, preference features, behavior features, and label features to obtain the feature vector of the user.
With reference to the second aspect and the first and second implementations of the second aspect, in a third implementation of the second aspect, the preprocessing module includes:
an encoding submodule configured to obtain the discrete features in the feature vector, encode the discrete features, and generate first index values and first feature values of the discrete features; and
a normalization submodule configured to obtain the continuous features in the feature vector, normalize the continuous features, and generate second index values and second feature values of the continuous features.
With reference to the second aspect and the first to third implementations of the second aspect, in a fourth implementation of the second aspect, the obtaining module includes:
a second combination submodule configured to combine the first index values and the second index values to obtain a feature index set, and to combine the first feature values and the second feature values to obtain a feature value set; and
a third combination submodule configured to form the training feature vector from the user's feature index set, feature value set, and corresponding label features.
With reference to the second aspect and the first to fourth implementations of the second aspect, in a fifth implementation of the second aspect, the determination module includes:
a training submodule configured to train the data determination model with the training feature vector;
a second obtaining submodule configured to obtain the feature indices and feature values of a user to be determined; and
a determination submodule configured to input the feature indices and feature values of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
With reference to the second aspect and the first to fifth implementations of the second aspect, in a sixth implementation of the second aspect, the data determination model is a DeepFM model.
With reference to the second aspect and the first to sixth implementations of the second aspect, in a seventh implementation of the second aspect, the training feature vector and the feature indices and feature values of the user to be determined all come from the preprocessed feature vector.
In a third aspect, an embodiment of the present disclosure provides electronic equipment including a memory and a processor. The memory stores one or more computer instructions that enable a data determination device to execute the data determination method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The data determination device may further include a communication interface for communication between the data determination device and other equipment or a communication network.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing the computer instructions used by the data determination device, including the computer instructions involved in executing the data determination method of the first aspect.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects:
The technical solution comprehensively considers multiple types of user features, processes different types of features differently, and then uses the combined features to determine user data. It can therefore improve the accuracy of determining data such as a user's order probability and give platforms and merchants reliable data support.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, read together with the accompanying drawings. In the drawings:
Fig. 1 shows a flowchart of a data determination method according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of step S101 of the data determination method according to the embodiment shown in Fig. 1;
Fig. 3 shows a flowchart of step S102 of the data determination method according to the embodiment shown in Fig. 1;
Fig. 4 shows a flowchart of step S103 of the data determination method according to the embodiment shown in Fig. 1;
Fig. 5 shows a flowchart of step S104 of the data determination method according to the embodiment shown in Fig. 1;
Fig. 6 shows a structural block diagram of a data determination device according to an embodiment of the present disclosure;
Fig. 7 shows a structural block diagram of the extraction module 601 of the data determination device according to the embodiment shown in Fig. 6;
Fig. 8 shows a structural block diagram of the preprocessing module 602 of the data determination device according to the embodiment shown in Fig. 6;
Fig. 9 shows a structural block diagram of the obtaining module 603 of the data determination device according to the embodiment shown in Fig. 6;
Fig. 10 shows a structural block diagram of the determination module 604 of the data determination device according to the embodiment shown in Fig. 6;
Fig. 11 shows a structural block diagram of electronic equipment according to an embodiment of the present disclosure;
Fig. 12 is a schematic structural diagram of a computer system suitable for implementing the data determination method according to an embodiment of the present disclosure.
Detailed description
Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.
It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features in those embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The technical solution provided by the embodiments of the present disclosure comprehensively considers multiple types of user features, processes different types of features differently, and then uses the combined features to determine user data. It can therefore improve the accuracy of determining data such as a user's order probability and give platforms and merchants reliable data support.
Fig. 1 shows a flowchart of a data determination method according to an embodiment of the present disclosure. As shown in Fig. 1, the data determination method includes the following steps S101-S104:
In step S101, historical user data within a preset time period is obtained, and a feature vector is extracted from the historical data;
In step S102, the discrete features and the continuous features in the feature vector are preprocessed separately;
In step S103, a training feature vector is obtained from the preprocessed feature vector;
In step S104, a data determination model is trained with the training feature vector, and data determination is performed with the data determination model.
As mentioned above, with the development of Internet technology, more and more merchants and service providers serve users through Internet platforms. To improve service quality and the user experience, and to raise users' order rate, many platforms determine the probability that a current user will place an order from the feature information of users who have ordered in the past. However, when determining order probability, the prior art usually considers only a user's platform attribute features, so the features are limited. In addition, the prior art applies the same processing to all features without considering the characteristics of each feature. As a result, the prior art determines data such as a current user's order probability with low accuracy and cannot give platforms or merchants reliable data support.
In view of these drawbacks, this embodiment proposes a data determination method that comprehensively considers multiple types of user features, processes different types of features differently, and then uses the combined features to determine user data. The technical solution can therefore improve the accuracy of determining data such as a user's order probability and give platforms and merchants reliable data support.
In an optional implementation of this embodiment, the historical user data refers to data that the user has generated in the past, for example a user's order data, transaction data, click data, and browsing data on a platform. The user may be one user or multiple users; to ensure the accuracy of subsequent data determination, the user may be set to multiple users.
Here, the data refers to data that relates to one or more users and has certain characteristics, for example user behavior data and user operation data.
In an optional implementation of this embodiment, the data to be determined may include behavior data generated with respect to a behavior object, for example placing an order, making a transaction, clicking, or browsing. The behavior object may be, for example, a business, a seller, a merchant, or a service provider.
In an optional implementation of this embodiment, the feature vector of the historical data includes one or more of the following features: identification features, attribute features, location features, preference features, behavior features, and label features. The identification feature uniquely identifies the user, for example the user's ID. The attribute features characterize the user's attribute information, such as age, gender, occupation, and health status. The location features characterize where the user is, such as geographic location, latitude and longitude, and map point-of-interest information. The preference features characterize the user's preferences, such as service preference, product preference, merchant preference, cuisine and taste preference, discount sensitivity, price preference, and resource preference. The behavior features characterize the user's behavior, such as whether the user placed an order, the object ordered from, whether the user clicked, the object clicked, whether the user browsed, the object browsed, average order value before discount, average order value after discount, share of orders placed without a discount, order probability, order frequency, accumulated coupon amount, coupon usage rate, ordering channel, click frequency, browsing frequency, visit frequency, and total number of completed orders. The label feature is the user's behavior label. For example, if the target behavior to be determined is placing an order, the label feature is set to 1 when the user placed an order within the preset time period and to 0 when the user did not.
Here, the preset time period refers to a period before the current time, for example the 90 days before the current time. The length of the preset time period can be set according to the needs of the application and is not specifically limited by the present disclosure.
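The labeling rule and the preset time period described above can be sketched as follows. This is a minimal illustration rather than the implementation of the disclosure: the function name `label_user`, the in-memory list of order timestamps, and the 90-day default are assumptions made for the example.

```python
from datetime import datetime, timedelta

def label_user(order_times, now, window_days=90):
    """Return 1 if any order falls inside the preset period before `now`, else 0."""
    window_start = now - timedelta(days=window_days)
    return int(any(window_start <= t <= now for t in order_times))

# Hypothetical order histories for two users.
now = datetime(2018, 12, 25)
recent_orders = [datetime(2018, 11, 1)]   # inside the 90-day window
stale_orders = [datetime(2018, 1, 1)]     # outside the 90-day window

recent_label = label_user(recent_orders, now)
stale_label = label_user(stale_orders, now)
```

Changing `window_days` corresponds to choosing a different preset time period.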
In an optional implementation of this embodiment, as shown in Fig. 2, step S101, that is, the step of obtaining historical user data within a preset time period and extracting a feature vector from the historical data, includes steps S201-S203:
In step S201, the historical user data within the preset time period is obtained;
In step S202, the user's identification features, attribute features, location features, preference features, behavior features, and label features are extracted from the historical data;
In step S203, the user's identification features, attribute features, location features, preference features, behavior features, and label features are combined to obtain the feature vector of the user.
To obtain multiple types of feature data and improve the accuracy of the data determination model, this implementation extracts the user's identification features, attribute features, location features, preference features, behavior features, and label features from the historical user data within the preset time period. It then combines these features to obtain the user feature vector that is subsequently used to train the data determination model.
In an optional implementation of this embodiment, the above types of feature data may be extracted from different historical user data, especially when the historical user data comes from different acquisition channels. For example, where the data to be determined is the behavior data of placing an order with a certain merchant, the user's order records can first be obtained from the user's historical behavior data, and whether the user has placed an order with the merchant can then be read from those records. If the user has ordered, the user's label feature is set to 1; otherwise it is set to 0. Combined with the user's identification feature, such as an ID number, this yields a label data set: [user ID, label], where label denotes the user's label feature. Then, information such as the attribute features, location features, preference features, and behavior features is obtained from the user's historical behavior data and combined with the user's identification feature to yield another feature data set: [user ID, value1, value2...]. Merging the label data set with the other feature data set, using the user identification feature as the primary key, yields the labeled feature vector of the user.
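The merge of the label data set with the other feature data set can be illustrated with a short sketch. The function name `merge_on_user_id`, the sample rows, and the choice of an inner join (users without a label are dropped) are assumptions for the example; the disclosure only states that the two sets are merged with the user identifier as the primary key.

```python
def merge_on_user_id(label_rows, feature_rows):
    """Join [user_id, label] rows with [user_id, value1, value2, ...] rows on user_id."""
    labels = {row[0]: row[1] for row in label_rows}
    merged = []
    for row in feature_rows:
        user_id, values = row[0], list(row[1:])
        if user_id in labels:  # inner join: keep only users that have a label
            merged.append([user_id, *values, labels[user_id]])
    return merged

# Hypothetical data: u3 has features but no label and is therefore dropped.
label_rows = [["u1", 1], ["u2", 0]]
feature_rows = [["u1", 28, "Shanghai"], ["u2", 35, "Beijing"], ["u3", 41, "Hangzhou"]]
labeled_vectors = merge_on_user_id(label_rows, feature_rows)
```

Each merged row has the form [user ID, value1, value2, ..., label].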
In an optional implementation of this embodiment, as shown in Fig. 3, step S102, that is, the step of preprocessing the discrete features and the continuous features in the feature vector separately, includes steps S301-S302:
In step S301, the discrete features in the feature vector are obtained and encoded, and first index values and first feature values of the discrete features are generated;
In step S302, the continuous features in the feature vector are obtained and normalized, and second index values and second feature values of the continuous features are generated.
Considering that the feature vector may contain different types of features, such as discrete features and continuous features, this implementation processes each type differently. Specifically, the discrete features in the feature vector are first obtained and encoded, and first index values and first feature values of the discrete features are generated. The encoding may be one-hot encoding, and the first index values serve as index labels for the first feature values so that different first feature values can be distinguished. The continuous features in the feature vector are then obtained. Because the DeepFM data determination model used later contains a linear combination part, the continuous features need to be normalized to speed up the model's search for an optimal solution. Second index values and second feature values of the continuous features are generated at the same time; as above, the second index values serve as index labels for the second feature values so that different second feature values can be distinguished.
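One possible reading of this preprocessing step is sketched below. The text specifies one-hot encoding for discrete features but does not name a normalization method, so min-max scaling is an assumption here, as are the function names, the column names, and the sparse (index, value) layout in which a one-hot feature is stored as the index of its active category with value 1.0.

```python
def fit_preprocessor(rows, discrete_cols, continuous_cols):
    """Assign one index per (discrete column, category) and one per continuous column."""
    index_of = {}
    for col in discrete_cols:
        for category in sorted({row[col] for row in rows}):
            index_of[(col, category)] = len(index_of)
    ranges = {}
    for col in continuous_cols:
        values = [row[col] for row in rows]
        index_of[col] = len(index_of)
        ranges[col] = (min(values), max(values))
    return index_of, ranges

def transform(row, discrete_cols, continuous_cols, index_of, ranges):
    """Return (feature indices, feature values) for one user."""
    indices, values = [], []
    for col in discrete_cols:
        # One-hot encoding stored sparsely: the active category's index, value 1.0.
        indices.append(index_of[(col, row[col])])
        values.append(1.0)
    for col in continuous_cols:
        lo, hi = ranges[col]
        # Min-max normalization to [0, 1]; a constant column maps to 0.0.
        indices.append(index_of[col])
        values.append((row[col] - lo) / (hi - lo) if hi > lo else 0.0)
    return indices, values

# Hypothetical users with two discrete and two continuous features.
rows = [
    {"gender": "F", "city": "Shanghai", "age": 20, "avg_price": 30.0},
    {"gender": "M", "city": "Beijing", "age": 40, "avg_price": 50.0},
    {"gender": "F", "city": "Beijing", "age": 30, "avg_price": 35.0},
]
discrete_cols, continuous_cols = ["gender", "city"], ["age", "avg_price"]
index_of, ranges = fit_preprocessor(rows, discrete_cols, continuous_cols)
encoded = [transform(r, discrete_cols, continuous_cols, index_of, ranges) for r in rows]
```

In this sketch a category that was not seen by `fit_preprocessor` raises a `KeyError`; a production pipeline would need an explicit policy for unseen values.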
In an optional implementation of this embodiment, as shown in Fig. 4, step S103, that is, the step of obtaining training feature vectors from the preprocessed feature vectors, includes steps S401-S402:
In step S401, the first index value and the second index value are combined to obtain a feature index set, and the first feature value and the second feature value are combined to obtain a feature value set;
In step S402, the feature index set and the feature value set of the user, together with the corresponding label feature, are formed into the training feature vector.
In the previous embodiment, different types of features were processed differently. In this embodiment, to obtain a complete training feature vector, the differently processed features also need to be combined. Specifically, the first index value and the second index value are first combined in order to obtain a feature index set, and the first feature value and the second feature value are combined in order to obtain a feature value set. The feature index set and the feature value set of the user, together with the label feature corresponding to the user, are then formed into the training feature vector that is subsequently used to train the data determination model.
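A minimal sketch of steps S401-S402, assuming the index values and feature values are held in plain lists and that "combining in order" means concatenation with the discrete part first. The `TrainingSample` container and the function name are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    feature_index: List[int]    # feature index set
    feature_value: List[float]  # feature value set
    label: int                  # label feature, e.g. 1 = placed an order


def build_training_sample(first_idx, first_val, second_idx, second_val, label):
    """Concatenate the discrete and continuous parts in a fixed order."""
    if len(first_idx) != len(first_val) or len(second_idx) != len(second_val):
        raise ValueError("each index value needs exactly one feature value")
    return TrainingSample(
        feature_index=list(first_idx) + list(second_idx),
        feature_value=list(first_val) + list(second_val),
        label=label,
    )
```

Keeping the same concatenation order for every user means that position k of the feature index set always pairs with position k of the feature value set.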
In an optional implementation of this embodiment, as shown in Fig. 5, step S104, that is, the step of training a data determination model using the training feature vectors and performing data determination according to the data determination model, includes steps S501-S503:
In step S501, a data determination model is trained using the training feature vectors;
In step S502, the feature indices and feature values of a user to be determined are obtained;
In step S503, the feature indices and feature values of the user to be determined are input into the data determination model to obtain a data determination result for the user to be determined.
In this embodiment, after the data determination model has been trained using the training feature vectors, the feature indices and feature values of the user to be determined, whose content is similar to that of the training feature vectors, are input into the trained data determination model to obtain the data determination result for the user to be determined.
In an optional implementation of this embodiment, the data determination model is a DeepFM model. The DeepFM model extracts low-order and high-order feature combinations at the same time. Its structure contains a factorization machine (Factorization Machine) part and a deep neural network (Deep Neural Networks) part, so it can effectively combine the strengths of neural networks and factorization machines in feature learning and make feature combination more effective.
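The text does not spell out the model's internals, so the following is only a sketch of the commonly published DeepFM formulation, in which the output of the factorization machine part and the output of the deep part are summed and passed through a sigmoid. The parameter names (`w0`, `w`, `V`) and the function names are assumptions, and the deep part is represented only by its output value.

```python
import math


def fm_logit(indices, values, w0, w, V):
    """Factorization machine score for one sparse sample.

    indices/values: feature index set and feature value set of the sample.
    w0: global bias; w[i]: linear weight; V[i]: latent vector of feature i.
    """
    linear = w0 + sum(w[i] * x for i, x in zip(indices, values))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x for i, x in zip(indices, values))
        s_sq = sum((V[i][f] * x) ** 2 for i, x in zip(indices, values))
        # Equals the sum over i < j of V[i][f] * V[j][f] * x_i * x_j.
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def deepfm_probability(fm_out, dnn_out):
    """Combine the FM part and the deep part into a probability."""
    return 1.0 / (1.0 + math.exp(-(fm_out + dnn_out)))
```

The linear term corresponds to the linear combination part mentioned earlier, which is why the scale of the continuous feature values matters for training.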
In an optional implementation of this embodiment, the training feature vectors may be all of the training feature vectors obtained in steps S102 and S103. In this case, the user to be determined may be a user different from the users above, for example a user who is currently logged in to the platform but has not yet performed any behavior. The corresponding feature indices and feature values can be obtained by a method similar to the one described above. In addition, the dimensions and content of the features of the user to be determined should be consistent with those of the training features in the training feature vectors, to ensure the accuracy of the data determination model.
In an optional implementation of this embodiment, the training feature vectors may also be the feature vectors of only some of the users among the training feature vectors obtained in steps S102 and S103. In this case, the user to be determined may be chosen from the remaining users, for example selected at random from them, so that the training feature vectors and the feature indices and feature values of the user to be determined all come from the preprocessed feature vectors. As above, the feature indices and feature values of the user to be determined can be obtained by a method similar to the one described above. Note, however, that the feature values of the user to be determined then do not include the label feature, because that is the content the data determination model has to determine.
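The second variant, in which part of the users are used for training and the remaining users are treated as users to be determined, could look roughly like the sketch below. The 80/20 split, the random seed, and the dictionary layout of a sample are assumptions made for illustration only.

```python
import random


def split_users(samples, train_fraction=0.8, seed=0):
    """Split users into a training part and users to be determined.

    samples: {user_id: {"feature_index": [...], "feature_value": [...], "label": 0 or 1}}
    The label feature is removed for the users to be determined, since it is
    what the data determination model has to produce.
    """
    rng = random.Random(seed)
    user_ids = sorted(samples)
    rng.shuffle(user_ids)
    cut = int(len(user_ids) * train_fraction)
    train = {uid: samples[uid] for uid in user_ids[:cut]}
    to_determine = {
        uid: {k: v for k, v in samples[uid].items() if k != "label"}
        for uid in user_ids[cut:]
    }
    return train, to_determine
```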
The feature indices and feature values of the user to be determined are then input into the DeepFM data determination model to obtain the data determination result for the user to be determined. The DeepFM model is an existing model in the prior art, and the present invention does not describe its working principle further.
In an optional implementation of this embodiment, if the user data is user behavior data, the user data determination result can take the form of the probability that the behavior occurs, for example the probability that the user places an order with a certain merchant. In this implementation, after the probabilities that the users to be determined will place an order with the merchant have been obtained, the users to be determined can also be sorted in descending order of probability. The highest-ranked users are regarded as the users most likely to place an order, and a preset operation can then be performed for the top N users, for example applying incentive measures under preset rules. The incentive measures may include issuing coupons of a preset value, applying spend-threshold discounts, awarding points, and so on.
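The ranking step can be illustrated with a short sketch. The function name and the mapping from user ID to probability are hypothetical; the probabilities are assumed to come from the trained data determination model.

```python
def top_n_users(probabilities, n):
    """Return the IDs of the n users with the highest order probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [user_id for user_id, _ in ranked[:n]]


# Example: pick the two users most likely to place an order.
candidates = top_n_users({"u1": 0.20, "u2": 0.90, "u3": 0.55}, n=2)
```

The selected users could then be passed to whatever preset operation the platform applies, such as issuing a coupon.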
The following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
Fig. 6 shows a structural block diagram of a data determination device according to an embodiment of the present disclosure. The device can be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Fig. 6, the data determination device includes:
An extraction module 601, configured to obtain user history data within a preset time period and to extract a feature vector of the history data;
A preprocessing module 602, configured to preprocess the discrete features and the continuous features in the feature vector separately;
An obtaining module 603, configured to obtain training feature vectors from the preprocessed feature vectors;
A determination module 604, configured to train a data determination model using the training feature vectors and to perform data determination according to the data determination model.
As mentioned above, with the development of Internet technology, more and more merchants and service providers serve users through Internet platforms. To improve service quality and the user experience while raising users' order rate, many platforms determine the probability that a current user will place an order from the feature information of users who have placed orders in the past. When determining the order probability, however, the prior art usually considers only the user's platform attribute features, which is a narrow set of features. In addition, the prior art applies the same processing to all features without regard to their individual characteristics. As a result, the prior art determines data such as the current user's order probability with low accuracy and cannot provide reliable data support for platforms and merchants.
In view of the above drawbacks, this embodiment proposes a data determination device that takes multiple types of user features into account, processes different types of features differently, and then uses the combined features to determine user data. The technical solution can therefore improve the accuracy with which data such as a user's order probability are determined, and provide reliable data support for platforms and merchants.
In an optional implementation of this embodiment, the user history data refers to data that the user has generated in the past, for example a user's order data, transaction data, click data, browsing data, and so on, on a given platform. The user may be a single user or multiple users; to ensure the accuracy of the subsequent data determination, the user may be set to multiple users.
Here, the data refers to data that relates to one or more users and has certain characteristics, for example user behavior data, user operation data, and so on.
In an optional implementation of this embodiment, the data to be determined may include data on behavior that occurs with respect to a certain behavior object, for example placing an order, making a transaction, clicking, browsing, and so on. The behavior object may be, for example, a business, a seller, a merchant, or a service provider.
In an optional implementation of this embodiment, the feature vector of the history data includes one or more of the following features: identification features, attribute features, position features, preference features, behavior features, and label features. The identification feature uniquely identifies the user, for example the user's ID. The attribute feature characterizes the user's attribute information, such as age, gender, occupation, and health status. The position feature characterizes where the user is located, for example geographic location, latitude and longitude, and map point-of-interest information. The preference feature characterizes the user's preference information, for example service preference, product preference, merchant preference, cuisine and taste preference, discount sensitivity, price preference, and resource preference. The behavior feature characterizes the user's behavior information, for example whether an order was placed, the object of the order, whether a click occurred, the object clicked, whether browsing occurred, the object browsed, the average order value before discount, the average order value after discount, the share of orders without a discount, the order probability, the order frequency, the cumulative coupon amount, the coupon usage rate, the order channel, the click frequency, the browsing frequency, the visit frequency, and the total number of completed orders. The label feature is the user's behavior label: for example, if the target behavior to be determined is placing an order, the label feature of a user who placed an order within the preset time period is set to 1, and the label feature of a user who did not place an order within the preset time period is set to 0.
The preset time period refers to a period before the current time, for example the 90 days before the current time. The specific length of the preset time period can be set according to the needs of the actual application, and the present invention does not specifically limit it.
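As an illustration of how the label feature could be derived from order records over the preset time period, the sketch below uses the 90-day window mentioned above as its default. The record layout `(user_id, merchant_id, timestamp)`, the function name, and the restriction to a single merchant are assumptions made for the example.

```python
from datetime import datetime, timedelta


def label_users(user_ids, orders, merchant_id, now, window_days=90):
    """Return {user_id: 1 or 0} for orders placed with merchant_id in the window.

    orders: iterable of (user_id, merchant_id, timestamp) tuples (assumed layout).
    """
    start = now - timedelta(days=window_days)
    ordered = {
        uid
        for uid, mid, ts in orders
        if mid == merchant_id and start <= ts <= now
    }
    return {uid: 1 if uid in ordered else 0 for uid in user_ids}
```

Users with no matching record inside the window, including users whose only orders are older than the window or belong to another merchant, receive the label 0.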
In an optional implementation of this embodiment, as shown in Fig. 7, the extraction module 601 includes:
A first acquisition submodule 701, configured to obtain the user history data within the preset time period;
An extraction submodule 702, configured to extract the user's identification features, attribute features, position features, preference features, behavior features, and label features based on the history data;
A first combination submodule 703, configured to combine the user's identification features, attribute features, position features, preference features, behavior features, and label features to obtain the user's feature vector.
In this embodiment, to obtain multiple types of feature data and improve the accuracy of the data determination model, the extraction submodule 702 extracts the user's identification features, attribute features, position features, preference features, behavior features, and label features from the user history data within the preset time period. The first combination submodule 703 then combines these features to obtain the user's feature vector, which is subsequently used to train the data determination model.
In an optional implementation of this embodiment, the extraction submodule 702 can extract the above types of feature data from different user history data, particularly when the user history data come from different acquisition channels. Taking as an example the case where the data to be determined is the behavior of placing an order with a certain merchant, the user's order records can be obtained from the user's historical behavior data, and from these records it can be determined whether the user has placed an order with the merchant. If so, the user's label feature is set to 1; otherwise it is set to 0. Combined with the user's identification feature, such as an ID number, this yields the label data set [user ID, label], where label denotes the user's label feature. The attribute features, position features, preference features, behavior features, and other information described above are then obtained from the user's historical behavior data and, combined with the user's identification feature, yield the other feature data set [user ID, value1, value2, ...]. Merging the label data set and the other feature data set with the user identification feature as the primary key gives the labeled feature vector of the user.
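The merge on the user identification feature described above might be sketched as follows. The row layouts mirror [user ID, label] and [user ID, value1, value2, ...]; placing the label last in the merged row and dropping users that appear in only one of the two sets (an inner join) are assumptions of this illustration.

```python
def merge_by_user_id(label_rows, feature_rows):
    """Join [user ID, label] rows with [user ID, value1, value2, ...] rows."""
    labels = {user_id: label for user_id, label in label_rows}
    merged = []
    for user_id, *values in feature_rows:
        if user_id in labels:  # assumption: keep only users present in both sets
            merged.append([user_id, *values, labels[user_id]])
    return merged


# Example with two labeled users and one user that has no label.
rows = merge_by_user_id(
    [("u1", 1), ("u2", 0)],
    [("u1", 25, "SH"), ("u2", 31, "BJ"), ("u3", 40, "GZ")],
)
```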
In an optional implementation of this embodiment, as shown in Fig. 8, the preprocessing module 602 includes:
An encoding submodule 801, configured to obtain the discrete features in the feature vector, encode the discrete features, and generate a first index value and a first feature value for the discrete features;
A normalization submodule 802, configured to obtain the continuous features in the feature vector, normalize the continuous features, and generate a second index value and a second feature value for the continuous features.
Because the feature vector may contain different types of features, such as discrete features and continuous features, this embodiment processes each type differently. Specifically, the encoding submodule 801 obtains the discrete features in the feature vector, encodes them, and generates a first index value and a first feature value for them. The encoding may be one-hot encoding, and the first index value serves as an index label for the first feature value, so that different first feature values can be distinguished. The normalization submodule 802 obtains the continuous features in the feature vector. Because the DeepFM data determination model used later contains a linear combination part, the continuous features need to be normalized to speed up the model's search for an optimal solution. A second index value and a second feature value are generated for the continuous features at the same time; as above, the second index value serves as an index label for the second feature value, so that different second feature values can be distinguished.
In an optional implementation of this embodiment, as shown in Fig. 9, the obtaining module 603 includes:
A second combination submodule 901, configured to combine the first index value and the second index value to obtain a feature index set, and to combine the first feature value and the second feature value to obtain a feature value set;
A third combination submodule 902, configured to form the feature index set and the feature value set of the user, together with the corresponding label feature, into the training feature vector.
In the previous embodiment, different types of features were processed differently. In this embodiment, to obtain a complete training feature vector, the differently processed features also need to be combined. Specifically, the second combination submodule 901 combines the first index value and the second index value in order to obtain a feature index set, and combines the first feature value and the second feature value in order to obtain a feature value set. The third combination submodule 902 forms the feature index set and the feature value set of the user, together with the label feature corresponding to the user, into the training feature vector that is subsequently used to train the data determination model.
In an optional implementation of this embodiment, as shown in Fig. 10, the determination module 604 includes:
A training submodule 1001, configured to train a data determination model using the training feature vectors;
A second acquisition submodule 1002, configured to obtain the feature indices and feature values of a user to be determined;
A determination submodule 1003, configured to input the feature indices and feature values of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
In this embodiment, after the training submodule 1001 has trained the data determination model using the training feature vectors, the determination submodule 1003 inputs the feature indices and feature values of the user to be determined, which are obtained by the second acquisition submodule 1002 and whose content is similar to that of the training feature vectors, into the trained data determination model to obtain the data determination result for the user to be determined.
In an optional implementation of this embodiment, the data determination model is a DeepFM model. The DeepFM model extracts low-order and high-order feature combinations at the same time. Its structure contains a factorization machine (Factorization Machine) part and a deep neural network (Deep Neural Networks) part, so it can effectively combine the strengths of neural networks and factorization machines in feature learning and make feature combination more effective.
In an optional implementation of this embodiment, the training feature vectors may be all of the training feature vectors obtained by the preprocessing module 602 and the obtaining module 603. In this case, the user to be determined may be a user different from the users above, for example a user who is currently logged in to the platform but has not yet performed any behavior. The corresponding feature indices and feature values can be obtained by a method similar to the one described above. In addition, the dimensions and content of the features of the user to be determined should be consistent with those of the training features in the training feature vectors, to ensure the accuracy of the data determination model.
In an optional implementation of this embodiment, the training feature vectors may also be the feature vectors of only some of the users among the training feature vectors obtained by the preprocessing module 602 and the obtaining module 603. In this case, the user to be determined may be chosen from the remaining users, for example selected at random from them, so that the training feature vectors and the feature indices and feature values of the user to be determined all come from the preprocessed feature vectors. As above, the feature indices and feature values of the user to be determined can be obtained by a method similar to the one described above. Note, however, that the feature values of the user to be determined then do not include the label feature, because that is the content the data determination model has to determine.
The determination submodule 1003 then inputs the feature indices and feature values of the user to be determined into the DeepFM data determination model to obtain the data determination result for the user to be determined. The DeepFM model is an existing model in the prior art, and the present invention does not describe its working principle further.
In an optional implementation of this embodiment, if the user data is user behavior data, the user data determination result can take the form of the probability that the behavior occurs, for example the probability that the user places an order with a certain merchant. In this implementation, after the probabilities that the users to be determined will place an order with the merchant have been obtained, the users to be determined can also be sorted in descending order of probability. The highest-ranked users are regarded as the users most likely to place an order, and a preset operation can then be performed for the top N users, for example applying incentive measures under preset rules. The incentive measures may include issuing coupons of a preset value, applying spend-threshold discounts, awarding points, and so on.
The present disclosure also discloses an electronic device. Fig. 11 shows a structural block diagram of the electronic device according to an embodiment of the present disclosure. As shown in Fig. 11, the electronic device 1100 includes a memory 1101 and a processor 1102, wherein
the memory 1101 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 1102 to implement the following method steps:
Obtaining user history data within a preset time period, and extracting a feature vector of the history data;
Preprocessing the discrete features and the continuous features in the feature vector separately;
Obtaining training feature vectors from the preprocessed feature vectors;
Training a data determination model using the training feature vectors, and performing data determination according to the data determination model.
In an optional implementation of this embodiment, the feature vector includes one or more of the following features: identification features, attribute features, position features, preference features, behavior features, and label features.
In an optional implementation of this embodiment, obtaining the user history data within the preset time period and extracting the feature vector of the history data includes:
Obtaining the user history data within the preset time period;
Extracting the user's identification features, attribute features, position features, preference features, behavior features, and label features based on the history data;
Combining the user's identification features, attribute features, position features, preference features, behavior features, and label features to obtain the user's feature vector.
In an optional implementation of this embodiment, preprocessing the discrete features and the continuous features in the feature vector separately includes:
Obtaining the discrete features in the feature vector, encoding the discrete features, and generating a first index value and a first feature value for the discrete features;
Obtaining the continuous features in the feature vector, normalizing the continuous features, and generating a second index value and a second feature value for the continuous features.
In an optional implementation of this embodiment, obtaining the training feature vectors from the preprocessed feature vectors includes:
Combining the first index value and the second index value to obtain a feature index set, and combining the first feature value and the second feature value to obtain a feature value set;
Forming the feature index set and the feature value set of the user, together with the corresponding label feature, into the training feature vector.
In an optional implementation of this embodiment, training the data determination model using the training feature vectors and performing data determination according to the data determination model includes:
Training a data determination model using the training feature vectors;
Obtaining the feature indices and feature values of a user to be determined;
Inputting the feature indices and feature values of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
In an optional implementation of this embodiment, the data determination model is a DeepFM model.
In an optional implementation of this embodiment, the training feature vectors and the feature indices and feature values of the user to be determined all come from the preprocessed feature vectors.
Fig. 12 is a structural schematic diagram of a computer system suitable for implementing the data determination method according to an embodiment of the present disclosure.
As shown in Fig. 12, the computer system 1200 includes a central processing unit (CPU) 1201, which can execute the various processes of the above embodiments according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores the various programs and data required for the operation of the system 1200. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to one another by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1212 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1212 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
In particular, according to embodiments of the present disclosure, the method described above can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the above data determination method. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two boxes shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure can be implemented in software or in hardware. The described units or modules can also be provided in a processor, and in some cases the names of these units or modules do not limit the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the device described in the above embodiments, or it may exist on its own without being assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute the methods described in the present disclosure.
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
The present disclosure discloses A1, a kind of data determination method, comprising: the user's history data in preset time period are obtained, And extract the feature vector of the historical data;To in described eigenvector discrete features and continuous feature located in advance respectively Reason;Training feature vector is obtained according to pretreated feature vector;It is true that data are obtained using training feature vector training Cover half type, and determine that model carries out data and determines according to the data.A2, method according to a1, described eigenvector packet Include one of following characteristics or a variety of: identification characteristics, attributive character, position feature, hobby feature, behavioural characteristic and label are special Sign.A3, the method according to A2, the user's history data obtained in preset time period, and extract the historical data Feature vector, comprising: obtain preset time period in user's history data;Extract the user's based on the historical data Identification characteristics, attributive character, position feature, hobby feature, behavioural characteristic and label characteristics;By the identification characteristics of the user, Attributive character, position feature, hobby feature, behavioural characteristic and label characteristics combine, and obtain the feature vector of the user. 
A4. The method according to any one of A1 to A3, wherein preprocessing the discrete features and the continuous features in the feature vector separately comprises: obtaining the discrete features in the feature vector, encoding the discrete features, and generating a first index value and a first feature value of the discrete features; and obtaining the continuous features in the feature vector, normalizing the continuous features, and generating a second index value and a second feature value of the continuous features.
A5. The method according to A4, wherein obtaining the training feature vector from the preprocessed feature vector comprises: combining the first index value and the second index value to obtain a feature index set, and combining the first feature value and the second feature value to obtain a feature value set; and forming the training feature vector from the feature index set, the feature value set, and the corresponding label features of the user.
A6. The method according to any one of A1 to A5, wherein training the data determination model with the training feature vector and performing data determination according to the data determination model comprises: training the data determination model with the training feature vector; obtaining the feature index and the feature value of a user to be determined; and inputting the feature index and the feature value of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
A7. The method according to A6, wherein the data determination model is a DeepFM model.
A8. The method according to A6 or A7, wherein the training feature vector and the feature index and feature value of the user to be determined are all derived from the preprocessed feature vector.
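The preprocessing and combination steps of A4 and A5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the field names (`city`, `gender`, `total_spend`), the choice of one index per category with a feature value of 1.0 for discrete features, and min-max scaling for continuous features are all assumptions made for illustration. The source only states that discrete features are encoded and continuous features are normalized.

```python
def fit_preprocessor(rows, discrete_fields, continuous_fields):
    """Assign an index to every discrete category and every continuous field."""
    index_of = {}
    for field in discrete_fields:
        # Assumed encoding: one index per distinct category value.
        for value in sorted({row[field] for row in rows}):
            index_of[(field, value)] = len(index_of)
    bounds = {}
    for field in continuous_fields:
        # A continuous field occupies a single index.
        index_of[(field, None)] = len(index_of)
        values = [row[field] for row in rows]
        bounds[field] = (min(values), max(values))
    return index_of, bounds


def to_training_sample(row, index_of, bounds, discrete_fields, continuous_fields):
    """Return (feature index set, feature value set, label) for one user."""
    indices, values = [], []
    for field in discrete_fields:
        indices.append(index_of[(field, row[field])])  # first index value
        values.append(1.0)                             # first feature value
    for field in continuous_fields:
        low, high = bounds[field]
        # Assumed normalization: min-max scaling to [0, 1].
        scaled = (row[field] - low) / (high - low) if high > low else 0.0
        indices.append(index_of[(field, None)])        # second index value
        values.append(scaled)                          # second feature value
    return indices, values, row["label"]
```

In this sketch a category that was not seen when `fit_preprocessor` ran raises a `KeyError`; a real system would probably reserve an index for unknown values.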
The present disclosure discloses:
B9. A data determining device, comprising: an extraction module configured to obtain user history data within a preset time period and to extract a feature vector of the history data; a preprocessing module configured to preprocess the discrete features and the continuous features in the feature vector separately; an obtaining module configured to obtain a training feature vector from the preprocessed feature vector; and a determining module configured to train a data determination model with the training feature vector and to perform data determination according to the data determination model.
B10. The device according to B9, wherein the feature vector includes one or more of the following features: identification features, attribute features, position features, hobby features, behavior features, and label features.
B11. The device according to B10, wherein the extraction module includes: a first acquisition submodule configured to obtain the user history data within the preset time period; an extraction submodule configured to extract the identification features, attribute features, position features, hobby features, behavior features, and label features of the user based on the history data; and a first combination submodule configured to combine the identification features, attribute features, position features, hobby features, behavior features, and label features of the user to obtain the feature vector of the user.
B12. The device according to any one of B9 to B11, wherein the preprocessing module includes: an encoding submodule configured to obtain the discrete features in the feature vector, encode the discrete features, and generate a first index value and a first feature value of the discrete features; and a normalization submodule configured to obtain the continuous features in the feature vector, normalize the continuous features, and generate a second index value and a second feature value of the continuous features.
B13. The device according to B12, wherein the obtaining module includes: a second combination submodule configured to combine the first index value and the second index value to obtain a feature index set, and to combine the first feature value and the second feature value to obtain a feature value set; and a third combination submodule configured to form the training feature vector from the feature index set, the feature value set, and the corresponding label features of the user.
B14. The device according to any one of B9 to B13, wherein the determining module includes: a training submodule configured to train the data determination model with the training feature vector; a second acquisition submodule configured to obtain the feature index and the feature value of a user to be determined; and a determination submodule configured to input the feature index and the feature value of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
B15. The device according to B14, wherein the data determination model is a DeepFM model.
B16. The device according to B14 or B15, wherein the training feature vector and the feature index and feature value of the user to be determined are all derived from the preprocessed feature vector.
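The three submodules of the extraction module (acquisition, extraction, combination) might be sketched in Python as three small functions. Everything concrete here is hypothetical: the record layout (`time`, `category`, `amount`), the user fields, the 30-day default window, and the particular features chosen for each feature group are assumptions for illustration, since the source does not specify them.

```python
from collections import Counter
from datetime import datetime, timedelta


def acquire_history(records, now, window_days=30):
    """Acquisition: keep only records inside the preset time period."""
    start = now - timedelta(days=window_days)
    return [r for r in records if start <= r["time"] <= now]


def extract_features(user, recent):
    """Extraction: derive one small group of features per feature type."""
    categories = Counter(r["category"] for r in recent)
    favourite = categories.most_common(1)[0][0] if categories else "unknown"
    return {
        "identification": {"user_id": user["id"]},
        "attribute": {"gender": user["gender"]},
        "position": {"city": user["city"]},
        "hobby": {"favourite_category": favourite},
        "behavior": {
            "order_count": len(recent),
            "total_spend": sum(r["amount"] for r in recent),
        },
        "label": {"label": user["label"]},
    }


def combine_features(groups):
    """Combination: merge all feature groups into one flat feature vector."""
    vector = {}
    for group in groups.values():
        vector.update(group)
    return vector
```

Calling `combine_features(extract_features(user, acquire_history(records, now)))` yields a single dictionary per user, which plays the role of the user's feature vector before preprocessing.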
The present disclosure discloses:
C17. An electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps: obtaining user history data within a preset time period, and extracting a feature vector of the history data; preprocessing the discrete features and the continuous features in the feature vector separately; obtaining a training feature vector from the preprocessed feature vector; and training a data determination model with the training feature vector, and performing data determination according to the data determination model.
C18. The electronic device according to C17, wherein the feature vector includes one or more of the following features: identification features, attribute features, position features, hobby features, behavior features, and label features.
C19. The electronic device according to C18, wherein obtaining the user history data within the preset time period and extracting the feature vector of the history data comprises: obtaining the user history data within the preset time period; extracting the identification features, attribute features, position features, hobby features, behavior features, and label features of the user based on the history data; and combining the identification features, attribute features, position features, hobby features, behavior features, and label features of the user to obtain the feature vector of the user.
C20. The electronic device according to any one of C17 to C19, wherein preprocessing the discrete features and the continuous features in the feature vector separately comprises: obtaining the discrete features in the feature vector, encoding the discrete features, and generating a first index value and a first feature value of the discrete features; and obtaining the continuous features in the feature vector, normalizing the continuous features, and generating a second index value and a second feature value of the continuous features.
C21. The electronic device according to C20, wherein obtaining the training feature vector from the preprocessed feature vector comprises: combining the first index value and the second index value to obtain a feature index set, and combining the first feature value and the second feature value to obtain a feature value set; and forming the training feature vector from the feature index set, the feature value set, and the corresponding label features of the user.
C22. The electronic device according to any one of C17 to C21, wherein training the data determination model with the training feature vector and performing data determination according to the data determination model comprises: training the data determination model with the training feature vector; obtaining the feature index and the feature value of a user to be determined; and inputting the feature index and the feature value of the user to be determined into the data determination model to obtain a data determination result for the user to be determined.
C23. The electronic device according to C22, wherein the data determination model is a DeepFM model.
C24. The electronic device according to C22 or C23, wherein the training feature vector and the feature index and feature value of the user to be determined are all derived from the preprocessed feature vector.
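To show how a feature index and feature value pair feeds a DeepFM model, the following Python sketch implements only the forward pass, using the standard library. DeepFM sums a factorization machine (FM) part and a deep neural network part that share the same embeddings, then applies a sigmoid. The layer sizes, the single hidden layer, the random untrained weights, and all function and key names are assumptions for illustration; the source does not describe the network configuration, and training (for example, gradient descent on a log loss) is omitted.

```python
import math
import random


def init_deepfm(num_features, num_fields, embed_dim=4, hidden=8, seed=0):
    """Create small random parameters (untrained, for illustration only)."""
    rng = random.Random(seed)

    def small():
        return rng.uniform(-0.1, 0.1)

    return {
        "bias": 0.0,
        "w": [small() for _ in range(num_features)],
        "v": [[small() for _ in range(embed_dim)] for _ in range(num_features)],
        "W1": [[small() for _ in range(num_fields * embed_dim)]
               for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [small() for _ in range(hidden)],
        "b2": 0.0,
    }


def deepfm_predict(params, indices, values):
    """Return a score in (0, 1) for one user's feature indices and values."""
    # Embeddings are shared by the FM part and the deep part.
    emb = [[x * e for e in params["v"][i]] for i, x in zip(indices, values)]

    # FM part: bias + first-order terms + pairwise interactions, computed
    # per embedding dimension as 0.5 * ((sum e)^2 - sum(e^2)).
    first = sum(params["w"][i] * x for i, x in zip(indices, values))
    second = 0.0
    for d in range(len(emb[0])):
        column = [e[d] for e in emb]
        second += 0.5 * (sum(column) ** 2 - sum(c * c for c in column))
    y_fm = params["bias"] + first + second

    # Deep part: one ReLU hidden layer over the concatenated embeddings.
    flat = [e for vec in emb for e in vec]
    hidden = [max(0.0, sum(w * a for w, a in zip(row, flat)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    y_deep = sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]

    return 1.0 / (1.0 + math.exp(-(y_fm + y_deep)))
```

The number of entries in `indices` must equal `num_fields`, because the deep part expects one embedding per field.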
The present disclosure also discloses:
D25. A computer readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the method steps of any one of A1 to A8.
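Taken together, the four method steps of A1 form a simple pipeline: extract, preprocess, obtain the training feature vector, then train. The sketch below only shows how such stages might be chained in Python. The class name, the attribute names, and the idea of passing each stage in as a callable are assumptions for illustration, not structure described in the source.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataDeterminationPipeline:
    # Each stage is a plain callable so that stages can be swapped freely.
    extract: Callable[[Any], Any]     # history data -> feature vectors
    preprocess: Callable[[Any], Any]  # feature vectors -> preprocessed vectors
    obtain: Callable[[Any], Any]      # preprocessed -> training feature vectors
    train: Callable[[Any], Callable[[Any], Any]]  # training vectors -> model

    def fit(self, history):
        """Run the four steps in order and return the trained model."""
        features = self.extract(history)
        processed = self.preprocess(features)
        training = self.obtain(processed)
        return self.train(training)
```

The returned model is itself a callable, so data determination for a user to be determined reduces to calling it with that user's preprocessed features.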

Claims (10)

1. A data determination method, characterized by comprising:
obtaining user history data within a preset time period, and extracting a feature vector of the history data;
preprocessing the discrete features and the continuous features in the feature vector separately;
obtaining a training feature vector from the preprocessed feature vector; and
training a data determination model with the training feature vector, and performing data determination according to the data determination model.
2. The method according to claim 1, characterized in that the feature vector includes one or more of the following features: identification features, attribute features, position features, hobby features, behavior features, and label features.
3. The method according to claim 2, characterized in that obtaining the user history data within the preset time period and extracting the feature vector of the history data comprises:
obtaining the user history data within the preset time period;
extracting the identification features, attribute features, position features, hobby features, behavior features, and label features of the user based on the history data; and
combining the identification features, attribute features, position features, hobby features, behavior features, and label features of the user to obtain the feature vector of the user.
4. The method according to any one of claims 1 to 3, characterized in that preprocessing the discrete features and the continuous features in the feature vector separately comprises:
obtaining the discrete features in the feature vector, encoding the discrete features, and generating a first index value and a first feature value of the discrete features; and
obtaining the continuous features in the feature vector, normalizing the continuous features, and generating a second index value and a second feature value of the continuous features.
5. A data determining device, characterized by comprising:
an extraction module configured to obtain user history data within a preset time period and to extract a feature vector of the history data;
a preprocessing module configured to preprocess the discrete features and the continuous features in the feature vector separately;
an obtaining module configured to obtain a training feature vector from the preprocessed feature vector; and
a determining module configured to train a data determination model with the training feature vector and to perform data determination according to the data determination model.
6. The device according to claim 5, characterized in that the feature vector includes one or more of the following features: identification features, attribute features, position features, hobby features, behavior features, and label features.
7. The device according to claim 6, characterized in that the extraction module includes:
a first acquisition submodule configured to obtain the user history data within the preset time period;
an extraction submodule configured to extract the identification features, attribute features, position features, hobby features, behavior features, and label features of the user based on the history data; and
a first combination submodule configured to combine the identification features, attribute features, position features, hobby features, behavior features, and label features of the user to obtain the feature vector of the user.
8. The device according to any one of claims 5 to 7, characterized in that the preprocessing module includes:
an encoding submodule configured to obtain the discrete features in the feature vector, encode the discrete features, and generate a first index value and a first feature value of the discrete features; and
a normalization submodule configured to obtain the continuous features in the feature vector, normalize the continuous features, and generate a second index value and a second feature value of the continuous features.
9. An electronic device, characterized by comprising a memory and a processor, wherein
the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the following method steps:
obtaining user history data within a preset time period, and extracting a feature vector of the history data;
preprocessing the discrete features and the continuous features in the feature vector separately;
obtaining a training feature vector from the preprocessed feature vector; and
training a data determination model with the training feature vector, and performing data determination according to the data determination model.
10. A computer readable storage medium storing computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method steps of any one of claims 1 to 4.
CN201811593719.5A 2018-12-25 2018-12-25 Data determination method, device, electronic equipment and computer readable storage medium Pending CN109685574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811593719.5A CN109685574A (en) 2018-12-25 2018-12-25 Data determination method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN109685574A true CN109685574A (en) 2019-04-26

Family

ID=66189386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811593719.5A Pending CN109685574A (en) 2018-12-25 2018-12-25 Data determination method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109685574A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348215A (en) * 2019-07-16 2019-10-18 深圳众赢维融科技有限公司 Exception object recognition methods, device, electronic equipment and medium
CN110795638A (en) * 2019-11-13 2020-02-14 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN110807681A (en) * 2019-09-10 2020-02-18 咪咕文化科技有限公司 Product customization method, electronic device and storage medium
CN111709784A (en) * 2020-06-18 2020-09-25 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating user retention time

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016016719A2 (en) * 2014-08-01 2016-02-04 Hitrader Technology Limited Online trading systems and methods
CN107578294A (en) * 2017-09-28 2018-01-12 北京小度信息科技有限公司 User's behavior prediction method, apparatus and electronic equipment
CN107808246A (en) * 2017-10-26 2018-03-16 上海维信荟智金融科技有限公司 The intelligent evaluation method and system of collage-credit data
CN107944913A (en) * 2017-11-21 2018-04-20 重庆邮电大学 High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis

Similar Documents

Publication Publication Date Title
CN109685574A (en) Data determination method, device, electronic equipment and computer readable storage medium
CN109522483B (en) Method and device for pushing information
CN109903095A (en) Data processing method, device, electronic equipment and computer readable storage medium
CN108268934A (en) Recommendation method and apparatus, electronic equipment, medium, program based on deep learning
CN107958382A (en) Abnormal behaviour recognition methods, device, electronic equipment and storage medium
CN109145280A (en) The method and apparatus of information push
CN108536694A (en) Estimation method, device and the terminal device of user preference
US10007645B2 (en) Modifying the presentation of a content item
KR20110032878A (en) Keyword ad. method and system for social networking service
CN109299981A (en) A kind of advertisement recommended method and device
CN109783741A (en) Method and apparatus for pushed information
CN108228463A (en) For detecting the method and apparatus of initial screen time
CN110298716A (en) Information-pushing method and device
CN108777701A (en) A kind of method and device of determining receiver
CN110413872A (en) Method and apparatus for showing information
US20170142119A1 (en) Method for creating group user profile, electronic device, and non-transitory computer-readable storage medium
CN109711917A (en) Information-pushing method and device
CN113407854A (en) Application recommendation method, device and equipment and computer readable storage medium
US10304081B1 (en) Yielding content recommendations based on serving by probabilistic grade proportions
CN111858873A (en) Method and device for determining recommended content, electronic equipment and storage medium
CN113010798A (en) Information recommendation method, information recommendation device, electronic equipment and readable storage medium
CN111429214A (en) Transaction data-based buyer and seller matching method and device
JP7206761B2 (en) Information processing equipment
CN107153697A (en) Product search method and device in a kind of commodity transaction website
CN111967970B (en) Bank product recommendation method and device based on spark platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190426
