CN114996590A - Classification method, classification device, classification equipment and storage medium - Google Patents

Classification method, classification device, classification equipment and storage medium

Info

Publication number
CN114996590A
CN114996590A (application CN202210929907.0A)
Authority
CN
China
Prior art keywords
user
classification model
classification
samples
labels
Prior art date
Legal status
Pending
Application number
CN202210929907.0A
Other languages
Chinese (zh)
Inventor
武晋琦
Current Assignee
Shanghai Smk Network Technology Co ltd
Original Assignee
Shanghai Smk Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Smk Network Technology Co ltd filed Critical Shanghai Smk Network Technology Co ltd
Priority to CN202210929907.0A
Publication of CN114996590A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention discloses a classification method, a classification device, classification equipment and a storage medium, wherein the method comprises the following steps: acquiring user characteristic data corresponding to the characteristic labels of the users to be classified; respectively inputting the user characteristic data into each classification model of a plurality of classification models to obtain the probability calculated by each classification model; calculating an average of a plurality of said probabilities; determining the category of the user to be classified according to the average value; each classification model is obtained by training based on user feature data corresponding to a part of feature labels in a training sample. The classification method provided by the embodiment of the invention can improve the accuracy of classifying the users.

Description

Classification method, classification device, classification equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a classification method, apparatus, device, and storage medium.
Background
In some scenarios, before a merchant recommends goods or services for a user, the user needs to be classified first, and whether the user needs goods or services is determined according to the classification of the user.
In the existing method, data of a user to be classified is used as a sample to train a classifier, and the trained classifier is used to determine the category of the user to be classified.
When the existing method is used to classify users, the accuracy is low.
Disclosure of Invention
Embodiments of the present invention provide a classification method, apparatus, device, and storage medium. The probability of a user to be classified can be calculated by classification models each trained on user feature data corresponding to a part of the feature labels in a training sample, so that the category to which the user to be classified belongs can be accurately determined, thereby improving the accuracy of user classification.
In a first aspect, an embodiment of the present invention provides a classification method, where the method includes: acquiring user characteristic data corresponding to the characteristic labels of the users to be classified; respectively inputting the user characteristic data into each classification model of the plurality of classification models to obtain the probability calculated by each classification model; calculating an average of the plurality of probabilities; determining the category of the user to be classified according to the average value; each classification model is obtained by training based on user feature data corresponding to part of feature labels in a training sample.
In a possible implementation manner, before obtaining the user feature data of the user to be classified, the method further includes: obtaining M first samples and N second samples, wherein the first samples comprise category labels of classified users and user feature data corresponding to the feature labels, and the second samples comprise user feature data corresponding to the feature labels of unclassified users; splicing the user feature data of the M first samples and the user feature data of the N second samples to obtain P training sample sets; and for each classification model to be trained among a plurality of classification models to be trained, respectively performing the following step: training the classification model to be trained using a part of the feature labels and the training sample set corresponding to the classification model to be trained, to obtain a classification model; wherein M is a positive integer greater than 0, N is a positive integer greater than M, and P is a positive integer greater than 0.
In one possible implementation, the classification model to be trained comprises a decision tree classification model to be trained; the classification model comprises a decision tree classification model.
In one possible implementation, before obtaining the M first samples and the N second samples, the method further includes: obtaining a plurality of first initial samples and a plurality of second initial samples, wherein the first initial samples comprise category labels and user characteristic data of classified users, and the second initial samples comprise user characteristic data of unclassified users; classifying the user characteristic data in the plurality of first initial samples and the plurality of second initial samples to obtain a plurality of characteristic labels; taking the class labels of the classified users and the user feature data corresponding to the feature labels in the first initial sample as a first sample; taking the user characteristic data corresponding to the characteristic label of the unclassified user in the second initial sample as a second sample; wherein each feature tag corresponds to a type of user feature data.
In a possible implementation manner, determining the category to which the user to be classified belongs according to the average value specifically includes: comparing the average value with a preset threshold value; and when the average value is larger than the threshold value, determining that the user to be classified belongs to the target user.
In a second aspect, an embodiment of the present invention further provides a classification apparatus, where the apparatus includes: the first acquisition module is used for acquiring user characteristic data corresponding to the characteristic labels of the users to be classified; the first calculation module is used for respectively inputting the user characteristic data into each classification model in the plurality of classification models to obtain the probability calculated by each classification model; a second calculation module for calculating an average of the plurality of probabilities; the determining module is used for determining the category of the user to be classified according to the average value; each classification model is obtained by training based on user feature data corresponding to a part of feature labels in a training sample.
In one possible implementation, the apparatus further includes: a second obtaining module, configured to obtain M first samples and N second samples, wherein the first samples comprise category labels of classified users and user feature data corresponding to the feature labels, and the second samples comprise user feature data corresponding to the feature labels of unclassified users; a splicing module, configured to splice the user feature data of the M first samples and the user feature data of the N second samples to obtain P training sample sets; and a training module, configured to perform the following step for each classification model to be trained among a plurality of classification models to be trained: training the classification model to be trained according to the training sample set corresponding to the classification model to be trained, to obtain a classification model; wherein M is a positive integer greater than 0, N is a positive integer greater than M, and P is a positive integer greater than 0.
In one possible implementation, the classification model to be trained comprises a decision tree classification model to be trained; the classification model comprises a decision tree classification model.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, performs the method as in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium having stored thereon computer program instructions that, when executed by a processor, implement a method as in the first aspect or any possible implementation manner of the first aspect.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects: the acquired user feature data corresponding to the feature labels of the user to be classified are input into each of a plurality of classification models to obtain the probability calculated by each classification model, the average value of the plurality of probabilities is then calculated, and the category to which the user to be classified belongs is determined according to the average value. Because each classification model is trained on user feature data corresponding to only a part of the feature labels in the training sample, the classification models differ from one another, the classification result is not dominated by a few feature labels, the generalization capability of the classification models is improved, and the accuracy of classifying users is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the embodiments of the present invention are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a classification method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a method for determining a category to which a user to be classified belongs according to an average value according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a method for obtaining a classification model through training according to an embodiment of the present invention.
Fig. 4 is a schematic flowchart of a method for obtaining a feature tag according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a classification apparatus according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In some scenarios, before a merchant recommends goods or services for a user, the user needs to be classified first, and whether the user needs goods or services is determined according to the classification of the user.
In the existing method, data of a user to be classified is used as a sample to train a classifier, and the trained classifier is used to determine the category of the user to be classified.
When the existing method is used to classify users, the accuracy is low.
The applicant finds that when the existing user classification method is used for user classification, the types of feature labels of samples used in classifier training are the same, so that the trained classifiers are almost the same, the generalization capability of a classification model is reduced, and the accuracy of a classification result is low.
Embodiments of the present invention provide a classification method, apparatus, device, and storage medium. The probability of a user to be classified can be calculated by classification models each trained on user feature data corresponding to a part of the feature labels in a training sample, so that the category to which the user to be classified belongs can be accurately determined, thereby improving the accuracy of classifying users.
A classification method provided by the embodiment of the present invention will be described in detail below with reference to fig. 1.
As shown in fig. 1, the method may include the steps of: s110, user characteristic data corresponding to the characteristic labels of the users to be classified are obtained.
An unclassified user is a user whose category has not yet been determined, and the user to be classified may be any one of the unclassified users.
The feature labels may be labels selected according to classification purpose.
For example, if the classification purpose is to determine whether the user to be classified is a user who intends to purchase an insurance service, labels related to the insurance service may be selected as the feature labels; the feature labels may include the number of insurance applications installed recently, educational background, age, gender, city, income, and the like.
The user feature data may include the data corresponding to each feature label of the user to be classified. For example, for a certain user to be classified, when the feature label is the number of insurance applications installed recently, the user feature data is that number of installations, N, where N is an integer greater than or equal to 0; when the feature label is age, the user feature data may be 40 years old.
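As a purely illustrative sketch (the patent does not prescribe any data format; the field names and values below are hypothetical and follow the example above), such user feature data could be represented in Python as a mapping from feature labels to values:

    # Hypothetical representation of one user's feature data, keyed by feature label.
    user_feature_data = {
        "recent_insurance_app_installs": 2,  # number of insurance applications installed recently
        "education": "bachelor",
        "age": 40,
        "gender": "male",
        "city": "Beijing",
        "annual_income": 400000,
    }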
The server collects user characteristic data corresponding to the characteristic labels of the unclassified users, stores the user characteristic data in a database of the server, and obtains user characteristic data corresponding to the characteristic labels of any unclassified user from the database so as to classify the unclassified users.
S120, the user feature data are respectively input into each classification model of the plurality of classification models to obtain the probability calculated by each classification model.
Each classification model is obtained by training based on user feature data corresponding to a part of feature labels in a training sample.
A plurality of classification models are obtained by training the classification models to be trained with user feature data corresponding to a part of the feature labels in the training samples. The user feature data of the user to be classified are input into each classification model, and each classification model calculates a probability, which may represent the probability that the user to be classified is a user with a service requirement. Because the feature labels used in training each classification model are not identical, the plurality of probabilities calculated by the plurality of classification models reflect, from the different dimensions of different feature labels, the probability that the user to be classified is a user with a service requirement, and the result is not dominated by a few feature labels; the generalization capability of the classification models is therefore improved, and the probability that the user to be classified is a user with a service requirement can be reflected more accurately.
S130, calculating the average value of the plurality of probabilities.
A plurality of probabilities are calculated from the plurality of classification models, and an average of the plurality of probabilities is calculated.
The average value may represent an average probability that the user to be classified has a service demand.
Because the plurality of probabilities calculated by the classification models reflect, from the dimensions of different feature labels, the probability that the user to be classified has a service requirement, the average value reflects the average probability that the user to be classified has a service requirement across the dimensions of the feature labels and is not dominated by a few feature labels, so the average probability that the user to be classified is a user with a service requirement can be reflected more accurately.
S140, the category to which the user to be classified belongs is determined according to the average value.
The category to which the user to be classified belongs may be a target user or a non-target user.
The target user represents a user with a service requirement, and the non-target user represents a user without a service requirement.
In some embodiments, the flow of the method for determining the category to which the user to be classified belongs according to the average value is shown in fig. 2, and specifically includes the following steps: S210, the average value is compared with a preset threshold.
The threshold may be preset by a technician.
The average value of the plurality of probabilities is compared with the preset threshold.
S220, when the average value is larger than the threshold, it is determined that the user to be classified is a target user.
When the average value is larger than the preset threshold, it indicates that the user to be classified is a user with a service requirement and is a target user.
In one example, the preset threshold is 0.9, and the average probability of the user to be classified is 0.95, then the user to be classified belongs to the target user.
When the average value is less than or equal to the preset threshold, the user to be classified is not a user with a service requirement and is a non-target user.
In one example, the preset threshold is 0.9, and the average probability of the user to be classified is 0.85, then the user to be classified belongs to a non-target user.
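The following is a minimal, non-authoritative sketch of steps S120 to S140 in Python. It assumes (these are assumptions, not part of the patent) that each classification model is kept together with the feature labels it was trained on, that the feature values are already numerically encoded, that the classifiers expose a scikit-learn-style predict_proba method, and that the preset threshold is 0.9 as in the example above:

    import numpy as np

    def classify_user(user_feature_data, models, threshold=0.9):
        # models: list of (chosen_feature_labels, fitted_classifier) pairs, each
        # classifier trained on user feature data for a part of the feature labels.
        probabilities = []
        for chosen_labels, clf in models:
            # S120: build the input from the feature labels this model was trained on
            # and obtain the probability calculated by this classification model.
            x = [[user_feature_data[label] for label in chosen_labels]]
            probabilities.append(clf.predict_proba(x)[0][1])
        # S130: calculate the average value of the plurality of probabilities.
        average = float(np.mean(probabilities))
        # S140 (and S210/S220): compare the average value with the preset threshold.
        category = "target user" if average > threshold else "non-target user"
        return category, average

With the threshold 0.9, an average of 0.95 yields "target user" and an average of 0.85 yields "non-target user", matching the examples above.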
In the method provided by the embodiment of the invention, the obtained user feature data corresponding to the feature labels of the user to be classified are first input into each of the plurality of classification models to obtain the probability calculated by each classification model, the average value of the probabilities is then calculated, and the category to which the user to be classified belongs is determined according to the average value. Because the feature labels used in training each classification model are not identical, the classification result is not dominated by a few feature labels, and the accuracy of classifying users is improved.
The above illustrates a classification method, and the following illustrates a method for obtaining a classification model through training in some embodiments of the present invention.
As shown in fig. 3, the method for obtaining a classification model through training includes the following steps: S310, M first samples and N second samples are obtained, where the first samples include category labels of classified users and user feature data corresponding to the feature labels, and the second samples include user feature data corresponding to the feature labels of unclassified users.
Wherein M is a positive integer greater than 0, and N is a positive integer greater than M.
The classified users are users whose category labels have been determined; a category label may indicate a purchased service. In one example, the classified users are users who have purchased an insurance service, and the category label indicates a user who has purchased the insurance service.
Unclassified users may include users who have not purchased a service, and unclassified users may or may not be users who have a service need.
In one example, the user to be classified has not purchased insurance services, which may or may not be a user who is interested in purchasing insurance services.
M samples are randomly drawn from the first samples in the database, and N samples are randomly drawn from the second samples in the database.
In one example, 50 samples are randomly drawn from a first sample in the database, the first sample being the user characteristic data of users who have purchased insurance service, and 2500 samples are randomly drawn from a second sample in the database, the second sample being the user characteristic data of users who have not purchased insurance service.
S320, splicing the user characteristic data of the M first samples and the user characteristic data of the N second samples to obtain P training sample sets.
Wherein P is a positive integer greater than 0.
The M first samples are spliced with parts of the N second samples to obtain P training sample sets for training the classification models to be trained.
The number of first samples and second samples in each set of training samples may be the same or different.
In one example, there are 50 first samples and 2500 second samples, 50 first samples are taken as a first sample set, 2500 second samples are divided into 50 second sample sets, and the first sample set is spliced with each second sample set to obtain 50 training sample sets, wherein each training sample set comprises 50 first samples and 50 second samples.
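A minimal sketch of this splicing step in Python, under the assumptions of the example above (the M first samples are reused in every set and the N second samples are split evenly into P groups; the patent itself does not fix how the second samples are divided):

    import random

    def build_training_sets(first_samples, second_samples, p):
        # Shuffle a copy of the second samples and split them into P groups,
        # e.g. 2500 second samples into 50 groups of 50.
        second_samples = list(second_samples)
        random.shuffle(second_samples)
        group_size = len(second_samples) // p
        training_sets = []
        for i in range(p):
            second_group = second_samples[i * group_size:(i + 1) * group_size]
            # Splice the first samples with one group of second samples, giving
            # one training sample set per classification model to be trained.
            training_sets.append(list(first_samples) + second_group)
        return training_sets

For the example above, build_training_sets(first_samples, second_samples, 50) with 50 first samples and 2500 second samples yields 50 training sample sets of 100 samples each (50 first samples plus 50 second samples).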
S330, for each classification model to be trained among the plurality of classification models to be trained, the following step is respectively performed: the classification model to be trained is trained using a part of the feature labels and the training sample set corresponding to the classification model to be trained, to obtain the classification model.
When each classification model to be trained is trained, a part of all the feature labels is randomly selected and used together with one of the P training sample sets.
Because the feature labels used to train different classification models to be trained are not completely the same, the trained classification models differ from one another. The probability that a user is a user with a service requirement can thus be determined from the differences between feature labels, the classification process is not dominated by a few feature labels, the generalization capability of the classification models is improved, and the users to be classified are classified more accurately.
In one example, there are 50 classification models to be trained and 50 training sample sets, and the following step is performed for each model to be trained: 1/3 of all the feature labels are randomly drawn, the classification model to be trained is trained with one of the 50 training sample sets, and the classification model is obtained after training.
The feature labels adopted by the 50 classification models are not completely the same, so that the 50 classification models obtained by training are different.
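A minimal sketch of this training step follows. It assumes (these are assumptions, not part of the patent) that scikit-learn's DecisionTreeClassifier is used for the decision tree classification model, that feature values have already been numerically encoded, and that first samples (which carry a category label) are treated as the positive class while second samples are treated as the negative class:

    import random
    from sklearn.tree import DecisionTreeClassifier

    def train_models(training_sets, all_feature_labels, feature_fraction=1/3):
        models = []
        for training_set in training_sets:
            # Randomly draw a part of all the feature labels, e.g. 1/3 of them.
            k = max(1, int(len(all_feature_labels) * feature_fraction))
            chosen_labels = random.sample(list(all_feature_labels), k)
            # Build the inputs from the chosen feature labels only; feature values
            # are assumed to be numerically encoded already.
            X = [[sample["features"][label] for label in chosen_labels]
                 for sample in training_set]
            # Assumption: first samples carry a category label (positive class, 1)
            # and second samples do not (treated here as 0).
            y = [1 if sample.get("label") is not None else 0 for sample in training_set]
            tree = DecisionTreeClassifier().fit(X, y)
            # Keep the chosen feature labels with each decision tree so that the
            # same feature columns are used again at prediction time.
            models.append((chosen_labels, tree))
        return models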
According to the method provided by the embodiment of the invention, when the classification models to be trained are trained, a part of all the feature labels is randomly used, so the plurality of trained classification models differ from one another and are not dominated by a few feature labels; the generalization capability of the classification models is improved, and the accuracy of classifying users is improved.
In one embodiment provided by the invention, the classification model to be trained comprises a decision tree classification model to be trained; the classification model comprises a decision tree classification model.
The decision tree classification model to be trained is trained according to its corresponding training sample set and a part of all the feature labels, to obtain the decision tree classification model.
According to the embodiment of the invention, the decision tree classification model is obtained by training the decision tree classification model to be trained, and a basis is provided for determining the probability of the user according to the decision tree classification model, so that the category of the user to be classified can be determined according to the average value of the probability.
In some embodiments provided by the present invention, before training the classification model to be trained, the feature labels are obtained from the user feature data, so as to obtain the first sample and the second sample for training the classification model to be trained, and the method for obtaining the feature labels is described below with reference to fig. 4.
As shown in fig. 4, the method of obtaining the feature tag includes the following steps: s410, a plurality of first initial samples and a plurality of second initial samples are obtained.
The first initial sample includes category labels and user characteristic data for classified users and the second initial sample includes user characteristic data for unclassified users.
The classified users are users whose category labels have been determined; a category label may indicate a purchased service.
In one example, the classified users are users who purchased insurance services and the category labels are users who purchased insurance services.
Unclassified users may include users who have not purchased a service, and unclassified users may or may not be users who have a service need.
In one example, an unclassified user has not purchased an insurance service, and an unclassified user may or may not be a user who is intended to purchase an insurance service.
User characteristic data of a plurality of first initial samples and characteristic data of a plurality of second initial samples are obtained from a database.
In one example, the user feature data of a user may include: the number of insurance applications installed recently is 2, the educational background is a bachelor's degree, the age is 40, the gender is male, the city is Beijing, the annual income is 400,000, and so on.
And S420, classifying the user characteristic data in the plurality of first initial samples and the plurality of second initial samples to obtain a plurality of characteristic labels.
And classifying according to the category of the data in the user characteristic data to obtain a plurality of characteristic labels. Wherein each feature tag corresponds to a type of user feature data.
In one example, the user feature data of a certain user are: the number of insurance applications installed recently is 2, the educational background is a bachelor's degree, the age is 40, the gender is male, the city is Beijing, and the annual income is 400,000. The obtained feature labels then include the number of insurance applications installed recently, educational background, age, gender, city, annual income, and the like.
And taking the class label of the classified user in the first initial sample and the user characteristic data corresponding to the characteristic label as a first sample.
And taking the user characteristic data corresponding to the characteristic label of the unclassified user in the second initial sample as a second sample.
The first sample and the second sample are used for training the classification model to be trained.
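A minimal sketch of this preparation step in Python (a hypothetical illustration: the dictionary keys and the encoding of the category label as 1 are assumptions, since the patent does not prescribe a data format):

    def build_samples(first_initial_samples, second_initial_samples):
        # Each feature label corresponds to one type of user feature data, e.g. the
        # number of recently installed insurance apps, education, age, gender, city, income.
        feature_labels = sorted({label
                                 for sample in first_initial_samples + second_initial_samples
                                 for label in sample["user_feature_data"]})
        # First samples: the category label of the classified user (encoded here as 1,
        # e.g. "has purchased the insurance service") plus feature data per feature label.
        first_samples = [{"label": 1,
                          "features": {lab: s["user_feature_data"].get(lab) for lab in feature_labels}}
                         for s in first_initial_samples]
        # Second samples: only the feature data of the unclassified user, no category label.
        second_samples = [{"features": {lab: s["user_feature_data"].get(lab) for lab in feature_labels}}
                          for s in second_initial_samples]
        return feature_labels, first_samples, second_samples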
In the embodiment of the invention, the user feature data of users whose category labels have been determined and of users whose category labels have not been determined are classified to obtain a plurality of feature labels. The category label of the classified user and the user feature data corresponding to the feature labels in the first initial sample are taken as a first sample, and the user feature data corresponding to the feature labels of the unclassified user in the second initial sample are taken as a second sample. The first samples and second samples provide a basis for generating the training sample sets and training the classification models to be trained: each classification model to be trained is trained with a part of the feature labels and a training sample set to obtain a classification model, and the probability that the user to be classified is a user with a service requirement is determined according to the classification models, so that the category of the user is determined and the accuracy of classifying users is improved.
As shown in fig. 5, the classification apparatus 500 may include a first obtaining module 510, a first calculating module 520, a second calculating module 530, and a determining module 540.
The first obtaining module 510 is configured to obtain user feature data corresponding to a feature tag of a user to be classified.
An unclassified user is a user whose category has not yet been determined, and the user to be classified may be any one of the unclassified users.
The feature label may be a feature label selected according to the purpose of classification.
The first calculating module 520 is configured to input the user feature data into each classification model of the multiple classification models, so as to obtain a probability calculated by each classification model.
Each classification model is obtained by training based on user feature data corresponding to a part of feature labels in a training sample.
The user characteristic data corresponding to part of characteristic labels in the training sample is adopted to train the classification model to be trained to obtain a plurality of classification models, the user characteristic data of the user to be classified is respectively input into each classification model, each classification model calculates a probability, and the probability can represent the probability that the user to be classified is the user with service requirements.
A second calculating module 530 for calculating an average of the plurality of probabilities.
The average value may represent an average probability that the user to be classified has a service demand.
And the determining module 540 is configured to determine the category to which the user to be classified belongs according to the average value.
The category of the user to be classified may be a target user or a non-target user.
The target user represents a user with a service requirement, and the non-target user represents a user without a service requirement.
In one example, the determination module 540 may include a comparison unit 541 and a determination unit 542.
A comparing unit 541, configured to compare the average value with a preset threshold.
The determining unit 542 is configured to determine that the user to be classified belongs to the target user when the average value is greater than the threshold value.
When the average value is larger than the preset threshold, it indicates that the user to be classified is a user with a service requirement and is a target user.
When the average value is less than or equal to the preset threshold, the user to be classified is not a user with a service requirement and is a non-target user.
The device provided by the embodiment of the invention inputs the acquired user feature data corresponding to the feature labels of the user to be classified into each of the plurality of classification models to obtain the probability calculated by each classification model, then calculates the average value of the plurality of probabilities, and determines the category to which the user to be classified belongs according to the average value. Because each classification model is trained on user feature data corresponding to only a part of the feature labels, the classification result is not dominated by a few feature labels, and the accuracy of classifying users is improved.
In one embodiment provided by the present invention, the classification apparatus 500 may further include a second obtaining module 550, a splicing module 560, and a training module 570.
The second obtaining module 550 is configured to obtain M first samples and N second samples, where the first samples include category labels of classified users and user feature data corresponding to the feature labels, and the second samples include user feature data corresponding to feature labels of unclassified users.
Wherein M is a positive integer greater than 0, and N is a positive integer greater than M.
And a splicing module 560, configured to splice the user feature data of the M first samples and the user feature data of the N second samples to obtain P training sample sets.
Wherein P is a positive integer greater than 0.
A training module 570, configured to perform the following step for each classification model to be trained among the plurality of classification models to be trained: training the classification model to be trained using a part of the feature labels and the training sample set corresponding to the classification model to be trained, to obtain the classification model.
When each classification model to be trained is trained, a part of all the feature labels is randomly selected and used together with one of the P training sample sets.
When the device provided by the embodiment of the invention trains the classification models to be trained, a part of all the feature labels is used, so the plurality of trained classification models differ from one another. The probability that a user is a user with a service requirement can be determined from the differences between feature labels, and the classification process is not dominated by a few feature labels, so the generalization capability of the classification models is improved and the accuracy of classifying users is improved.
In one embodiment provided by the invention, the classification model to be trained comprises a decision tree classification model to be trained; the classification model comprises a decision tree classification model.
The training module 570 is specifically configured to: train the decision tree classification model to be trained according to its corresponding training sample set and a part of all the feature labels, to obtain the decision tree classification model.
The embodiment of the invention obtains the decision tree classification model by training the decision tree classification model to be trained, and provides a basis for determining the probability of the user according to the decision tree classification model, so that the category of the user to be classified can be determined according to the average value of the probability.
In one embodiment provided by the present invention, the classification apparatus 500 further includes a third obtaining module 580 and a classifying module 590.
A third obtaining module 580, configured to obtain a plurality of first initial samples and a plurality of second initial samples.
The first initial sample includes category labels and user characteristic data for classified users and the second initial sample includes user characteristic data for unclassified users.
The classified users are users whose category labels have been determined; a category label may indicate a purchased service.
Unclassified users may include users who have not purchased a service, who may or may not have a service requirement.
The classifying module 590 is configured to classify the user feature data in the first initial samples and the second initial samples to obtain a plurality of feature labels.
And taking the class label of the classified user in the first initial sample and the user characteristic data corresponding to the characteristic label as a first sample.
And taking the user characteristic data corresponding to the characteristic label of the unclassified user in the second initial sample as a second sample.
The first sample and the second sample are used for training the classification model to be trained.
The device provided by the embodiment of the invention classifies the user feature data of users whose category labels have been determined and of users whose category labels have not been determined to obtain a plurality of feature labels, takes the category label of the classified user and the user feature data corresponding to the feature labels in the first initial sample as a first sample, and takes the user feature data corresponding to the feature labels of the unclassified user in the second initial sample as a second sample. The first samples and second samples provide a basis for generating the training sample sets and training the classification models to be trained; each classification model to be trained can be trained with a part of the feature labels and a training sample set to obtain a classification model, the probability of the user to be classified is determined according to the classification models, the category of the user is determined, and the accuracy of classifying users is improved.
The classification device provided by the embodiment of the present invention performs each step in the method shown in fig. 1 to 4, and can achieve the technical effect of improving the accuracy of classifying users, which is not described in detail herein for brevity.
Fig. 6 is a schematic diagram illustrating a hardware structure of an electronic device according to an embodiment of the present invention.
The electronic device may comprise a processor 601 and a memory 602 in which computer program instructions are stored.
Specifically, the processor 601 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more Integrated circuits implementing embodiments of the present invention.
Memory 602 may include mass storage for data or instructions. By way of example, and not limitation, memory 602 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 602 may include removable or non-removable (or fixed) media, where appropriate. The memory 602 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 602 is a non-volatile solid-state memory. In a particular embodiment, the memory 602 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 601 implements any of the classification methods in the embodiments shown in fig. 1-4 by reading and executing computer program instructions stored in the memory 602.
In one example, the electronic device may also include a communication interface 603 and a bus 610. As shown in fig. 6, the processor 601, the memory 602, and the communication interface 603 are connected via a bus 610 to complete communication therebetween.
The communication interface 603 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
The bus 610 includes hardware, software, or both, coupling the components of the electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. Bus 610 may include one or more buses, where appropriate. Although specific buses have been described and illustrated with respect to embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
The electronic device may perform the classification method in the embodiment of the present invention, thereby implementing the classification method described in conjunction with fig. 1.
In addition, in combination with the classification method in the foregoing embodiments, the embodiments of the present invention may be implemented by providing a computer storage medium. The computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the classification methods in the above embodiments.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions, or change the order between the steps, after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments noted in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (8)

1. A method of classification, the method comprising:
acquiring user characteristic data corresponding to the characteristic labels of the users to be classified;
respectively inputting the user characteristic data into each classification model of a plurality of classification models to obtain the probability calculated by each classification model;
calculating an average of a plurality of said probabilities;
determining the category of the user to be classified according to the average value;
each classification model is obtained by training based on user characteristic data corresponding to part of characteristic labels in a training sample; the partial feature labels adopted in the training of each classification model are not identical;
the training process of each classification model comprises the following steps: obtaining M first samples and N second samples, wherein the first samples comprise category labels of classified users and user feature data corresponding to the feature labels, and the second samples comprise user feature data corresponding to the feature labels of unclassified users;
splicing the user characteristic data of the M first samples and the user characteristic data of the N second samples to obtain P training sample sets;
aiming at each classification model to be trained in a plurality of classification models to be trained, the following steps are respectively executed:
training the classification model to be trained by adopting partial feature labels according to a training sample set corresponding to the classification model to be trained to obtain the classification model;
wherein M is a positive integer greater than 0, N is a positive integer greater than M, and P is a positive integer greater than 0.
2. The method of claim 1, wherein the classification model to be trained comprises a decision tree classification model to be trained;
the classification model comprises a decision tree classification model.
3. The method of claim 1 or 2, wherein prior to said obtaining the M first samples and the N second samples, the method further comprises:
obtaining a plurality of first initial samples and a plurality of second initial samples, wherein the first initial samples comprise the category labels and user feature data of the classified users, and the second initial samples comprise user feature data of the unclassified users;
classifying the user feature data in the first initial samples and the second initial samples to obtain a plurality of feature labels;
taking the category label of the classified user and user feature data corresponding to the feature label in the first initial sample as the first sample;
taking the user feature data corresponding to the feature label of the unclassified user in the second initial sample as the second sample;
wherein each feature tag corresponds to a type of user feature data.
4. The method according to claim 1, wherein the determining the category to which the user to be classified belongs according to the average value specifically includes:
comparing the average value with a preset threshold value;
and when the average value is larger than the preset threshold value, determining that the user to be classified belongs to a target user.
5. A classification apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring user characteristic data corresponding to the characteristic labels of the users to be classified;
the first calculation module is used for respectively inputting the user characteristic data into each classification model in a plurality of classification models to obtain the probability calculated by each classification model;
a second calculation module for calculating an average of a plurality of said probabilities;
the determining module is used for determining the category of the user to be classified according to the average value;
each classification model is obtained by training based on user characteristic data corresponding to part of characteristic labels in a training sample; the partial feature labels adopted in the training of each classification model are not identical;
the training process of each classification model comprises the following steps: a second obtaining module, configured to obtain M first samples and N second samples, where the first samples include category labels of classified users and user feature data corresponding to the feature labels, and the second samples include user feature data corresponding to the feature labels of unclassified users;
the splicing module is used for splicing the user characteristic data of the M first samples and the user characteristic data of the N second samples to obtain P training sample sets;
the training module is used for respectively executing the following steps aiming at each classification model to be trained in a plurality of classification models to be trained:
training the classification model to be trained according to the training sample set corresponding to the classification model to be trained to obtain the classification model;
wherein M is a positive integer greater than 0, N is a positive integer greater than M, and P is a positive integer greater than 0.
6. The apparatus of claim 5, wherein the classification model to be trained comprises a decision tree classification model to be trained;
the classification model comprises a decision tree classification model.
7. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the classification method of any one of claims 1-4.
8. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the classification method of any one of claims 1 to 4.
CN202210929907.0A 2022-08-04 2022-08-04 Classification method, classification device, classification equipment and storage medium Pending CN114996590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210929907.0A CN114996590A (en) 2022-08-04 2022-08-04 Classification method, classification device, classification equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210929907.0A CN114996590A (en) 2022-08-04 2022-08-04 Classification method, classification device, classification equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114996590A (en) 2022-09-02

Family

ID=83023147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210929907.0A Pending CN114996590A (en) 2022-08-04 2022-08-04 Classification method, classification device, classification equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114996590A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774451B1 (en) * 2008-06-30 2010-08-10 Symantec Corporation Method and apparatus for classifying reputation of files on a computer network
CN109190491A (en) * 2018-08-08 2019-01-11 上海海洋大学 Residual error convolutional neural networks SAR image sea ice classification method
US20190303713A1 (en) * 2018-03-30 2019-10-03 Regents Of The University Of Minnesota Discovery of shifting patterns in sequence classification
CN111738365A (en) * 2020-08-06 2020-10-02 腾讯科技(深圳)有限公司 Image classification model training method and device, computer equipment and storage medium
CN113780367A (en) * 2021-08-19 2021-12-10 北京三快在线科技有限公司 Classification model training and data classification method and device, and electronic equipment

Similar Documents

Publication Publication Date Title
CN108876213B (en) Block chain-based product management method, device, medium and electronic equipment
CN112990294B (en) Training method and device of behavior discrimination model, electronic equipment and storage medium
CN112883990A (en) Data classification method and device, computer storage medium and electronic equipment
CN110533094B (en) Evaluation method and system for driver
CN111861486A (en) Abnormal account identification method, device, equipment and medium
CN111245815B (en) Data processing method and device, storage medium and electronic equipment
CN114996590A (en) Classification method, classification device, classification equipment and storage medium
CN115392787A (en) Enterprise risk assessment method, device, equipment, storage medium and program product
CN114417830A (en) Risk evaluation method, device, equipment and computer readable storage medium
CN115659232A (en) Method and device for mining abnormal rule
CN111461118B (en) Interest feature determining method, device, equipment and storage medium
CN114092219A (en) Model verification method and device, electronic equipment and storage medium
CN113052604A (en) Object detection method, device, equipment and storage medium
CN110278524B (en) User position determining method, graph model generating method, device and server
CN109993181B (en) Abnormal behavior pattern recognition method, device, equipment and medium
CN109873908B (en) Junk call identification recognition method and device, computer equipment and storage medium
CN112070530A (en) Online evaluation method and related device of advertisement prediction model
CN113112102A (en) Priority determination method, device, equipment and storage medium
CN110570301A (en) Risk identification method, device, equipment and medium
CN112560433B (en) Information processing method and device
CN116204567B (en) Training method and device for user mining and model, electronic equipment and storage medium
CN112364018B (en) Method, device and equipment for generating wide table and storage medium
CN116911857A (en) Information processing method, device, equipment, medium and product
CN115392934A (en) Method, device and equipment for evaluating credit of commercial tenant and computer storage medium
CN114661969A (en) Method, device, equipment and storage medium for optimizing user label system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220902)