CN112597135A - User classification method and device, electronic equipment and readable storage medium

Info

Publication number: CN112597135A
Application number: CN202110006106.2A
Authority: CN
Inventors: 任文龙; 许文彬
Original assignee: Tianmian Information Technology Shenzhen Co ltd
Current assignee: Tianmian Information Technology Shenzhen Co ltd
Priority date: 2021-01-04
Filing date: 2021-01-04
Publication date: 2021-04-02

Abstract

The invention relates to data processing, and discloses a user classification method, which comprises the following steps: establishing a hierarchical label for each data set based on the description information of the data sets, establishing a label tree based on the hierarchical label, and combining the data sets corresponding to the same leaf node label in the label tree into a data set group; determining target leaf node labels based on the data set requirement text, taking data set groups corresponding to the target leaf node labels as target data set groups, and calculating comprehensive scores of all data sets in the target data set groups based on historical modeling data; sorting the data sets in the target data set group based on the comprehensive scores, determining the target data sets based on sorting results, and performing combined modeling based on the target data sets to obtain a target user classification model; and inputting the user data into the target user classification model to obtain a user classification result. The invention also provides a user classification device, electronic equipment and a readable storage medium. The invention improves the accuracy of user classification.

Description

User classification method and device, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of data processing, and in particular, to a user classification method and apparatus, an electronic device, and a readable storage medium.

Background

With the development of science and technology, models are applied more and more widely; for example, a model may be used to classify users accurately. At present, a user classification model is usually built from an acquired data set. Because of data privacy and security requirements, the data that can be acquired is limited, so the data set is not comprehensive enough and the classification accuracy of the resulting model is not high.

At present, a data set can be supplemented through multi-party joint modeling. In the modeling process, a federate needs to select suitable data sets for joint modeling according to the description information of each data set. However, because the description information is relatively simple, the details of each data set cannot be known accurately, so the target data set may be selected improperly and the classification accuracy of the resulting user classification model is not high. Therefore, a user classification method is needed to improve user classification accuracy.

Disclosure of Invention

In view of the above, there is a need to provide a user classification method, aiming at improving the user classification accuracy.

The user classification method provided by the invention comprises the following steps:

analyzing a joint modeling request for a user classification model sent by a first client, acquiring a data set requirement text carried by the joint modeling request, acquiring description information of the data sets corresponding to all federates from a first database, establishing hierarchical labels for all the data sets based on the description information, establishing a label tree based on the hierarchical labels, and combining the data sets corresponding to the same leaf node label in the label tree into a data set group;

determining a target leaf node label corresponding to the joint modeling request based on the data set requirement text, taking a data set group corresponding to the target leaf node label as a target data set group, acquiring historical modeling data of each data set in the target data set group from a second database, and calculating a comprehensive score of each data set in the target data set group based on the historical modeling data;

sorting the data sets in the target data set group according to the sequence of the comprehensive scores from high to low, determining a target data set based on a sorting result, and performing combined modeling based on the target data set to obtain a target user classification model;

analyzing a user classification request sent by a second client, acquiring user data carried by the user classification request, and inputting the user data into the target user classification model to obtain a user classification result.

Optionally, the determining, based on the data set requirement text, a target leaf node tag corresponding to the joint modeling request includes:

performing word segmentation processing on the data set requirement text to obtain a word set;

and matching each word in the word set with a label library corresponding to each level label of the label tree, and determining a target leaf node label based on a matching result.

Optionally, the matching result includes a successfully matched tag name and a tag hierarchy, and determining the target leaf node tag based on the matching result includes:

judging whether the successfully matched labels contain leaf node labels or not based on the label names and the label levels;

and when the successfully matched label is judged to contain the leaf node label, taking the successfully matched leaf node label as a target leaf node label.

Optionally, after determining whether the successfully matched tag contains a leaf node tag based on the tag name and the tag hierarchy, the method further includes:

and if the successfully matched labels do not contain a leaf node label, sorting the successfully matched labels by label level from high to low, and displaying, on an interface of the first client, a selection interface for the sub-labels at every level below the last label in the sorted order, so that a user of the first client can select a target leaf node label.

Optionally, the performing joint modeling based on the target dataset includes:

and performing data alignment processing and feature extraction processing on the target data set to respectively obtain an alignment data set and a feature data set corresponding to the target data set, and performing joint modeling based on the target data set, the alignment data set and the feature data set.

Optionally, the historical modeling data includes the number of times each data set has been successfully modeled and the initial score given to the data set by the modeling participants after each successful modeling, and the comprehensive score is calculated as follows:

Y_i = M/(M + N_i) × A + N_i/(M + N_i) × S_i

wherein Y_i is the comprehensive score of the i-th data set in the target data set group, M is the number of modeling successes of the data set with the fewest modeling successes in the target data set group, N_i is the number of modeling successes of the i-th data set in the target data set group, A is the average of the initial scores of all data sets in the target data set group, and S_i is the average of the initial scores of the i-th data set in the target data set group.

Optionally, the "determining the target leaf node label corresponding to the joint modeling request based on the data set requirement text" is replaced with:

displaying a cascade label selection interface corresponding to the label tree to an interface of the first client, so that a user of the first client can select a target leaf node label based on the cascade label selection interface.

In order to solve the above problem, the present invention further provides a user classifying device, including:

an establishing module, used for analyzing a joint modeling request for a user classification model sent by a first client, acquiring a data set requirement text carried by the joint modeling request, acquiring description information of the data sets corresponding to all federates from a first database, establishing hierarchical labels for all the data sets based on the description information, establishing a label tree based on the hierarchical labels, and combining the data sets corresponding to the same leaf node label in the label tree into a data set group;

the calculation module is used for determining a target leaf node label corresponding to the joint modeling request based on the data set demand text, taking a data set group corresponding to the target leaf node label as a target data set group, acquiring historical modeling data of each data set in the target data set group from a second database, and calculating a comprehensive score of each data set in the target data set group based on the historical modeling data;

the modeling module is used for sequencing the data sets in the target data set group according to the sequence of the comprehensive scores from high to low, determining a target data set based on a sequencing result, and performing combined modeling based on the target data set to obtain a target user classification model;

and the classification module is used for analyzing a user classification request sent by a second client, acquiring user data carried by the user classification request, and inputting the user data into the target user classification model to obtain a user classification result.

In order to solve the above problem, the present invention also provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a user classification program executable by the at least one processor, the user classification program being executable by the at least one processor to enable the at least one processor to perform the user classification method described above.

In order to solve the above problem, the present invention also provides a computer-readable storage medium having a user classification program stored thereon, the user classification program being executable by one or more processors to implement the user classification method described above.

Compared with the prior art, the hierarchical label is established for each data set based on the description information of the data set, the label tree is established based on the hierarchical label, the data sets corresponding to the same leaf node label in the label tree are combined into a data set group, and the data sets are subjected to labeling treatment in the step, so that the expected data set group can be conveniently, quickly and accurately found in the follow-up process; secondly, determining a target leaf node label based on the data set requirement text, taking a data set group corresponding to the target leaf node label as a target data set group, and calculating a comprehensive score of each data set in the target data set group based on historical modeling data, wherein the comprehensive score is objective and accurate; then, the data sets in the target data set group are sorted according to the sequence of the comprehensive scores from high to low, the target data sets are determined based on the sorting results, and the target user classification model is obtained based on the joint modeling of the target data sets, so that the target data sets are selected more quickly and accurately, and the classification accuracy of the target user classification model obtained by modeling is higher; and finally, inputting the user data into the target user classification model to obtain a user classification result. Therefore, the invention improves the accuracy of user classification.

Drawings

Fig. 1 is a schematic flowchart of a user classification method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a user classification device according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an electronic device implementing a user classification method according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the descriptions relating to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when the combined technical solutions are contradictory or cannot be realized, such a combination should be considered not to exist and is not within the protection scope of the present invention.

The invention provides a user classification method. Fig. 1 is a schematic flow chart of a user classification method according to an embodiment of the present invention. The method may be performed by an electronic device, which may be implemented by software and/or hardware.

In this embodiment, the user classification method includes:

S1, analyzing a joint modeling request for a user classification model sent by a first client, acquiring a data set requirement text carried by the joint modeling request, acquiring description information of the data sets corresponding to all federates from a first database, establishing a hierarchical label for each data set based on the description information, establishing a label tree based on the hierarchical labels, and combining the data sets corresponding to the same leaf node label in the label tree into a data set group.

In this embodiment, the first client is a client to which the joint modeling request initiator belongs, and in order to ensure data security, each federate stores the data set thereof in a local storage space, and only uploads description information of the data set to the first database of the electronic device, so that other federates can select a proper data set for joint modeling.

In this embodiment, a hierarchical label may be established for each data set according to the description information of the data set, where the description information includes information about the federate to which the data set belongs, the data type, and the like. For example, according to the industry background of the company to which a federate belongs, the federates are first divided under labels such as finance, real estate, medical care and e-commerce. Different companies hold different types of data, so the classification can continue: a financial company has loan data, overdue data, credit data and user basic data; a real estate company has user basic data, house pre-purchase preferences and house asset information. The user basic data can be further subdivided into hobbies, behavior characteristics, basic attributes and purchasing ability, and the loan data can be subdivided into pre-loan data, mid-loan data and post-loan data.

The labels of different levels are nested layer by layer to form a tree structure, and the label at the tail end belongs to the leaf node label. If the proper tags cannot be found in the tag library, the federate can also customize the tags, and the customized tags can be located in different levels of the tag tree so as to better classify and manage the data set.

After the label tree is established, the data sets corresponding to the same leaf node label in the label tree are combined into a data set group.
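The label tree construction and grouping described above can be sketched as follows. This is a minimal sketch, assuming that the description information of each data set has already been reduced to a label path from the top-level label down to the leaf node label; the record layout, the sample data sets and the function names build_label_tree and group_by_leaf are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict

# Hypothetical description records: each data set carries a label path
# from the top-level label down to its leaf node label.
datasets = [
    {"id": "ds1", "labels": ("finance", "loan data", "pre-loan data")},
    {"id": "ds2", "labels": ("finance", "loan data", "pre-loan data")},
    {"id": "ds3", "labels": ("finance", "loan data", "post-loan data")},
    {"id": "ds4", "labels": ("real estate", "user basic data", "purchasing ability")},
]


def build_label_tree(records):
    """Nest the hierarchical labels layer by layer into a dict-of-dicts tree."""
    tree = {}
    for record in records:
        node = tree
        for label in record["labels"]:
            node = node.setdefault(label, {})
    return tree


def group_by_leaf(records):
    """Combine data sets that share the same leaf node label (full path) into one group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["labels"]].append(record["id"])
    return dict(groups)


label_tree = build_label_tree(datasets)
dataset_groups = group_by_leaf(datasets)
```

Keying each group by the full label path rather than by the leaf name alone keeps two leaf labels with the same name under different parents (for example, user basic data under finance and under real estate) in separate groups.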

S2, determining a target leaf node label corresponding to the joint modeling request based on the data set requirement text, taking a data set group corresponding to the target leaf node label as a target data set group, acquiring historical modeling data of each data set in the target data set group from a second database, and calculating a comprehensive score of each data set in the target data set group based on the historical modeling data.

The data set requirement text can be a sentence or a plurality of sentences.

The determining a target leaf node label corresponding to the joint modeling request based on the data set requirement text comprises:

A11, performing word segmentation processing on the data set requirement text to obtain a word set;

In this embodiment, a word segmentation algorithm based on a word list (a forward maximum matching algorithm, a reverse maximum matching algorithm, or a bidirectional maximum matching algorithm) or a word segmentation method based on an N-gram language model may be used to segment the data set requirement text.
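Of the word-list-based algorithms mentioned above, forward maximum matching can be sketched as follows. This is a minimal sketch, assuming a non-empty word list and text written without separators; the function name forward_max_match, the sample word list and the sample text are hypothetical.

```python
def forward_max_match(text, vocabulary, max_len=None):
    """Segment text left to right, always taking the longest word found in the word list."""
    if max_len is None:
        max_len = max(len(word) for word in vocabulary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is emitted as-is when nothing longer matches.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical word list and requirement text (no separators between words).
vocabulary = {"finance", "pre", "loan", "preloan", "data"}
word_set = forward_max_match("financepreloandata", vocabulary)
```

Because the longest candidate is tried first, "preloan" is preferred over the shorter entry "pre" in this example.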

A12, matching each word in the word set with a label library corresponding to each level label of the label tree, and determining a target leaf node label based on the matching result.

In this embodiment, each layer of tags in the tag tree corresponds to one tag library, and each word in the word set obtained by word segmentation is matched with each tag library, so that each level tag of the data set requirement text can be determined.

The matching result comprises a successfully matched label name and a label level, and the determining the label of the target leaf node based on the matching result comprises:

B11, judging whether the successfully matched labels contain a leaf node label based on the label name and the label level;

B12, when it is judged that the successfully matched labels contain a leaf node label, taking the successfully matched leaf node label as the target leaf node label.
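Steps A12, B11 and B12 can be sketched as follows. This is a minimal sketch, assuming one label library (a set of label names) per level of the label tree and a known set of (label name, label level) pairs that are leaf nodes; the data, the function names match_labels and resolve_target, and the returned status strings are hypothetical. When no leaf node label is matched, the sketch returns the lowest-level matched label, under which a sub-label selection would be needed.

```python
# Hypothetical label libraries: one set of label names per level of the label tree.
label_libraries = {
    1: {"finance", "real estate"},
    2: {"loan data", "credit data"},
    3: {"pre-loan data", "post-loan data"},
}

# Hypothetical set of (label name, label level) pairs that are leaf nodes.
leaf_labels = {("credit data", 2), ("pre-loan data", 3), ("post-loan data", 3)}


def match_labels(words, libraries):
    """Return a (label name, label level) pair for every word found in a label library."""
    return [
        (word, level)
        for word in words
        for level, library in libraries.items()
        if word in library
    ]


def resolve_target(matches, leaves):
    """Pick the target leaf node label, or report the lowest-level label that was matched."""
    leaf_hits = [name for name, level in matches if (name, level) in leaves]
    if leaf_hits:
        return ("leaf", leaf_hits[0])
    if not matches:
        return ("none", None)
    # Level 1 is the highest level, so the largest level number is the lowest label.
    lowest = max(matches, key=lambda match: match[1])
    return ("choose_under", lowest[0])


direct = resolve_target(match_labels(["finance", "pre-loan data"], label_libraries), leaf_labels)
partial = resolve_target(match_labels(["finance", "loan data"], label_libraries), leaf_labels)
```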

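After determining whether the successfully matched labels contain a leaf node label based on the label name and the label level, the method further includes:

if the successfully matched labels do not contain a leaf node label, sorting the successfully matched labels by label level from high to low, and displaying, on an interface of the first client, a selection interface for the sub-labels at every level below the last label in the sorted order, so that a user of the first client can select a target leaf node label.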

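In this embodiment, the second database stores historical modeling data of each data set. The historical modeling data includes the number of times the data set has been successfully modeled and the initial score given to the data set by the modeling participants after each successful modeling, and the comprehensive score is calculated as:

Y_i = M/(M + N_i) × A + N_i/(M + N_i) × S_i

wherein Y_i is the comprehensive score of the i-th data set in the target data set group, M is the number of modeling successes of the data set with the fewest modeling successes in the target data set group, N_i is the number of modeling successes of the i-th data set, A is the average of the initial scores of all data sets in the target data set group, and S_i is the average of the initial scores of the i-th data set.

With this formula, a data set that has been modeled only a few times can still obtain a high comprehensive score, which makes the comprehensive score more objective.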

For example, assume that there are 3 data sets in the target data set group: data set A, data set B and data set C.

Data set A has been successfully modeled 6 times, with initial scores of 6, 7, 8, 9, 9 and 9; data set B has been successfully modeled 5 times, with initial scores of 6, 7, 8, 9 and 9; data set C has been successfully modeled 1 time, with an initial score of 9.

Then it can be calculated that A = (6+7+8+9+9+9+6+7+8+9+9+9)/12 = 8 and M = 1. For data set A, S₁ = (6+7+8+9+9+9)/6 = 8, N₁ = 6, and Y₁ = 1/(1+6) × 8 + 6/(1+6) × 8 = 8. For data set B, S₂ = (6+7+8+9+9)/5 = 7.8, N₂ = 5, and Y₂ = 1/(1+5) × 8 + 5/(1+5) × 7.8 ≈ 7.83.
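The calculation in this example can be sketched as follows. This is a minimal sketch, assuming that the comprehensive score takes the weighted form Y_i = M/(M + N_i) × A + N_i/(M + N_i) × S_i suggested by the worked example, and that A is the mean over all twelve individual initial scores; the function name composite_scores and the dictionary layout are hypothetical.

```python
def composite_scores(history):
    """Compute Y_i = M/(M+N_i) * A + N_i/(M+N_i) * S_i for every data set.

    history maps a data set name to the list of initial scores it received,
    one score per successful modeling.
    """
    counts = {name: len(scores) for name, scores in history.items()}
    m = min(counts.values())  # M: fewest modeling successes in the group
    all_scores = [score for scores in history.values() for score in scores]
    a = sum(all_scores) / len(all_scores)  # A: assumed to be the mean of all initial scores
    result = {}
    for name, scores in history.items():
        n = counts[name]          # N_i
        s = sum(scores) / n       # S_i
        result[name] = m / (m + n) * a + n / (m + n) * s
    return result


# Initial scores of data sets A, B and C from the example above.
history = {
    "A": [6, 7, 8, 9, 9, 9],
    "B": [6, 7, 8, 9, 9],
    "C": [9],
}
scores = composite_scores(history)
```

The weight M/(M + N_i) pulls the score of a data set with few modeling successes toward the group average A, while a data set with many successes is scored mostly by its own average S_i.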

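In another embodiment of the invention, "determining the target leaf node label corresponding to the joint modeling request based on the data set requirement text" is replaced with:

displaying a cascade label selection interface corresponding to the label tree on an interface of the first client, so that a user of the first client can select a target leaf node label based on the cascade label selection interface.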

S3, sorting the data sets in the target data set group according to the sequence of the comprehensive scores from high to low, determining a target data set based on the sorting result, and performing combined modeling based on the target data set to obtain a target user classification model.

In this embodiment, one or more data sets in the target data set group that are ranked first may be used as the target data set.
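The sorting and selection in step S3 can be sketched as follows. This is a minimal sketch, assuming that the comprehensive scores are already available as a mapping from data set name to score; the function name select_target_datasets, the parameter top_k and the sample scores are hypothetical.

```python
def select_target_datasets(scores, top_k=1):
    """Sort data sets by comprehensive score from high to low and keep the first top_k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical comprehensive scores for three data sets.
sample_scores = {"A": 8.0, "B": 7.83, "C": 8.5}
targets = select_target_datasets(sample_scores, top_k=2)
```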

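The performing joint modeling based on the target data set comprises:

performing data alignment processing and feature extraction processing on the target data set to obtain, respectively, an alignment data set and a feature data set corresponding to the target data set, and performing joint modeling based on the target data set, the alignment data set and the feature data set.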

S4, analyzing a user classification request sent by a second client, acquiring user data carried by the user classification request, and inputting the user data into the target user classification model to obtain a user classification result.

According to the embodiment, the user classification method provided by the invention includes the steps that firstly, hierarchical labels are established for all data sets based on description information of the data sets, label trees are established based on the hierarchical labels, and the sets of the data sets corresponding to the same leaf node label in the label trees are combined into a data set group; secondly, determining a target leaf node label based on the data set requirement text, taking a data set group corresponding to the target leaf node label as a target data set group, and calculating a comprehensive score of each data set in the target data set group based on historical modeling data, wherein the comprehensive score is objective and accurate; then, the data sets in the target data set group are sorted according to the sequence of the comprehensive scores from high to low, the target data sets are determined based on the sorting results, and the target user classification model is obtained based on the joint modeling of the target data sets, so that the target data sets are selected more quickly and accurately, and the classification accuracy of the target user classification model obtained by modeling is higher; and finally, inputting the user data into the target user classification model to obtain a user classification result. Therefore, the invention improves the accuracy of user classification.

Fig. 2 is a schematic block diagram of a user classification apparatus according to an embodiment of the present invention.

The user classification device 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the user classification device 100 may include an establishing module 110, a calculation module 120, a modeling module 130, and a classification module 140. A module in the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device to perform a fixed function and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

The establishing module 110 is configured to analyze a joint modeling request for a user classification model sent by a first client, obtain a data set requirement text carried by the joint modeling request, obtain description information of the data sets corresponding to each federate from a first database, establish a hierarchical label for each data set based on the description information, establish a label tree based on the hierarchical labels, and combine the data sets corresponding to the same leaf node label in the label tree into a data set group.

A calculating module 120, configured to determine a target leaf node tag corresponding to the joint modeling request based on the data set requirement text, use a data set group corresponding to the target leaf node tag as a target data set group, obtain historical modeling data of each data set in the target data set group from a second database, and calculate a comprehensive score of each data set in the target data set group based on the historical modeling data.

The data set requirement text can be a sentence or a plurality of sentences.

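The determining a target leaf node label corresponding to the joint modeling request based on the data set requirement text comprises:

A21, performing word segmentation processing on the data set requirement text to obtain a word set;

A22, matching each word in the word set with the label library corresponding to each level of labels of the label tree, and determining a target leaf node label based on the matching result.

The matching result comprises the successfully matched label names and label levels, and the determining the target leaf node label based on the matching result comprises:

B21, judging whether the successfully matched labels contain a leaf node label based on the label name and the label level;

B22, when it is judged that the successfully matched labels contain a leaf node label, taking the successfully matched leaf node label as the target leaf node label.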

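After determining whether the successfully matched labels contain a leaf node label based on the label name and the label level, the calculation module 120 is further configured to:

if the successfully matched labels do not contain a leaf node label, sort the successfully matched labels by label level from high to low, and display, on an interface of the first client, a selection interface for the sub-labels at every level below the last label in the sorted order, so that a user of the first client can select a target leaf node label.

The historical modeling data includes the number of times each data set has been successfully modeled and the initial score given to the data set by the modeling participants after each successful modeling, and the comprehensive score is calculated as:

Y_i = M/(M + N_i) × A + N_i/(M + N_i) × S_i

wherein Y_i is the comprehensive score of the i-th data set in the target data set group, M is the number of modeling successes of the data set with the fewest modeling successes in the target data set group, N_i is the number of modeling successes of the i-th data set in the target data set group, A is the average of the initial scores of all data sets in the target data set group, and S_i is the average of the initial scores of the i-th data set in the target data set group.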

And the modeling module 130 is configured to sort the data sets in the target data set group according to the sequence of the composite scores from high to low, determine a target data set based on the sorting result, and perform joint modeling based on the target data set to obtain a target user classification model.

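The performing joint modeling based on the target data set comprises:

performing data alignment processing and feature extraction processing on the target data set to obtain, respectively, an alignment data set and a feature data set corresponding to the target data set, and performing joint modeling based on the target data set, the alignment data set and the feature data set.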

The classification module 140 is configured to analyze a user classification request sent by a second client, obtain user data carried in the user classification request, and input the user data into the target user classification model to obtain a user classification result.

Fig. 3 is a schematic structural diagram of an electronic device for implementing a user classification method according to an embodiment of the present invention.

The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms a super virtual computer.

In the present embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicatively connected to each other through a system bus, wherein the memory 11 stores a user classification program 10, and the processor 12 can execute the user classification program 10. Fig. 3 only shows the electronic device 1 with components 11-13 and the user classification program 10, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or some components may be combined, or a different arrangement of components.

The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk provided on the electronic device 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. In this embodiment, the readable storage medium of the memory 11 is generally used for storing an operating system and various application software installed in the electronic device 1, for example, the code of the user classification program 10 in an embodiment of the present invention. Further, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is generally configured to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, for example, run the user classification program 10.

The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is used for establishing a communication connection between the electronic device 1 and a client (not shown).

Optionally, the electronic device 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The user classification program 10 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 12, may implement steps S1 to S4 of the user classification method described above.

For the specific implementation of the user classification program 10 by the processor 12, reference may be made to the description of the relevant steps in the embodiment corresponding to Fig. 1, which is not repeated here.

Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable medium may be volatile or non-volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).

The computer-readable storage medium stores a user classification program 10, where the user classification program 10 may be executed by one or more processors, and a specific implementation of the computer-readable storage medium of the present invention is substantially the same as that in each embodiment of the user classification method, and is not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in actual implementation.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for classifying a user, the method comprising:

2. The user classification method according to claim 1, wherein the determining a target leaf node label corresponding to the joint modeling request based on the dataset requirement text comprises:

3. The method for classifying a user according to claim 2, wherein the matching result includes a label name and a label hierarchy for which the matching is successful, and the determining the target leaf node label based on the matching result includes:

4. The method for classifying users according to claim 3, wherein after determining whether the successfully matched label contains a leaf node label based on the label name and label hierarchy, the method further comprises:

5. The user classification method of claim 1, wherein the jointly modeling based on the target dataset comprises:

6. The user classification method according to any one of claims 1 to 5, wherein the historical modeling data includes the number of times of success in modeling the data set and an initial score of the data set by a modeling participant after each success in modeling, and the calculation formula of the composite score is:

7. The user classification method according to claim 1, characterized in that "determining the target leaf node label corresponding to the joint modeling request based on the dataset requirement text" is replaced with:

8. An apparatus for classifying a user, the apparatus comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

the memory stores a user classification program executable by the at least one processor to enable the at least one processor to perform the user classification method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon a user classification program executable by one or more processors to implement the user classification method of any one of claims 1 to 7.