CN112860672A - Method and device for determining label weight - Google Patents

Method and device for determining label weight

Info

Publication number
CN112860672A
Authority
CN
China
Prior art keywords
enterprise
user
weight
determining
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110075021.XA
Other languages
Chinese (zh)
Inventor
蔡科
林畅
池冰晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202110075021.XA priority Critical patent/CN112860672A/en
Publication of CN112860672A publication Critical patent/CN112860672A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Abstract

The invention discloses a method and a device for determining label weight, and relates to the technical field of computers. One embodiment of the method comprises: determining at least one user tag corresponding to a target enterprise user; determining an impact weight of the target enterprise user corresponding to the user tag, wherein the impact weight comprises at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight; and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight. The implementation can describe the characteristic attributes of the enterprise users more accurately through the user tags.

Description

Method and device for determining label weight
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for determining label weight.
Background
In recent years, with the development of the internet, many industries have moved largely online, and each enterprise has accumulated a large amount of raw data and business data. User labels have become an important means for enterprises to use big data for refined operations and marketing services. However, existing user tags for enterprise users still cannot accurately describe the characteristic attributes of those users.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a tag weight, which can describe a feature attribute of an enterprise user more accurately through a user tag.
In a first aspect, an embodiment of the present invention provides a method for determining a label weight, including:
determining at least one user tag corresponding to a target enterprise user;
determining an impact weight of the target enterprise user corresponding to the user tag, wherein the impact weight comprises at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
Optionally, the determining at least one user tag corresponding to the target enterprise user includes:
acquiring enterprise information of the target enterprise user;
determining at least one user tag of the target enterprise user according to the enterprise information, wherein the classification of the user tag comprises: a business scenario-based label classification and an enterprise attribute-based label classification.
Optionally, the determining at least one user tag of the target enterprise user according to the enterprise information includes:
counting the enterprise information, and determining a counting class label of the target enterprise user;
and/or,
determining a rule class label of the target enterprise user according to the enterprise information and the label rule;
and/or,
and determining the mining class label of the target enterprise user according to the enterprise information and the label model.
Optionally, the determining at least one user tag of the target enterprise user according to the enterprise information includes:
performing data cleaning on the enterprise information;
and determining at least one user label of the target enterprise user according to the cleaned enterprise information.
Optionally, the performing data cleansing on the enterprise information includes:
if the enterprise information comprises an enterprise certificate number and the enterprise certificate number passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, the enterprise certificate number does not pass the verification of the industrial and commercial data, and a first certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the first certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and a second certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the second certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, and neither the enterprise certificate number nor the enterprise name passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and the enterprise name does not pass the industrial and commercial data verification, determining the enterprise name as the identifier of the enterprise information.
Optionally, the performing data cleansing on the enterprise information includes:
selecting a target data source according to the priority of the data source;
acquiring updating information corresponding to the enterprise information from the target data source;
and updating the enterprise information according to the updating information.
Optionally, the target data source is industrial and commercial data;
the acquiring of the update information corresponding to the enterprise information from the target data source includes:
and obtaining the updating information corresponding to the enterprise information from the industrial and commercial data.
Optionally, the target data source is another platform;
the acquiring of the update information corresponding to the enterprise information from the target data source includes:
and obtaining the updating information corresponding to the enterprise information from other platforms.
Optionally, the determining, according to the influence weight, a tag weight of the target enterprise user corresponding to the user tag includes:
determining a coefficient corresponding to each influence weight;
and determining the label weight of the user label corresponding to the target enterprise user according to each influence weight and the coefficient corresponding to each influence weight.
Optionally, the impact weight comprises: a tag type weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring enterprise attribute information of a target enterprise user;
and determining the label type weight according to the enterprise attribute information.
Optionally, the impact weight comprises: a time decay weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring behavior time corresponding to the user tag;
and determining the time attenuation weight according to the behavior time.
Optionally, the impact weight comprises: a user behavior frequency weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring the number of behaviors corresponding to the user label within a statistical period;
and determining the user behavior frequency weight according to the number of behaviors.
In a second aspect, an embodiment of the present invention provides an apparatus for determining a tag weight, including:
the first determining module is used for determining at least one user tag corresponding to the target enterprise user;
a second determining module, configured to determine an impact weight of the target enterprise user corresponding to the user tag, where the impact weight includes at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
and the third determining module is used for determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method of any one of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: determining the influence weight of a user tag corresponding to a target enterprise user; and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight. The tag weight describes the accuracy with which the user tag categorizes the target enterprise user. Therefore, the characteristic attributes of the enterprise user can be accurately described through the user label and the label weight.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
Fig. 2 is a schematic diagram illustrating a flow of a tag weight determination method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating a flow of another method for determining tag weights according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating a flow of a method for determining an enterprise information identifier according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating a flow of still another method for determining tag weights according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a tag weight determining apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which the method or apparatus for determining tag weights of embodiments of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, and 103 may store enterprise information, index data, and business data of each enterprise user. The terminal devices 101, 102, 103 may be mobile phones, notebook computers, tablets, laptop computers, servers, etc.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 transmit the stored data to the server 105 via the network 104.
The server 105 acquires enterprise information, index data, industrial and commercial data and the like of each enterprise user from the terminal devices 101, 102 and 103, and determines at least one user tag corresponding to a target enterprise user according to the acquired enterprise information, index data, industrial and commercial data and the like of each enterprise user; determining an impact weight of the target enterprise user corresponding to the user tag, wherein the impact weight comprises at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight; and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
It should be noted that the method for determining the label weight provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the device for determining the label weight is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a schematic diagram of a flow of a method for determining a tag weight according to an embodiment of the present invention. An embodiment of the present invention provides a method for determining a label weight, as shown in fig. 2, including:
Step 201: determining at least one user tag corresponding to the target enterprise user.
The method of the embodiment of the invention is directed to enterprise users and first needs to determine the user label corresponding to the target enterprise user. The user label is a core component of the user profile. A user label is a term with distinguishing characteristics, generated by analyzing and refining the behavior data that the target enterprise user produces on the platform.
Step 202: determining an influence weight of the target enterprise user corresponding to the user label, wherein the influence weight comprises at least one of the following: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight.
The tag weight describes the accuracy with which the user tag categorizes the target enterprise user. The label weight of the corresponding user label of the target enterprise user can be determined through the influence weights of a plurality of different dimensions.
The label type weight is a weight value that business operators define from business experience according to the enterprise information of an enterprise user, such as enterprise scale, enterprise age, settlement amount, loan amount, and whether a financial penalty has been enforced.
The time attenuation weight represents the degree to which a user tag decays over time. The farther a behavior time is from the current time, the less that behavior says about the current behavior of the enterprise user. Examples of behavior times include the registration time of the enterprise user and the establishment time of the enterprise.
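The text does not state which decay function is used. As a minimal sketch, the following assumes an exponential decay with a hypothetical half-life parameter (`half_life_days`); the function name and the default of 30 days are illustrative assumptions, not part of the disclosure.

```python
from datetime import datetime


def time_decay_weight(behavior_time: datetime, now: datetime,
                      half_life_days: float = 30.0) -> float:
    """Return a weight in (0, 1] that halves every `half_life_days` days.

    Assumption: exponential decay. The source only says that older
    behavior should count for less, not how much less.
    """
    # Age of the behavior in days; future timestamps are treated as age 0.
    age_days = max((now - behavior_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)
```

Under this assumption a behavior that happened today has weight 1, and the weight shrinks smoothly as the behavior time recedes.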
The user behavior frequency weight can be determined according to the number of times the user behavior occurs within the statistical period. Specifically, the count follows the statistical time dimension of the user tag. For example, if the activity of enterprise users is counted over 30 days, the more times an enterprise user generates the tag within those 30 days, the greater the influence of the tag on that enterprise user.
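The counting described above can be sketched as follows. The sketch assumes the raw count within the window is used directly as the weight; the source does not say whether the count is scaled or capped, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def behavior_frequency_weight(event_times: Iterable[datetime], now: datetime,
                              period_days: int = 30) -> int:
    """Count how often the tag was generated within the statistical period.

    Assumption: the count itself serves as the weight.
    """
    window_start = now - timedelta(days=period_days)
    return sum(1 for t in event_times if window_start <= t <= now)
```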
The word frequency weight represents the importance of the target user label among all labels. The word frequency weight of each user tag can be obtained as the product of the importance of the user tag to the enterprise user and the importance of the user tag among all user tags.
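This product has the same shape as a TF-IDF score. The sketch below is one possible reading, under two assumptions that are not in the source: the importance of a tag to one enterprise user is its share of that user's tag occurrences, and the importance among all tags is a smoothed inverse-frequency term. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def word_frequency_weights(user_tags: Dict[str, List[str]],
                           user_id: str) -> Dict[str, float]:
    """Return a TF-IDF style weight for every tag of one enterprise user."""
    tags = user_tags[user_id]
    counts = Counter(tags)
    n_users = len(user_tags)
    weights = {}
    for tag, count in counts.items():
        # Importance of the tag to this enterprise user (term frequency).
        tf = count / len(tags)
        # Importance of the tag among all users (smoothed inverse frequency).
        users_with_tag = sum(1 for t in user_tags.values() if tag in t)
        idf = math.log((1 + n_users) / (1 + users_with_tag)) + 1.0
        weights[tag] = tf * idf
    return weights
```

With this choice, a tag carried by every enterprise user keeps only its term-frequency share, while a rare tag is boosted.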
Step 203: determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
The larger the label weight, the more distinctly the label characterizes the enterprise. The label weight can be calculated from the business scenario, time, user behavior and label occurrence frequency. Specifically, the user tag weight may be calculated by multiplying, adding, or weighted summing the influence weights.
In one embodiment of the present invention, the calculation formula of the user tag weight is as follows: user label weight = label type weight × time attenuation weight × user behavior frequency weight × word frequency weight.
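As a minimal sketch of this formula, the function below assumes the four influence weights are combined by multiplication, which is one of the combinations the text mentions. The function and parameter names are illustrative only.

```python
def tag_weight(label_type_weight: float, time_decay_weight: float,
               behavior_frequency_weight: float,
               word_frequency_weight: float) -> float:
    """Combine the four influence weights into one label weight.

    Assumption: the weights are multiplied; the text also allows
    addition or a weighted sum.
    """
    return (label_type_weight * time_decay_weight
            * behavior_frequency_weight * word_frequency_weight)
```

A consequence of the multiplicative form is that any influence weight of zero removes the tag entirely, which an additive form would not do.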
In the embodiment of the invention, the influence weight of the user tag corresponding to the target enterprise user is determined; and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight. The tag weight describes the accuracy with which the user tag categorizes the target enterprise user. Therefore, the characteristic attributes of the enterprise users can be accurately described through the user tags and the tag weights, so that the problem that the characteristic attributes of the enterprise users cannot be accurately described in the prior art is solved.
In one embodiment of the present invention, determining at least one user tag corresponding to a target enterprise user includes:
acquiring enterprise information of a target enterprise user;
determining at least one user tag of a target enterprise user according to the enterprise information, wherein the classification of the user tag comprises: a business scenario-based label classification and an enterprise attribute-based label classification.
The embodiment of the invention provides the following two label classification modes:
In the first mode, the user labels are classified based on the business scenario. At present, the industry has thousands of user tags, and 'difficult to find, difficult to apply' has become their main problem. Most user tags are basic enterprise information that the business side must process and combine again before it can serve a business purpose. Therefore, in the embodiment of the present invention, the actual scenarios and pain points of the business are sorted out, and labels are classified based on the business scenario. The classification is as follows:
With each platform as the distinguishing dimension, the labels are divided into: finding users, marketing to users, user introduction, and user life cycle.
In the second mode, the user tags are classified based on enterprise attributes.
Table 1 Enterprise attribute-based tag classification
(Table 1 is an image in the original publication: Figure BDA0002907281580000081.)
As shown in Table 1, the basic attributes of the enterprise are developed and mined so that the enterprise user is described completely in business terms, and the business side can combine, calculate on and filter the tags freely.
In one embodiment of the present invention, determining at least one user tag of a target enterprise user based on enterprise information comprises:
performing statistics on the enterprise information, and determining a statistical class label of the target enterprise user;
and/or,
determining a rule class label of the target enterprise user according to the enterprise information and the label rule;
and/or,
determining the mining class label of the target enterprise user according to the enterprise information and the label model.
The user tags of the target enterprise user may be determined in the following three modes.
Mode one: developing statistical class labels. A statistical class label is obtained by basic statistical calculation on data; it is a user label sorted out from business application scenarios for external data, platform behavior data and transaction data. The statistical class labels may include: basic information labels, customer value labels, business status labels, association relationship labels, credit and risk labels, and behavior and preference labels. Basic information may include: enterprise age, the industry to which the enterprise belongs, the administrative region of the registration place, and the like. Customer value may include: company market value, financing amount, whether the enterprise is a high-tech enterprise, and the like. Business status may include: business income, business income growth rate, and the like. Association relationships may include: the number of associated companies, the number of associated companies in the same industry, and the like. Credit and risk may include: public opinion score, cumulative number of administrative penalties, whether an administrative penalty currently exists, and the like. Behaviors and preferences may include: the first registration time, the number of logins in the last 30 days, whether authentication has been performed, and the like.
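Two of the statistical labels named above, enterprise age and the number of logins in the last 30 days, can be derived with simple date arithmetic. The sketch below is illustrative; the function name, input shapes and output keys are assumptions rather than anything defined in the source.

```python
from datetime import date
from typing import Dict, List


def statistical_labels(founded: date, login_dates: List[date],
                       today: date) -> Dict[str, int]:
    """Derive two statistical class labels from raw enterprise data."""
    # Whole years since establishment; subtract one if the anniversary
    # has not yet been reached this year.
    before_anniversary = (today.month, today.day) < (founded.month, founded.day)
    age_years = today.year - founded.year - int(before_anniversary)
    # Logins that fall within the 30 days ending today.
    logins_30d = sum(1 for d in login_dates if 0 <= (today - d).days < 30)
    return {"enterprise_age_years": age_years,
            "logins_last_30_days": logins_30d}
```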
Mode two: developing rule class labels. A rule class label is defined by the business side according to the needs of business operations. The business side formulates label rules for the scenarios of finding users, marketing to users, user introduction and user life cycle. Table 2 below shows the rule labels of the supply chain platform for the "find user" scenario.
Table 2 Rule labels corresponding to the "find user" scenario
(Table 2 is an image in the original publication: Figures BDA0002907281580000091 and BDA0002907281580000101.)
Mode three: developing mining class labels. On the one hand, business rules are formulated subjectively, which makes labels inaccurate; the rules merely translate existing business experience and cannot reflect the qualitatively new value that data can bring. On the other hand, data is often incomplete, so labels are missed and cannot play their intended role. Therefore, models such as machine learning and regression prediction are introduced to mine labels. Table 3 below shows examples of mining class labels.
Table 3 Mining class label examples
(Table 3 is an image in the original publication: Figure BDA0002907281580000102.)
Fig. 3 is a schematic diagram of a flow of another method for determining a tag weight according to an embodiment of the present invention. An embodiment of the present invention provides a method for determining a label weight, as shown in fig. 3, including:
Step 301: performing data cleaning on the enterprise information.
Step 302: determining at least one user label of the target enterprise user according to the cleaned enterprise information.
Step 303: determining an influence weight of the target enterprise user corresponding to the user label, wherein the influence weight comprises at least one of the following: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight.
Step 304: determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
The cleaning process may include at least one of normalization, outlier processing, missing value processing, and feature deletion processing. Numerical fields of the enterprise information may have different dimensions and units; normalizing the data set brings the data of all dimensions to the same order of magnitude. Outlier processing means identifying values in the enterprise information that clearly deviate from the other data, and deleting or replacing them. Missing value processing includes deleting records with missing values, or estimating possible values for the missing data and interpolating with them.
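The source names these cleaning operations without fixing a method for any of them. The sketch below picks one simple technique per operation as an assumption: median imputation for missing values, a median-absolute-deviation rule for outliers, and min-max scaling for normalization. The function names and the threshold `k` are hypothetical.

```python
from statistics import median
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def replace_outliers(values: List[float], k: float = 5.0) -> List[float]:
    """Replace values farther than k median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return list(values)  # no spread, so nothing can be called an outlier
    return [center if abs(v - center) > k * mad else v for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applied in the order missing values, outliers, normalization, a column such as `[10, 12, None, 11, 1000]` ends up on a common 0 to 1 scale without the extreme value dominating it.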
Cleaning the enterprise information reduces the risk that non-standard data in the enterprise information adversely affects subsequent operations.
In one embodiment of the invention, the data cleaning of the enterprise information comprises: unifying the enterprise information identifiers in the data. Fig. 4 is a schematic diagram illustrating a flow of a method for determining an enterprise information identifier according to an embodiment of the present invention. An embodiment of the present invention provides a method for determining an enterprise information identifier, as shown in fig. 4, including:
if the enterprise information comprises the enterprise certificate number and the enterprise certificate number passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, the enterprise certificate number does not pass the verification of the industrial and commercial data, and a first certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the first certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and a second certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the second certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises the enterprise certificate number and the enterprise name, and neither the enterprise certificate number nor the enterprise name passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises the enterprise name and the enterprise name does not pass the industrial and commercial data verification, determining the enterprise name as the identifier of the enterprise information.
Because each platform calls different basic components, the ID unification rules differ between platforms, and no unified ID for enterprise-level users is available. Therefore, the user center needs to establish a unified rule for enterprise-level users.
At present, platforms verify enterprises with different rigor. Some platforms require the complete three enterprise elements, the enterprise name and the legal-person information, and subject them to strong manual verification. Some platforms require the three enterprise elements, the enterprise name and the legal-person information, but apply only weak manual verification because of business requirements. Some platforms may collect only the enterprise name field. Because of these scenarios, the three enterprise elements cannot serve as the unique enterprise user ID.
As shown in fig. 4, the unique ID of the enterprise user is determined as follows. If an enterprise certificate number exists and passes verification against the industrial and commercial data, the certificate number is used as the unique ID. If both an enterprise certificate number and an enterprise name exist but the certificate number fails verification, the industrial and commercial data is queried by the enterprise name; if a record is found, the certificate number recorded in the industrial and commercial data is used as the unique ID. If only an enterprise name exists, the industrial and commercial data is queried by the enterprise name; if a record is found, the certificate number recorded in the industrial and commercial data is used as the unique ID. If both an enterprise certificate number and an enterprise name exist but neither can be verified against the industrial and commercial data, the certificate number is used as the unique ID. If only an enterprise name exists and it cannot be verified against the industrial and commercial data, the enterprise name is used as the unique ID of the enterprise user.
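The branching of fig. 4 can be sketched as a single function. In this sketch the industrial and commercial data is stood in for by a plain dictionary mapping certificate numbers to registered names; that representation, the function name and the handling of a certificate number supplied without a name are assumptions, not part of the disclosure.

```python
from typing import Dict, Optional


def resolve_enterprise_id(cert_no: Optional[str], name: Optional[str],
                          registry: Dict[str, str]) -> Optional[str]:
    """Pick the unique enterprise ID following the branches of fig. 4.

    `registry` is a hypothetical stand-in for the industrial and
    commercial data: certificate number -> registered enterprise name.
    """
    name_to_cert = {n: c for c, n in registry.items()}
    # Branch 1: the supplied certificate number is verified.
    if cert_no and cert_no in registry:
        return cert_no
    # Branches 2 and 3: the name is found, so use the registered number.
    if name and name in name_to_cert:
        return name_to_cert[name]
    # Branch 4: nothing can be verified, so keep the supplied number.
    # (Assumption: an unverified number without a name is also kept.)
    if cert_no:
        return cert_no
    # Branch 5: only an unverifiable name is available.
    return name
```

Checking the name lookup before falling back to the unverified certificate number is what lets one function cover both the "number plus name" and the "name only" cases.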
In one embodiment of the invention, the data cleaning of the enterprise information comprises the following steps:
selecting a target data source according to the priority of the data source;
acquiring updating information corresponding to the enterprise information from a target data source;
and updating the enterprise information according to the updating information.
In one embodiment of the invention, if the enterprise can be found in the industrial and commercial data, the basic information of the enterprise is updated daily from the industrial and commercial data; if it cannot be found, a priority is defined for each platform's information for each field, and the data is cleaned and updated accordingly. When data conflicts arise while enterprise users are being unified, the industrial and commercial data takes precedence, and the remaining values are then overwritten according to the priority of each platform's enterprise information.
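A minimal sketch of this priority-based cleaning, assuming each data source is a plain dictionary of field values. The names `merge_enterprise_info`, `registry`, `platform_records`, and `field_priority` are hypothetical and chosen only for illustration; the source does not specify data structures.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def merge_enterprise_info(
    registry: Optional[Record],
    platform_records: Dict[str, Record],
    field_priority: Dict[str, List[str]],
) -> Record:
    """Pick each field from the registry first, then from platforms by priority.

    registry is the (hypothetical) industrial and commercial record, or None
    when the enterprise cannot be found there. field_priority maps a field
    name to platform names ordered from highest to lowest priority.
    """
    merged: Record = {}
    for field, platforms in field_priority.items():
        # Industrial and commercial data wins whenever it has a value.
        value = registry.get(field) if registry else None
        if value is None:
            # Otherwise take the first non-empty value in priority order.
            for platform in platforms:
                value = platform_records.get(platform, {}).get(field)
                if value is not None:
                    break
        merged[field] = value
    return merged
```

Keeping the priority list per field, rather than per platform, matches the statement that a priority is defined for each platform's information for each field.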
Fig. 5 is a schematic diagram illustrating a flow of still another method for determining a tag weight according to an embodiment of the present invention. An embodiment of the present invention provides a method for determining a label weight, as shown in fig. 5, including:
Step 501: determining at least one user tag corresponding to the target enterprise user.
Step 502: determining an influence weight of the user tag corresponding to the target enterprise user, wherein the influence weight comprises at least one of the following: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight.
Step 503: determining the coefficient corresponding to each influence weight.
Step 504: determining the label weight of the user tag corresponding to the target enterprise user according to each influence weight and the coefficient corresponding to each influence weight.
The coefficient corresponding to an influence weight characterizes the importance of that influence weight. The weighted sum of the influence weights is calculated from the influence weights and their corresponding coefficients, and the weighted sum is determined as the label weight of the user tag corresponding to the target enterprise user.
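The weighted sum can be illustrated with a short Python sketch. The function name `label_weight`, the dictionary keys, and the example coefficient values are assumptions made for illustration; the source does not prescribe concrete coefficients.

```python
from typing import Dict


def label_weight(
    influence_weights: Dict[str, float],
    coefficients: Dict[str, float],
) -> float:
    """Weighted sum of the influence weights available for one user tag."""
    missing = set(influence_weights) - set(coefficients)
    if missing:
        raise KeyError(f"no coefficient for: {sorted(missing)}")
    # Each influence weight is scaled by the coefficient expressing its importance.
    return sum(coefficients[key] * weight for key, weight in influence_weights.items())


# Hypothetical example values, chosen only to show the arithmetic.
example_weights = {"label_type": 0.8, "time_decay": 0.5, "behavior_count": 1.0}
example_coefficients = {
    "label_type": 0.5,
    "time_decay": 0.25,
    "behavior_count": 0.25,
    "word_frequency": 0.0,
}
example_label_weight = label_weight(example_weights, example_coefficients)
```

Because the influence weight "comprises at least one of" the four kinds, the sketch sums only the weights that are actually supplied.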
In one embodiment of the invention, the impact weights include: a tag type weight;
determining the influence weight of the user tag corresponding to the target enterprise user, including:
acquiring enterprise attribute information of a target enterprise user;
and determining the label type weight according to the enterprise attribute information.
The label type weight is a weight value that business operators define from business experience according to the enterprise information of the enterprise user, such as enterprise scale, enterprise age, settlement amount, loan amount, and whether a financial penalty has been enforced.
In one embodiment of the invention, the impact weights include: a time decay weight;
determining the influence weight of the user tag corresponding to the target enterprise user, including:
acquiring behavior time corresponding to a user tag;
and determining the time decay weight according to the behavior time.
The time decay weight identifies the degree to which a user tag decays under the influence of time. The farther a behavior time, such as the registration time of the enterprise user or the establishment time of the enterprise, is from the current time, the less meaningful the behavior is to the current behavior of the enterprise user.
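The source does not give a decay formula. As one possible illustration, the sketch below assumes exponential decay with a configurable half-life; the function name `time_decay_weight`, the parameter `half_life_days`, and its default of 30 days are all assumptions.

```python
import math
from datetime import datetime


def time_decay_weight(
    behavior_time: datetime,
    now: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Exponential decay: 1.0 for a behavior happening now, 0.5 after one half-life.

    The exponential form and the half-life parameter are assumptions; the
    source only states that older behaviors should count for less.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    # Behaviors timestamped in the future are treated as happening now.
    age_days = max((now - behavior_time).total_seconds() / 86400.0, 0.0)
    return math.exp(-math.log(2.0) * age_days / half_life_days)
```

Any monotonically decreasing function of the age would satisfy the description; the half-life form is used here only because its single parameter is easy to interpret.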
In one embodiment of the invention, the impact weights include: user behavior times weight;
determining the influence weight of the user tag corresponding to the target enterprise user, including:
acquiring the behavior times of a user tag in a statistical period;
and determining the weight of the behavior times of the user according to the behavior times.
The user behavior times weight can be determined according to the number of times the user behavior occurs within the statistical period. Specifically, the count follows the statistical time dimension of the user tag. For example, if the activity of the enterprise user is counted over 30 days, the more times the enterprise user generates the tag within those 30 days, the greater the influence of the tag on the enterprise user.
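A minimal sketch of counting behaviors in a statistical period, using the 30-day window from the example. The function name `behavior_count_weight` and the `saturation_count` parameter, which caps the weight at 1.0, are assumptions; the source states only that more occurrences mean greater influence.

```python
from datetime import datetime, timedelta
from typing import Iterable


def behavior_count_weight(
    behavior_times: Iterable[datetime],
    now: datetime,
    period_days: int = 30,
    saturation_count: int = 10,
) -> float:
    """Count behaviors inside the statistical period and scale the count into [0, 1].

    saturation_count is a hypothetical normalization constant: that many
    behaviors (or more) within the period yield the maximum weight of 1.0.
    """
    if saturation_count <= 0:
        raise ValueError("saturation_count must be positive")
    window_start = now - timedelta(days=period_days)
    count = sum(1 for t in behavior_times if window_start <= t <= now)
    return min(count / saturation_count, 1.0)
```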
Because enterprise data comes from different sources and enterprises are at different stages, enterprise data may be missing. Missing data affects data quality and calculation accuracy and strongly affects the subsequent modeling and data analysis of enterprise users, so filling in missing tag values is very important.
By type, missing data can be classified as follows. Missing completely at random: the missingness is random and does not depend on any incomplete variable or complete variable. Missing at random: the missingness is not completely random; it depends on other variables. Monotonic missingness: if the data is time-series data, values may be missing over time.
Missing tag values are commonly filled by similar-class mean interpolation: a hierarchical clustering model is used to predict the class of the missing variable, and the class mean is used for interpolation. For example, if enterprise user tags X, Y, and Z are complete variables and enterprise tag A is a variable with missing values, tags X, Y, and Z can be used as clustering inputs, and the missing value is then filled with the mean of the class to which the missing variable belongs.
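The similar-class mean interpolation can be sketched in plain Python. This is an illustration under assumptions: a small average-linkage agglomerative clustering stands in for whatever hierarchical clustering model an implementation would use, and the function names, the cluster count `k`, and the fallback to the overall mean are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


def agglomerative_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Average-linkage agglomerative clustering; returns a cluster index per point."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the two closest clusters.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def impute_by_cluster_mean(
    complete: Sequence[Sequence[float]],
    target: Sequence[Optional[float]],
    k: int,
) -> List[float]:
    """Fill None entries of target with the mean of target within the same cluster.

    complete holds the fully observed tags (X, Y, Z in the text's example);
    target is the tag with missing values.
    """
    observed = [v for v in target if v is not None]
    if not observed:
        raise ValueError("target has no observed values to average")
    overall_mean = sum(observed) / len(observed)
    labels = agglomerative_clusters(complete, k)
    filled: List[float] = []
    for label, value in zip(labels, target):
        if value is None:
            peers = [v for l, v in zip(labels, target) if l == label and v is not None]
            # Fall back to the overall mean when the whole cluster is missing.
            value = sum(peers) / len(peers) if peers else overall_mean
        filled.append(value)
    return filled
```

The pairwise search is quadratic per merge, which is acceptable for a small illustration but would be replaced by a library implementation on real enterprise data.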
In the embodiment of the invention, the enterprise user data of each platform is cleaned and unified, and a unified enterprise user view is established; usable, easy-to-use, and useful enterprise-level user tags are established through data statistics, rule calculation, data mining, and stream computing, and directly generate business value; and a reasonable, accurate, and real-time enterprise user tag is assigned to each enterprise through a weight calculation method, a similarity calculation method, and a combined label calculation method.
Fig. 6 is a schematic structural diagram of an apparatus for determining a tag weight according to an embodiment of the present invention, including:
a first determining module 601, configured to determine at least one user tag corresponding to a target enterprise user;
a second determining module 602, configured to determine an influence weight of the user tag corresponding to the target enterprise user, where the influence weight includes at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
a third determining module 603, configured to determine, according to the influence weight, a tag weight of a user tag corresponding to the target enterprise user.
Optionally, the first determining module 601 is specifically configured to:
acquiring enterprise information of a target enterprise user;
determining at least one user tag of a target enterprise user according to the enterprise information, wherein the classification of the user tag comprises: a business scenario-based label classification and an enterprise attribute-based label classification.
Optionally, the first determining module 601 is specifically configured to:
performing statistics on the enterprise information, and determining a statistics class label of the target enterprise user;
and/or,
determining a rule class label of the target enterprise user according to the enterprise information and a label rule;
and/or,
determining a mining class label of the target enterprise user according to the enterprise information and a label model.
Optionally, the first determining module 601 is specifically configured to:
carrying out data cleaning on enterprise information;
and determining at least one user label of the target enterprise user according to the cleaned enterprise information.
Optionally, the first determining module 601 is specifically configured to:
if the enterprise information comprises an enterprise certificate number and the enterprise certificate number passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, the enterprise certificate number does not pass the industrial and commercial data verification, and a first certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the first certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and a second certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the second certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, and neither the enterprise certificate number nor the enterprise name passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and the enterprise name does not pass the industrial and commercial data verification, determining the enterprise name as the identifier of the enterprise information.
Optionally, the first determining module 601 is specifically configured to:
selecting a target data source according to the priority of the data source;
acquiring updating information corresponding to the enterprise information from a target data source;
and updating the enterprise information according to the updating information.
Optionally, the target data source is industrial and commercial data;
the first determining module 601 is specifically configured to:
obtaining the updating information corresponding to the enterprise information from the industrial and commercial data.
Optionally, the target data source is another platform;
the first determining module 601 is specifically configured to:
obtaining the updating information corresponding to the enterprise information from the other platform.
Optionally, the third determining module 603 is specifically configured to:
determining a coefficient corresponding to each influence weight;
and determining the label weight of the user label corresponding to the target enterprise user according to each influence weight and the coefficient corresponding to each influence weight.
Optionally, the impact weight comprises: a tag type weight;
the second determining module 602 is specifically configured to:
acquiring enterprise attribute information of a target enterprise user;
and determining the label type weight according to the enterprise attribute information.
Optionally, the impact weight comprises: a time decay weight;
the second determining module 602 is specifically configured to:
acquiring behavior time corresponding to a user tag;
and determining the time decay weight according to the behavior time.
Optionally, the impact weight comprises: user behavior times weight;
the second determining module 602 is specifically configured to:
acquiring the behavior times of a user tag in a statistical period;
and determining the weight of the behavior times of the user according to the behavior times.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a first determination module, a second determination module, and a third determination module. Where the names of these modules do not in some cases constitute a limitation on the module itself, for example, the first determination module may also be described as a "module that determines at least one user tag corresponding to a target enterprise user".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
determining at least one user tag corresponding to a target enterprise user;
determining an impact weight of the target enterprise user corresponding to the user tag, wherein the impact weight comprises at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
According to the technical scheme of the embodiment of the invention, the influence weight of the user label corresponding to the target enterprise user is determined; and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight. The tag weight describes the accuracy with which the user tag categorizes the target enterprise user. Therefore, the characteristic attributes of the enterprise user can be accurately described through the user label and the label weight.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (15)

1. A method for determining a label weight, characterized by comprising:
determining at least one user tag corresponding to a target enterprise user;
determining an impact weight of the target enterprise user corresponding to the user tag, wherein the impact weight comprises at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
and determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
2. The method of claim 1, wherein determining at least one user tag corresponding to a target enterprise user comprises:
acquiring enterprise information of the target enterprise user;
determining at least one user tag of the target enterprise user according to the enterprise information, wherein the classification of the user tag comprises: a business scenario-based label classification and an enterprise attribute-based label classification.
3. The method of claim 2, wherein determining at least one user tag for the target enterprise user based on the enterprise information comprises:
performing statistics on the enterprise information, and determining a statistics class label of the target enterprise user;
and/or,
determining a rule class label of the target enterprise user according to the enterprise information and a label rule;
and/or,
determining a mining class label of the target enterprise user according to the enterprise information and a label model.
4. The method of claim 2, wherein determining at least one user tag for the target enterprise user based on the enterprise information comprises:
performing data cleaning on the enterprise information;
and determining at least one user label of the target enterprise user according to the cleaned enterprise information.
5. The method of claim 4, wherein the data cleaning of the enterprise information comprises:
if the enterprise information comprises an enterprise certificate number and the enterprise certificate number passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, the enterprise certificate number does not pass the industrial and commercial data verification, and a first certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the first certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and a second certificate number corresponding to the enterprise name exists in the industrial and commercial data, determining the second certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise certificate number and an enterprise name, and neither the enterprise certificate number nor the enterprise name passes the industrial and commercial data verification, determining the enterprise certificate number as the identifier of the enterprise information;
or,
if the enterprise information comprises an enterprise name and the enterprise name does not pass the industrial and commercial data verification, determining the enterprise name as the identifier of the enterprise information.
6. The method of claim 4, wherein the data cleaning of the enterprise information comprises:
selecting a target data source according to the priority of the data source;
acquiring updating information corresponding to the enterprise information from the target data source;
and updating the enterprise information according to the updating information.
7. The method of claim 6, wherein the target data source is industrial and commercial data;
the acquiring of the update information corresponding to the enterprise information from the target data source includes:
and obtaining the updating information corresponding to the enterprise information from the industrial and commercial data.
8. The method of claim 6, wherein the target data source is another platform;
the acquiring of the update information corresponding to the enterprise information from the target data source includes:
obtaining the updating information corresponding to the enterprise information from the other platform.
9. The method of claim 1, wherein determining the tag weight of the target enterprise user corresponding to the user tag based on the impact weight comprises:
determining a coefficient corresponding to each influence weight;
and determining the label weight of the user label corresponding to the target enterprise user according to each influence weight and the coefficient corresponding to each influence weight.
10. The method of claim 1, wherein the impact weights comprise: a tag type weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring enterprise attribute information of a target enterprise user;
and determining the label type weight according to the enterprise attribute information.
11. The method of claim 1, wherein the impact weights comprise: a time decay weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring behavior time corresponding to the user tag;
and determining the time attenuation weight according to the behavior time.
12. The method of claim 1, wherein the impact weights comprise: user behavior times weight;
the determining the influence weight of the target enterprise user corresponding to the user tag includes:
acquiring the behavior times of the user label in a statistical period;
and determining the weight of the user behavior times according to the behavior times.
13. An apparatus for determining a tag weight, comprising:
the first determining module is used for determining at least one user tag corresponding to the target enterprise user;
a second determining module, configured to determine an impact weight of the target enterprise user corresponding to the user tag, where the impact weight includes at least one of: label type weight, time attenuation weight, user behavior frequency weight and word frequency weight;
and the third determining module is used for determining the label weight of the user label corresponding to the target enterprise user according to the influence weight.
14. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-12.
15. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110075021.XA CN112860672A (en) 2021-01-20 2021-01-20 Method and device for determining label weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110075021.XA CN112860672A (en) 2021-01-20 2021-01-20 Method and device for determining label weight

Publications (1)

Publication Number Publication Date
CN112860672A true CN112860672A (en) 2021-05-28

Family

ID=76007644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110075021.XA Pending CN112860672A (en) 2021-01-20 2021-01-20 Method and device for determining label weight

Country Status (1)

Country Link
CN (1) CN112860672A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327048A (en) * 2021-06-16 2021-08-31 力合科创集团有限公司 Enterprise portrait calculation method, medium, and program based on big data and multidimensional features
CN114090854A (en) * 2022-01-24 2022-02-25 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment
CN114090854B (en) * 2022-01-24 2022-04-19 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination