CN114090854B - Intelligent label weight updating method and system based on information entropy and computer equipment - Google Patents



Publication number
CN114090854B
Authority
CN
China
Prior art keywords
label
weight
coefficient
tag
coverage rate
Legal status
Active
Application number
CN202210076732.3A
Other languages
Chinese (zh)
Other versions
CN114090854A (en)
Inventor
姜磊
朱振航
杨钊
严海龙
Current Assignee
Brilliant Data Analytics Inc
Original Assignee
Brilliant Data Analytics Inc
Application filed by Brilliant Data Analytics Inc
Priority to CN202210076732.3A
Publication of CN114090854A
Application granted
Publication of CN114090854B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to big-data labeling technology, and in particular to an information-entropy-based intelligent label weight updating method, system, and computer device. The method comprises: acquiring source data including a label set, label coverage rates, a set of label usage behavior counts, a set of label behavior weights, and business scenario coefficients; taking the overall distribution of label coverage rates into account by introducing a label coverage reference value as the base of the logarithm in the information content formula, thereby improving that formula and generating a label information weight; automatically updating a label usage weight coefficient based on the label usage behavior counts and the label behavior weights; calculating a decay coefficient for the label weight; and generating and dynamically updating the label weight according to the label weight decay mode by combining the label information weight, the label usage weight coefficient, and the business scenario coefficient. The invention allows the coefficients involved in label weight updating to be adjusted dynamically, addressing the difficulty the prior art has in ensuring that label weights remain accurate and effective.

Description

Intelligent label weight updating method and system based on information entropy and computer equipment
Technical Field
The invention belongs to the technical field of big-data labels, and specifically relates to an information-entropy-based intelligent label weight updating method, system, and computer device.
Background
A big-data label is a characteristic marker obtained by highly condensing, summarizing, analyzing, and mining data. It expresses a conclusion or judgment about an object and serves as a bridge between data and business: it supports precise strategy-making and fast, feature-based decisions, and it plays an increasingly important role in the digital era.
The traditional way of building customer labels draws on various types of data from inside and outside an enterprise and assigns labels to customers according to specific rules. When a label is updated, its weight is calculated with a fixed weight formula that has no dynamic update mechanism. As the business develops and data change rapidly, such a static weight calculation gradually fails to meet the requirements of precise business operations after a period of use. Moreover, once a label weight has been generated, it can only be adjusted when the label itself is updated, so the latest characteristics of the business object are not fed back in time while the weight is in use.
In addition, because business requirements, usage environments, and feedback data change, the value of a label is strongly affected by time; intuitively, a label's value gradually decreases over time. Two remedies exist: increasing the label update frequency, or adding a decay coefficient to the label weight calculation. Increasing the update frequency only works when data changes can be captured quickly and computing capacity is ample. For some labels, especially manually assigned ones, changes in the underlying source data cannot be captured after the label is created, so a decay coefficient is needed to keep the label weight accurate. Conventional implementations are generally based on Newton's law of cooling: they use an exponential decay function as the time-decay factor and can only specify a fixed date as the start of decay. This is hard for business staff to understand and cannot match the decay characteristics of all labels.
Disclosure of Invention
The invention provides an information-entropy-based intelligent label weight updating method, system, and computer device. By introducing a label information weight, a label usage weight coefficient, a business scenario coefficient, and related quantities, it improves how the label information weight and label weight decay are calculated and builds an intelligent label weight updating scheme in which the parameters and coefficients involved can be adjusted flexibly and dynamically. This addresses the technical problem that the prior art lacks intelligent, dynamic adjustment and therefore struggles to keep label weights accurate and effective over time.
In one aspect, the information-entropy-based intelligent label weight updating method comprises the following steps:
S1: acquire the source data used for label weight calculation and preprocess it. The source data include the label set A1, the total number T of business objects corresponding to the label, the label coverage rate P(t_i), the label usage behavior count set A2(t_i), the label behavior weight set A3, and the business scenario coefficient CS_i(t_i);
S2: improve the information content formula of the label: taking the overall distribution of label coverage rates into account, introduce a label coverage reference value as the base of the logarithm in the information content formula, with the label coverage rate as its argument; use the improved formula as the label information weight formula to generate the label information weight;
S3: automatically update the label usage weight coefficient based on the label usage behavior counts and the label behavior weights;
S4: calculate the decay coefficient of the label weight according to the label usage scenario and the manual adjustment coefficient;
S5: on the basis of steps S1 to S4, generate and dynamically update the label weight according to the label weight decay mode, combining the label information weight, the label usage weight coefficient, and the business scenario coefficient.
In another aspect, the information-entropy-based intelligent label weight updating system comprises:
a source data acquisition module, configured to acquire the source data used for label weight calculation and preprocess it, the source data including the label set A1, the total number T of business objects corresponding to the label, the label coverage rate P(t_i), the label usage behavior count set A2(t_i), the label behavior weight set A3, and the business scenario coefficient CS_i(t_i);
a label information weight generation module, configured to improve the information content formula of the label by taking the overall distribution of label coverage rates into account, introducing a label coverage reference value as the base of the logarithm with the label coverage rate as its argument, and to use the improved formula as the label information weight formula to generate the label information weight;
a label usage weight coefficient updating module, configured to automatically update the label usage weight coefficient based on the label usage behavior counts and the label behavior weights;
a label weight decay coefficient calculation module, configured to calculate the decay coefficient of the label weight according to the label usage scenario and the manual adjustment coefficient;
a label weight dynamic updating module, configured to generate and dynamically update the label weight according to the label weight decay mode by combining the label information weight, the label usage weight coefficient, and the business scenario coefficient.
In yet another aspect, the computer device of the invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor performs the steps of the intelligent label weight updating method of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1. Based on the improved information content formula, the label coverage reference value is calculated dynamically and automatically, the information content of each label relative to this reference is obtained, and from it the label information weight. This optimizes the overall distribution of label weights and makes them more reasonable.
2. A label usage weight coefficient is added so that the label weight reflects the label's usage value, making the weight more comprehensive. The usage weight is calculated automatically from behavior counts and behavior weights and is re-weighted and updated as label usage data arrive, which keeps the coefficient accurate.
3. A label basis weight is added to solve the cold-start problem of label weight calculation when data are lacking: the label weight remains usable without initial data, and users can manually adjust label weights to meet their various application needs.
4. The label weight decay calculation is improved: by supporting multiple decay modes and decay start dates and by adding a manual adjustment factor, users can adjust decay coefficients dynamically based on their business understanding to match the decay characteristics of different label types. In addition, once training samples are provided, the corresponding decay weights can be generated automatically, making label weights with decay characteristics more reasonable.
5. A business scenario coefficient CS_i(t_i) is introduced to evaluate the importance of a label in different scenarios, increasing the flexibility and applicability of the label weight. Once scenario coefficients are set, the same label has different scenario coefficients in different application scenarios; that is, the same label has several different weight coefficients. Because the scenario coefficient can be adjusted flexibly and dynamically as the scenario changes, the label weight is adjusted and updated dynamically per scenario.
Drawings
FIG. 1 is a flowchart of an information-entropy-based intelligent label weight updating method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of an information-entropy-based intelligent label weight updating system according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
The embodiment provides an intelligent updating method of label weight based on information entropy, which comprises the following steps:
S1: acquire the source data used for label weight calculation and preprocess it. The source data include the label set A1, the total number T of business objects corresponding to the label, the label coverage rate P(t_i), the label basis weight C(t_i), the label usage behavior count set A2(t_i), the label behavior weight set A3, the business scenario coefficient CS_i(t_i), and so on.
The label set A1 = {t_1, t_2, t_3, ..., t_n} is a set of labels of some kind, including but not limited to: 1. all labels under a given business object, such as all labels under the customer business entity; 2. all labels under a given business scenario; 3. all labels under a specific category, such as behavior labels; 4. the full set of labels. Here t_1, t_2, t_3, ..., t_n denote individual labels.
The total number T of business objects corresponding to the label is the total number of business objects associated with the label whose weight is currently being calculated. For example, for the label "high-risk customer", T is the total number of business objects that could be given this label, i.e., the total number of customers with corresponding data in the system.
The label coverage rate P(t_i) is the ratio of the number of business objects that have label t_i to the total number of business objects corresponding to the label: P(t_i) = T(t_i) / T, where T(t_i) is the number of business objects that have label t_i.
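As a minimal Python sketch of the coverage rate, assuming the object counts are already available as integers (the function names and the sample counts are hypothetical):

```python
from typing import Mapping


def label_coverage(objects_with_label: int, total_objects: int) -> float:
    """P(t_i) = T(t_i) / T: share of business objects that carry label t_i."""
    if total_objects <= 0:
        raise ValueError("total number of business objects T must be positive")
    if not 0 <= objects_with_label <= total_objects:
        raise ValueError("T(t_i) must lie between 0 and T")
    return objects_with_label / total_objects


def coverage_for_set(label_counts: Mapping[str, int], total_objects: int) -> dict[str, float]:
    """Coverage rate for every label of a label set A1 sharing the same T."""
    return {label: label_coverage(n, total_objects) for label, n in label_counts.items()}


# Hypothetical counts for illustration only.
rates = coverage_for_set({"high-value customer": 200, "low-value customer": 1200}, 4000)
print(rates)
```

Validating T(t_i) against T catches counts taken from different snapshots of the data, which would otherwise produce coverage rates above 1.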
The label basis weight C(t_i) is a base weight entered by the user; it is usually greater than 0, and its initial value is 0. It can be used to adjust the label weight manually and to solve the cold-start problem: on the one hand, the label weight remains usable when initial data are lacking; on the other hand, it provides an entry point for manual weight adjustment.
The label usage behavior count set A2(t_i) = {a_1, a_2, ..., a_n} records the number of operation behaviors on label t_i, where a_1, a_2, ..., a_n are the counts of the corresponding behaviors. Operation behaviors include, but are not limited to, the number of times the label information is queried, the number of times the label result is called, the number of likes the label receives, and the number of label evaluation records. In this embodiment, the count can either be limited to a specified period, such as behavior records from the last three months, or weighted by when each behavior occurred so that older records matter progressively less. For example, records from three months to one year old are counted with a weight of 0.5, and records older than one year with a weight of 0.1.
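The time-weighted counting in the example can be sketched as follows. The 0.5 and 0.1 weights come from the text; a weight of 1.0 for records younger than three months, and approximating three months as 90 days and one year as 365 days, are assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable

# Assumed day thresholds for "three months" and "one year".
RECENT_DAYS = 90
YEAR_DAYS = 365


def age_weight(record_date: date, today: date) -> float:
    """Weight of one behavior record according to its age."""
    age = (today - record_date).days
    if age < RECENT_DAYS:
        return 1.0          # assumption: recent records count fully
    if age <= YEAR_DAYS:
        return 0.5          # three months to one year old (from the text)
    return 0.1              # older than one year (from the text)


def weighted_count(record_dates: Iterable[date], today: date) -> float:
    """Time-weighted count a_j for one behavior type of a label."""
    return sum(age_weight(d, today) for d in record_dates)


today = date(2022, 6, 30)
queries = [today - timedelta(days=d) for d in (5, 30, 200, 400)]
print(weighted_count(queries, today))  # 1.0 + 1.0 + 0.5 + 0.1, i.e. about 2.6
```

A step function keeps the weights easy for business staff to read; a smooth function of age would work equally well if the brackets prove too coarse.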
The label behavior weight set A3 = {b_1, b_2, ..., b_n} is the set of weight coefficients for the corresponding behavior records. It reflects how much each behavior matters to the label weight and is mainly derived from historical experience; b_1 is the weight of behavior a_1. In this embodiment, the label behavior weight set can be adjusted manually according to the characteristics of each scenario, so different scenarios may have different label behavior weight sets.
The business scenario coefficient CS_i(t_i) is the weighting coefficient of a label in a given scenario; it is greater than 0 and has an initial value of 1. It can be set in three ways. First, manual input: the user enters the scenario weight coefficient of a label in a scenario through the system interface. Second, automatic calculation: the system weights the coefficient according to how often the label is used in a scenario; for example, if label A is used frequently in scenario one, the system automatically increases label A's scenario weight coefficient in scenario one. Third, key-label designation: similar to manual input, except that the user does not enter coefficients one by one but only designates the key labels of a scenario, and the system automatically increases the scenario weight coefficients of those key labels and of similar labels in that scenario.
In this embodiment, taking the calculation of the label weight of the "high-value customer" label as an example, step S1 is described in detail as follows:
S101: Obtain the customer label set A1 = {'high-value customer', 'medium-value customer', 'low-value customer', 'high-risk customer', ..., t_n}; A1 contains all customer labels, n in total.
S102: Obtain the total number of customers T, i.e., the total number of business objects corresponding to the labels.
S103: Obtain the number of customers T(t_high-value customer) that have the "high-value customer" label, and compute its coverage rate P(t_high-value customer) = T(t_high-value customer) / T.
S104: Obtain the label basis weight C(t_i) entered by the user; if none is entered, it defaults to 0.
S105: Obtain the usage behavior count set of the "high-value customer" label, A2(t_high-value customer) = {a_query, a_call, a_like, a_evaluation, a_dislike}, where a_query is the number of times the label information is queried, a_call the number of times the label result is called, a_like the number of likes, a_evaluation the number of label evaluation records, and a_dislike the number of dislikes.
S106: Obtain the label behavior weight set A3 = {b_query, b_call, b_like, b_evaluation, b_dislike}, where b_query is the weight of querying the label information, b_call the weight of calling the label result, b_like the weight of a like, b_evaluation the weight of an evaluation, and b_dislike the weight of a dislike. All weights are positive except b_dislike, which is negative.
S107: Obtain the business scenario coefficient CS(t_high-value customer).
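The source data gathered in steps S101 to S107 can be bundled per label. The sketch below is an illustrative container, not the patent's data model: the class and field names and all sample numbers are hypothetical, while the defaults (basis weight 0, scenario coefficient 1) and the negative dislike weight follow the text.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSourceData:
    """Source data for one label t_i (step S1); field names are illustrative."""
    label: str
    total_objects: int                     # T
    objects_with_label: int                # T(t_i)
    basis_weight: float = 0.0              # C(t_i), 0 when the user enters nothing
    behavior_counts: dict[str, float] = field(default_factory=dict)   # A2(t_i)
    behavior_weights: dict[str, float] = field(default_factory=dict)  # A3
    scene_coefficient: float = 1.0         # CS_i(t_i), initial value 1

    def __post_init__(self) -> None:
        if self.total_objects <= 0:
            raise ValueError("T must be positive")
        if self.scene_coefficient <= 0:
            raise ValueError("CS_i(t_i) must be greater than 0")

    @property
    def coverage(self) -> float:
        """P(t_i) = T(t_i) / T."""
        return self.objects_with_label / self.total_objects


# Hypothetical values for the "high-value customer" label.
hv = LabelSourceData(
    label="high-value customer",
    total_objects=4000,
    objects_with_label=200,
    behavior_counts={"query": 40, "call": 25, "like": 6, "evaluation": 3, "dislike": 2},
    behavior_weights={"query": 0.01, "call": 0.02, "like": 0.05,
                      "evaluation": 0.03, "dislike": -0.05},
)
print(hv.coverage, hv.basis_weight, hv.scene_coefficient)  # 0.05 0.0 1.0
```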
S2: Improve the information content formula of the label. Taking the overall distribution of label coverage rates into account, a label coverage reference value is introduced as the base of the logarithm in the information content formula, with the label coverage rate as its argument; the improved formula is used as the label information weight formula to generate the label information weight.
Information entropy is a measure of the uncertainty of information: an event with a small probability carries a large amount of information, and an event with a large probability carries little. Borrowing this idea, a label can be regarded as one piece of information about a business object: if many business objects have the label, the label carries little information; if only a few business objects have it, the label carries relatively more information.
Because a label has only two states, "labeled" and "unlabeled", the conventional approach computes the label's information content with the formula I(x) = -log(P(x)), using the label coverage rate P(x) as the probability of the label occurring. However, this formula ignores the overall distribution of label coverage rates and lacks a reference point: for example, the weight should be 1 when the label coverage rate approaches the coverage reference value, and the weight should change more markedly the further the coverage rate deviates from that reference value.
On this basis, this embodiment improves the conventional formula by using the label coverage reference value as the base of the logarithm and the label coverage rate as its argument, which keeps the label weights reasonable. The improved label information weight is calculated as:
$$I(t_i) = \log_{P(\mathrm{ref})} P(t_i) = \frac{\log P(t_i)}{\log P(\mathrm{ref})}$$
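Why a reference-value base gives the behavior described for I(t_i) = log_{P(ref)} P(t_i) can be checked from the change-of-base identity, assuming 0 < P(t_i) ≤ 1 and 0 < P(ref) < 1 (a base of exactly 1 is undefined, and so is a coverage of 0):

$$
\begin{aligned}
&0 < P(\mathrm{ref}) < 1 \;\Rightarrow\; \log P(\mathrm{ref}) < 0,\\
&I(t_i) = \frac{\log P(t_i)}{\log P(\mathrm{ref})} = 1 \iff P(t_i) = P(\mathrm{ref}),\\
&I(t_i) > 1 \iff \log P(t_i) < \log P(\mathrm{ref}) \iff P(t_i) < P(\mathrm{ref}),\\
&I(t_i) < 1 \iff P(t_i) > P(\mathrm{ref}),\qquad I(t_i) = 0 \iff P(t_i) = 1.
\end{aligned}
$$

Labels rarer than the reference therefore receive a weight above 1, more common labels a weight below 1, and a label held by every business object receives 0, which matches the entropy intuition that widely held labels carry little information.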
where I(t_i) is the label information weight and P(ref) is the label coverage reference value. P(ref) can be obtained as the mean coverage rate of the current label set, where n is the number of labels in the set:
$$P(\mathrm{ref}) = \frac{1}{n}\sum_{i=1}^{n} P(t_i)$$
Alternatively, depending on how the labels are distributed, the median, a quartile, or another statistic of the label coverage rates can be chosen as the reference value; the user may also designate a label as the reference label and use its latest coverage rate as the reference value. The system can switch among these three calculation modes according to how often the user manually adjusts label basis weights.
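The alternative ways of computing P(ref) listed above can be sketched with Python's standard `statistics` module. The mode names and the function name are hypothetical; the quartiles use `statistics.quantiles` with its default (exclusive) method, which is one of several conventions and is an assumption here.

```python
import statistics
from typing import Mapping, Optional


def coverage_reference(coverages: Mapping[str, float], mode: str = "mean",
                       reference_label: Optional[str] = None) -> float:
    """Label coverage reference value P(ref) for the current label set.

    mode: "mean", "median", "q1"/"q3" (lower/upper quartile), or "label"
    (latest coverage rate of a user-designated reference label).
    """
    values = list(coverages.values())
    if not values:
        raise ValueError("label set is empty")
    if mode == "mean":
        return statistics.fmean(values)
    if mode == "median":
        return statistics.median(values)
    if mode in ("q1", "q3"):
        # quantiles() needs at least two coverage rates on Python < 3.13.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return q1 if mode == "q1" else q3
    if mode == "label":
        if reference_label is None:
            raise ValueError("mode 'label' needs reference_label")
        return coverages[reference_label]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical coverage rates of a four-label set.
cov = {"a": 0.05, "b": 0.10, "c": 0.30, "d": 0.55}
for m in ("mean", "median", "q1", "q3"):
    print(m, round(coverage_reference(cov, m), 4))
print("label", coverage_reference(cov, "label", reference_label="c"))
```

With skewed coverage rates (a few labels held by most objects), the median or a quartile moves the neutral point away from the outliers, which is one reason to keep several modes available.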
The label coverage reference value P(ref) and the label basis weight C(t_i) both affect the final label weight. If P(ref) is set reasonably, the label weights generated from it can be considered appropriate, and the user does not need to adjust basis weights manually to influence the final result. If the user frequently needs to adjust basis weights manually, the calculation mode of P(ref) is probably unsuitable. In that case, the system automatically changes the calculation mode: for example, if the median of the label coverage rates was originally used as P(ref), the system switches to a quartile so that the user no longer needs frequent manual adjustments. If the user subsequently makes fewer manual adjustments, the change is judged effective; if the user makes more, it is judged worse and the system continues adjusting.
In practice, the frequency with which the user manually adjusts label basis weights serves as an indirect measure of whether the current calculation mode of P(ref) suits the user's situation. If the user is found to adjust basis weights frequently to correct the final weights of labels, the system automatically changes the calculation mode so as to move P(ref) toward the best result. For example, the mean of the label coverage rates is used as P(ref) initially; if the user is found to adjust basis weights frequently to correct final label weights, the system automatically switches to the median, compares the adjustment frequency after the switch with that before it, and decides whether to switch again, revert, or prompt the user to set P(ref) manually.
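One possible reading of this switching behavior is sketched below as a small controller. Everything concrete in it is an assumption: the threshold for "frequent" adjustments, the evaluation window, the order in which modes are tried after the mean, and the class and method names are all hypothetical.

```python
from dataclasses import dataclass, field

# Order in which P(ref) calculation modes are tried. Starting with the mean
# follows the example in the text; the rest of the order is an assumption.
MODE_ORDER = ["mean", "median", "q1", "q3"]


@dataclass
class ReferenceModeSelector:
    """Hypothetical controller choosing the P(ref) calculation mode from the
    number of manual basis-weight adjustments seen in each evaluation window."""

    threshold: int                     # adjustments per window deemed "frequent" (assumed)
    mode: str = "mean"
    history: dict[str, int] = field(default_factory=dict)  # mode -> latest count

    def end_of_window(self, manual_adjustments: int) -> str:
        """Record the count for the current mode and return the mode to use
        next, or "ask_user" when no automatic mode does better."""
        self.history[self.mode] = manual_adjustments
        if manual_adjustments < self.threshold:
            return self.mode                    # adjustments are rare: keep the mode
        untried = [m for m in MODE_ORDER if m not in self.history]
        if untried:
            self.mode = untried[0]              # switch to the next candidate mode
            return self.mode
        best = min(self.history, key=self.history.__getitem__)
        if best != self.mode:
            self.mode = best                    # revert to the mode with fewest adjustments
            return self.mode
        return "ask_user"                       # prompt the user to set P(ref) manually


selector = ReferenceModeSelector(threshold=10)
print([selector.end_of_window(n) for n in (15, 12, 4)])  # ['median', 'q1', 'q1']
```

Comparing against the last recorded count per mode is a simple way to implement "compare with the frequency before the switch"; a production system would likely smooth counts over several windows before deciding.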
In this embodiment, the label information weight is obtained with the improved formula as follows:
S201: Calculate the label coverage reference value
$$P(\mathrm{ref}) = \frac{1}{n}\sum_{i=1}^{n} P(t_i)$$
In this embodiment, it is the mean coverage rate of all labels in the customer label set A1.
S202: Calculate the label information weight
$$I(t_{\text{high-value customer}}) = \log_{P(\mathrm{ref})} P(t_{\text{high-value customer}}) = \frac{\log P(t_{\text{high-value customer}})}{\log P(\mathrm{ref})}$$
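A minimal sketch of steps S201 and S202, assuming the coverage rates are already known; the function names and the reference value 0.25 are hypothetical. The conventional -log P is included only for comparison.

```python
import math


def information_weight(coverage: float, reference: float) -> float:
    """Improved label information weight I(t_i) = log_{P(ref)} P(t_i)."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage rate must lie in (0, 1]")
    if not 0 < reference < 1:
        raise ValueError("reference value must lie strictly between 0 and 1")
    return math.log(coverage) / math.log(reference)   # change of base


def conventional_information(coverage: float) -> float:
    """Conventional information content -log P(x), natural log used here."""
    return -math.log(coverage)


p_ref = 0.25  # hypothetical mean coverage rate of the label set (S201)
for p in (0.05, 0.25, 0.55):
    print(p, round(information_weight(p, p_ref), 3), round(conventional_information(p), 3))
```

At the reference coverage the improved weight is exactly 1, whereas -ln P gives about 1.386 there; the conventional value has no fixed neutral point, which is the shortcoming the improved formula targets.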
S3: Automatically update the label usage weight coefficient based on the label usage behavior counts and the label behavior weights.
Besides its information content, the importance of a label depends on whether it is valuable to users, which can be judged from how the label is used. This embodiment therefore calculates the label usage weight coefficient automatically from the label usage behavior counts and the label behavior weights, so that the label weight reflects the label's usage value.
The label usage weight coefficient is calculated as:
$$UW(t_i) = \sum_{j=1}^{n} a_j b_j$$
where UW(t_i) is the label usage weight coefficient, a_j is the number of times behavior j was performed on label t_i, and b_j is the corresponding behavior weight. An upper limit is also set on each behavior's weighted contribution to prevent extreme values: for example, if the upper limit for behavior j is 1, then even if a_j b_j is greater than 1, the contribution of that behavior is only 1.
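A minimal Python sketch of UW(t_i) with a per-behavior upper limit. The behavior names, counts, weights, and limits are hypothetical. Clipping negative contributions (such as a negative dislike weight) at -cap is an assumption, based on the embodiment requiring the absolute value of each term to stay below the limit k.

```python
from typing import Mapping


def usage_weight(counts: Mapping[str, float], weights: Mapping[str, float],
                 caps: Mapping[str, float]) -> float:
    """UW(t_i) = sum_j a_j * b_j, each term clipped to [-cap_j, cap_j].

    Behaviors without a weight in `weights` are ignored; behaviors without a
    count contribute 0.
    """
    total = 0.0
    for behavior, weight in weights.items():
        term = counts.get(behavior, 0) * weight   # a_j * b_j
        cap = caps[behavior]
        total += max(-cap, min(cap, term))        # symmetric clipping (assumption)
    return total


weights = {"query": 0.01, "call": 0.02, "like": 0.05, "evaluation": 0.03, "dislike": -0.05}
caps = {b: 1.0 for b in weights}
counts = {"query": 40, "call": 25, "like": 6, "evaluation": 3, "dislike": 2}

print(usage_weight(counts, weights, caps))                      # about 1.19
print(usage_weight({**counts, "query": 400}, weights, caps))    # query term capped at 1.0
```

Because a scenario may use its own behavior weight set, calling the same function with a different `weights` mapping yields the scenario-specific coefficient without changing the label itself.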
In this embodiment, because the label behavior weights can be adjusted to the characteristics of a scenario, the calculated label usage weight coefficient can adapt fully to changes in the business scenario. The business scenario does not affect label generation and updating through label classification; instead, the same label can be updated dynamically as the business scenario switches or changes, which improves the accuracy of the label weight coefficient. Concretely, the label weight coefficient is coarsely adjusted by the business scenario coefficient and then finely adjusted by the weights of the different label behaviors in that scenario.
The label usage weight coefficient can be updated in real time, but to save computing resources it is usually updated on a specified schedule, such as hourly or daily.
In this embodiment, the label usage weight coefficient of "high-value customer" is updated automatically as follows:
S301: Calculate the label usage weight coefficient:
$$UW(t_{\text{high-value customer}}) = \sum_j a_j b_j = a_{\text{query}} b_{\text{query}} + a_{\text{call}} b_{\text{call}} + a_{\text{like}} b_{\text{like}} + a_{\text{evaluation}} b_{\text{evaluation}} + a_{\text{dislike}} b_{\text{dislike}}$$
where a_j is the number of times behavior j was performed on the label and b_j is the corresponding behavior weight. In this embodiment, the upper limit of each behavior's contribution is assumed to be k, and the absolute values of a_query b_query, a_call b_call, a_like b_like, a_evaluation b_evaluation, and a_dislike b_dislike are all less than k.
S302. If the tag usage weight coefficient is updated hourly, it is recalculated every hour from the latest behavior counts.
S4. Calculate the attenuation coefficient of the tag weight according to the tag usage scene and the manual adjustment coefficient. This step comprises:
S401. When the system is first started, the user judges from business experience which tags have decaying weights, and configures the decay start date, decay period, attenuation coefficient and attenuation mode.
The decay start date includes, but is not limited to, the tag update date, a fixed date and a dynamic date; a dynamic date changes with the state of a specific business object, for example decay that starts on the client's birthday (which differs from client to client). Decay periods include weekly, daily, monthly and similar periods. The attenuation coefficient is a decimal greater than 0 and less than 1. The attenuation mode can be a fixed multiple, an exponential, a fixed value or another mode, to match the different ways in which tag values decay.
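The S401 configuration could be held in a small record like the Python sketch below. Everything here is an illustrative assumption: the class and field names, approximating a month as 30 days, and reading the "dynamic" (birthday) start date as the most recent birthday on or before today.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AttenuationMode(Enum):
    FIXED_MULTIPLE = "fixed_multiple"
    EXPONENTIAL = "exponential"
    FIXED_VALUE = "fixed_value"


# Assumption: a "monthly" period is approximated as 30 days.
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def _anniversary(d: date, year: int) -> date:
    """Same month/day as d in the given year; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=year)
    except ValueError:
        return d.replace(year=year, day=28)


@dataclass
class DecayConfig:
    start_kind: str                     # "tag_update", "fixed" or "dynamic"
    period: str                         # a key of PERIOD_DAYS
    coefficient: float                  # must lie strictly between 0 and 1
    mode: AttenuationMode
    fixed_start: Optional[date] = None  # used only when start_kind == "fixed"

    def __post_init__(self) -> None:
        if not 0 < self.coefficient < 1:
            raise ValueError("attenuation coefficient must be in (0, 1)")
        if self.period not in PERIOD_DAYS:
            raise ValueError(f"unknown decay period: {self.period}")

    def start_date(self, tag_updated: date, today: date,
                   birthday: Optional[date] = None) -> date:
        """Resolve the decay start date for one business object."""
        if self.start_kind == "tag_update":
            return tag_updated
        if self.start_kind == "fixed":
            if self.fixed_start is None:
                raise ValueError("fixed start date not configured")
            return self.fixed_start
        if self.start_kind == "dynamic":
            # Assumption: decay restarts on the most recent birthday on or before today.
            if birthday is None:
                raise ValueError("a dynamic start date needs the client's birthday")
            this_year = _anniversary(birthday, today.year)
            return this_year if this_year <= today else _anniversary(birthday, today.year - 1)
        raise ValueError(f"unknown start kind: {self.start_kind}")
```

Validating the coefficient range at construction time keeps a misconfigured tag from silently producing weights outside the intended (0, 1) band.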
S402, calculating the attenuation coefficient of the label weight according to the attenuation mode.
The attenuation coefficient is calculated differently for each attenuation mode:
when the attenuation mode is a fixed multiple or an exponential, the attenuation coefficient is $AW(t_i) = \frac{\text{current date} - \text{decay start date}}{\text{number of decay cycles}} \times \text{attenuation coefficient}$, or AW(t_i) = [exponential formula; the original formula image is not recoverable]. These formulas are only simple implementation examples; in practice, a person skilled in the art can derive (or train) a formula that matches how quickly the value of a given tag decays.
When the attenuation mode is a fixed value, the attenuation coefficient is $A(t_i) = \frac{\text{current date} - \text{decay start date}}{\text{decay period}} \times \text{attenuation coefficient}$.
Here AW(t_i) is the attenuation coefficient when the attenuation mode is a fixed multiple or an exponential, and A(t_i) is the attenuation coefficient when the attenuation mode is a fixed value.
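The two explicit formulas above share the same arithmetic, so one Python helper can serve both; only how the result is combined in S5 differs. This is a sketch under stated assumptions: the function name is hypothetical, the elapsed-period count is kept fractional (the source just divides), a current date before the start date is treated as an error, and the exponential variant is not implemented because its formula image is lost.

```python
from datetime import date


def elapsed_period_coefficient(current: date, start: date,
                               period_days: int, coefficient: float) -> float:
    """((current date - decay start date) / decay period) * attenuation coefficient.

    Serves as AW(t_i) in the fixed-multiple mode and as A(t_i) in the fixed-value mode.
    """
    if period_days <= 0:
        raise ValueError("decay period must be a positive number of days")
    if current < start:
        raise ValueError("current date precedes the decay start date")  # assumption
    periods = (current - start).days / period_days  # fractional number of elapsed periods
    return periods * coefficient
```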
In this embodiment, assume the decay start date is D, the decay period is one week, the attenuation coefficient is ks, and the attenuation mode is a fixed multiple. The attenuation coefficient is then $AW(t_{\text{high-value customer}}) = ((\text{current date} - D)/7) \times ks$.
Preferably, once enough training sample data is available, the attribute information of each tag can be converted into a vector and the similarity between the current tag and the tags whose attenuation coefficients are already configured can be calculated; if a sufficiently similar tag exists, its attenuation coefficient is adopted automatically.
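As a worked example with hypothetical values (D = 2022-01-03, a current date of 2022-01-24 and ks = 0.1, none of which come from the source), the elapsed time is 21 days, i.e. three weekly periods:

```latex
AW(t_{\text{high-value customer}})
  = \frac{\text{2022-01-24} - \text{2022-01-03}}{7} \times ks
  = \frac{21}{7} \times 0.1
  = 0.3
```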
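The source does not name a similarity measure or a threshold. The Python sketch below assumes cosine similarity over already-vectorized tag attributes and a hypothetical cut-off of 0.9. How the attributes are turned into vectors is left open.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def borrow_attenuation_coefficient(
    tag_vector: Sequence[float],
    configured: dict[str, tuple[Sequence[float], float]],
    threshold: float = 0.9,  # hypothetical cut-off
) -> Optional[float]:
    """Coefficient of the most similar configured tag, or None if none reaches the threshold."""
    best_sim, best_coeff = -1.0, None
    for vector, coeff in configured.values():
        sim = cosine_similarity(tag_vector, vector)
        if sim > best_sim:
            best_sim, best_coeff = sim, coeff
    return best_coeff if best_sim >= threshold else None
```

Returning `None` rather than a default lets the caller fall back to manual configuration when no configured tag is close enough.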
S5. On the basis of steps S1 to S4, generate and dynamically update the tag weight according to the attenuation mode of the tag weight, combining the tag information amount weight, the tag usage weight coefficient, the tag basis weight and the service scene coefficient.
The tag weight H(t_i) is generated and dynamically updated in one of three cases:
1. When the tag weight does not decay: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i)$;
2. When the attenuation mode of the tag weight is a fixed multiple or an exponential: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i) \cdot AW(t_i)$;
3. When the attenuation mode of the tag weight is a fixed value: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i) + A(t_i)$;
where I(t_i) is the tag information amount weight generated in step S2, UW(t_i) is the tag usage weight coefficient automatically updated in step S3, C(t_i) is the tag basis weight, CS_i(t_i) is the service scene coefficient, and AW(t_i) and A(t_i) are the attenuation coefficients of the tag weight calculated in step S4.
The formula that matches the tag's attenuation mode is selected to calculate and update H(t_i) dynamically. The tag weight can be updated in real time, but to limit the consumption of system computing resources it is usually updated on a specified period, such as every 15 minutes, hourly or daily.
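The three S5 cases can be sketched as one Python function that dispatches on the attenuation mode. The enum and parameter names are hypothetical, and the inputs are assumed to come from steps S2 to S4.

```python
from enum import Enum
from typing import Optional


class AttenuationMode(Enum):
    NONE = "none"
    FIXED_MULTIPLE = "fixed_multiple"
    EXPONENTIAL = "exponential"
    FIXED_VALUE = "fixed_value"


def tag_weight(info: float, usage: float, basis: float, scene: float,
               mode: AttenuationMode = AttenuationMode.NONE,
               attenuation: Optional[float] = None) -> float:
    """H(t_i) from I(t_i), UW(t_i), C(t_i), CS_i(t_i) and, for a decaying tag, AW(t_i) or A(t_i)."""
    core = (info * usage + basis) * scene  # (I * UW + C) * CS
    if mode is AttenuationMode.NONE:
        return core
    if attenuation is None:
        raise ValueError("an attenuation coefficient is required for a decaying tag")
    if mode in (AttenuationMode.FIXED_MULTIPLE, AttenuationMode.EXPONENTIAL):
        return core * attenuation  # multiply by AW(t_i)
    return core + attenuation      # fixed value: add A(t_i)
```

For instance, with hypothetical inputs I = 1.5, UW = 2.0, C = 1.0 and CS = 1.2, the non-decaying weight is (1.5 × 2.0 + 1.0) × 1.2 = 4.8.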
In this embodiment, the tag weight of "high-value customer" is generated and automatically updated as follows:
S501. Generate the tag weight $H(t_{\text{high-value customer}}) = (I(t_{\text{high-value customer}}) \cdot UW(t_{\text{high-value customer}}) + C(t_{\text{high-value customer}})) \cdot CS(t_{\text{high-value customer}}) \cdot AW(t_{\text{high-value customer}})$.
S502. In this embodiment, the tag weight is updated hourly, so the corresponding tag weight is recalculated automatically every hour.
Preferably, steps S1 to S4 need not be precomputed: the calculation formula of step S5 can be assembled directly and evaluated in one pass with the corresponding data, using the latest data each time the tag weight is updated.
Example 2
Based on the same inventive concept as embodiment 1, this embodiment provides an intelligent tag weight updating system based on information entropy, which includes the following modules:
the source data acquisition module is used for acquiring source data used for label weight calculation and preprocessing the source data; the source data includes: label set A1, total number T of service objects corresponding to the labels, label coverage rate P(t_i), label usage behavior count set A2(t_i), label behavior weight set A3 and service scene coefficient CS_i(t_i);
the label information amount weight generation module is used for improving the information amount calculation formula of a label: taking the overall distribution of label coverage rates into account, it introduces a label coverage rate reference value as the base of the logarithm in the information amount formula, with the label coverage rate as the argument of the logarithm, and uses the improved formula as the label information amount weight generation formula to generate the label information amount weight;
the tag use weight coefficient updating module is used for automatically updating the tag use weight coefficient based on the tag use behavior times and the tag behavior weight;
the tag weight attenuation coefficient calculation module is used for calculating the attenuation coefficient of the tag weight according to the tag use scene and the manual adjustment coefficient;
and the label weight dynamic updating module is used for generating and dynamically updating the label weight according to the attenuation mode of the label weight and by integrating the label information weight, the label use weight coefficient and the service scene coefficient.
Further, the source data acquired by the source data acquisition module also includes a label basis weight C(t_i). The label weight dynamic updating module combines the label information amount weight, the label usage weight coefficient, the label basis weight and the service scene coefficient according to the attenuation mode of the label weight to generate and dynamically update the label weight.
Further, in the tag information amount weight generating module, the introduced tag coverage rate reference value includes the following calculation modes: obtaining by calculating the average coverage rate of the current label set, or selecting a median value or a quartile value of the label coverage rate as a label coverage rate reference value according to a label distribution rule, or selecting the latest coverage rate of a reference label designated by a user as a label coverage rate reference value; the label information weight generating module also switches the calculation mode of the label coverage rate reference value among the calculation modes according to the manual adjustment condition of the label basic weight.
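A Python sketch of the reference-value modes and of the improved information amount $I(t_i) = \log_{P(\text{reference})} P(t_i)$, computed through the change-of-base identity $\ln P(t_i) / \ln P(\text{reference})$. The method names (including "q1"/"q3" for the lower and upper quartiles) are hypothetical, and the switching rule driven by manual adjustment of the basis weight is not modeled because the source does not spell it out.

```python
import math
import statistics
from typing import Mapping, Optional


def coverage_reference(coverages: Mapping[str, float], method: str = "mean",
                       reference_tag: Optional[str] = None) -> float:
    """Label coverage reference value P(reference) under one of the listed modes."""
    values = list(coverages.values())
    if method == "mean":
        return statistics.fmean(values)
    if method == "median":
        return statistics.median(values)
    if method in ("q1", "q3"):
        q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least two tags
        return q1 if method == "q1" else q3
    if method == "reference_tag":
        if reference_tag is None:
            raise ValueError("a user-designated reference tag is required")
        return coverages[reference_tag]
    raise ValueError(f"unknown method: {method}")


def information_weight(coverage: float, reference: float) -> float:
    """I(t_i) = log base P(reference) of P(t_i), i.e. ln P(t_i) / ln P(reference)."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if not 0 < reference < 1:
        raise ValueError("reference value must be in (0, 1)")
    return math.log(coverage) / math.log(reference)
```

Because both logarithms are negative, a tag at the reference coverage gets weight 1, rarer tags get weights above 1, more common tags get weights below 1, and a tag that covers every service object gets 0.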
Example 3
Based on the same inventive concept as that of embodiment 1, this embodiment provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and running on the processor, and when the processor executes the computer program, the steps of the intelligent update method for tag weights in embodiment 1 are implemented.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. The label weight intelligent updating method based on the information entropy is characterized by comprising the following steps of:
S1, acquiring source data used for label weight calculation, and preprocessing the source data; the source data includes: label set A1, total number T of service objects corresponding to the labels, label coverage rate P(t_i), label usage behavior count set A2(t_i), label behavior weight set A3 and service scene coefficient CS_i(t_i);
S2, improving the information amount calculation formula of the label: taking the overall distribution of label coverage rates into account, introducing a label coverage rate reference value as the base of the logarithm in the information amount formula, with the label coverage rate as the argument of the logarithm; and using the improved formula as the label information amount weight generation formula to generate the label information amount weight;
S3, automatically updating the label usage weight coefficient based on the label usage behavior counts and the label behavior weights;
S4, calculating an attenuation coefficient of the label weight according to the label use scene and the manual adjustment coefficient;
S5, on the basis of steps S1 to S4, generating and dynamically updating the label weight according to the attenuation mode of the label weight, the label information amount weight, the label usage weight coefficient and the service scene coefficient.
2. The intelligent updating method for label weight according to claim 1, wherein the source data obtained in step S1 further includes a label basis weight C(t_i); in step S5, the label information amount weight, the label usage weight coefficient, the label basis weight and the service scene coefficient are combined according to the attenuation mode of the label weight to generate and dynamically update the label weight.
3. The intelligent updating method for label weight according to claim 2, wherein the label coverage reference value of step S2 includes the following calculation methods: obtaining by calculating the average coverage rate of the current label set, or selecting a median value or a quartile value of the label coverage rate as a label coverage rate reference value according to a label distribution rule, or selecting the latest coverage rate of a reference label designated by a user as a label coverage rate reference value; and switching the calculation mode of the label coverage rate reference value among the calculation modes according to the manual adjustment condition of the label basic weight.
4. The intelligent updating method for label weight according to claim 3, wherein, when the label coverage reference value in step S2 is obtained by calculating the average coverage of the current label set, the improved information amount is calculated by the following formula:
$$I(t_i) = \log_{P(\text{reference})} P(t_i) = \log_{\frac{1}{n}\sum_{j=1}^{n} P(t_j)} P(t_i)$$
wherein I(t_i) is the label information amount weight, P(reference) is the label coverage reference value, P(t_i) is the coverage rate of label t_i, and n is the number of labels in the current label set.
5. The intelligent updating method for label weight according to claim 1, wherein the label usage weight coefficient in step S3 is calculated as follows:
$$UW(t_i) = \sum_j n_j b_j$$
wherein UW(t_i) is the label usage weight coefficient, n_j is the number of times behavior j is performed on label t_i, and b_j is the corresponding behavior weight; an upper limit is also set on each behavior weight.
6. The intelligent tag weight updating method according to claim 1, wherein the step S4 comprises the following steps:
S401, when the system is initially started, judging from business experience which labels have decaying weights, and configuring a decay start date, a decay period, an attenuation coefficient and an attenuation mode;
S402, calculating the attenuation coefficient of the label weight according to the attenuation mode;
when the attenuation mode is a fixed multiple or an exponential, the attenuation coefficient is $AW(t_i) = \frac{\text{current date} - \text{decay start date}}{\text{number of decay cycles}} \times \text{attenuation coefficient}$, or AW(t_i) = [exponential formula; the original formula image is not recoverable];
when the attenuation mode is a fixed value, the attenuation coefficient is $A(t_i) = \frac{\text{current date} - \text{decay start date}}{\text{decay period}} \times \text{attenuation coefficient}$;
wherein AW(t_i) is the attenuation coefficient when the attenuation mode is a fixed multiple or an exponential, and A(t_i) is the attenuation coefficient when the attenuation mode is a fixed value.
7. The intelligent updating method for label weight according to claim 6, wherein the source data obtained in step S1 further includes a label basis weight C(t_i); in step S5, the cases in which the label weight H(t_i) is generated and dynamically updated include:
when the label weight does not decay: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i)$;
when the attenuation mode of the label weight is a fixed multiple or an exponential: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i) \cdot AW(t_i)$;
when the attenuation mode of the label weight is a fixed value: $H(t_i) = (I(t_i) \cdot UW(t_i) + C(t_i)) \cdot CS_i(t_i) + A(t_i)$;
wherein I(t_i) is the label information amount weight generated in step S2, UW(t_i) is the label usage weight coefficient automatically updated in step S3, and AW(t_i) and A(t_i) are the attenuation coefficients of the label weight calculated in step S4.
8. Intelligent label weight updating system based on information entropy is characterized by comprising:
the source data acquisition module is used for acquiring source data used for label weight calculation and preprocessing the source data; the source data includes: label set A1, total number T of service objects corresponding to the labels, label coverage rate P(t_i), label usage behavior count set A2(t_i), label behavior weight set A3 and service scene coefficient CS_i(t_i);
the label information amount weight generation module is used for improving the information amount calculation formula of a label: taking the overall distribution of label coverage rates into account, it introduces a label coverage rate reference value as the base of the logarithm in the information amount formula, with the label coverage rate as the argument of the logarithm, and uses the improved formula as the label information amount weight generation formula to generate the label information amount weight;
the tag use weight coefficient updating module is used for automatically updating the tag use weight coefficient based on the tag use behavior times and the tag behavior weight;
the tag weight attenuation coefficient calculation module is used for calculating the attenuation coefficient of the tag weight according to the tag use scene and the manual adjustment coefficient;
and the label weight dynamic updating module is used for generating and dynamically updating the label weight according to the attenuation mode of the label weight and by integrating the label information weight, the label use weight coefficient and the service scene coefficient.
9. The intelligent tag weight update system of claim 8, wherein the obtained source data further comprises a tag basis weight C(t_i);
the label weight dynamic updating module generates and dynamically updates label weights according to the attenuation mode of the label weights by combining the label information amount weight, the label usage weight coefficient, the label basis weight and the service scene coefficient;
in the tag information weight generating module, the introduced tag coverage rate reference value comprises the following calculation modes: obtaining by calculating the average coverage rate of the current label set, or selecting a median value or a quartile value of the label coverage rate as a label coverage rate reference value according to a label distribution rule, or selecting the latest coverage rate of a reference label designated by a user as a label coverage rate reference value; the label information weight generating module also switches the calculation mode of the label coverage rate reference value among the calculation modes according to the manual adjustment condition of the label basic weight.
10. Computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the intelligent update method of tag weights as claimed in any of the claims 1-7.
CN202210076732.3A 2022-01-24 2022-01-24 Intelligent label weight updating method and system based on information entropy and computer equipment Active CN114090854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210076732.3A CN114090854B (en) 2022-01-24 2022-01-24 Intelligent label weight updating method and system based on information entropy and computer equipment

Publications (2)

Publication Number Publication Date
CN114090854A CN114090854A (en) 2022-02-25
CN114090854B true CN114090854B (en) 2022-04-19

Family

ID=80309280

Country Status (1)

Country Link
CN (1) CN114090854B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant