CN115391421A - Feature extraction method, device, equipment and storage medium - Google Patents
- Publication number
- CN115391421A, CN202210996416.8A
- Authority
- CN
- China
- Prior art keywords
- behavior
- sequence
- user
- habit
- mining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The disclosure provides a feature extraction method, apparatus, device and storage medium, and relates to the technical field of computer data processing, in particular to the technical field of artificial intelligence. The specific implementation scheme is as follows: acquiring a behavior sequence of each user according to user behavior data, wherein the behavior sequence comprises at least one behavior arranged in chronological order; and extracting relationship information and/or pattern information among the behaviors according to the behavior sequence to obtain behavior habit features of each user. Behavior habit features extracted in this way represent the behavior habits of users more accurately and improve the expressive power of the data, thereby improving the precision and accuracy of subsequent model processing.
Description
Technical Field
The disclosure relates to the technical field of computer data processing, in particular to the technical field of artificial intelligence.
Background
Before building a model, it is often necessary to preprocess the data to be input into the model in order to obtain clean and accurate data. Model features are then constructed from the preprocessed data, and the expressive power of the data is improved by adding derived variables generated from the business scenario, thereby improving the effect of the model.
Disclosure of Invention
The disclosure provides a method, an apparatus, a device and a storage medium for feature extraction.
According to an aspect of the present disclosure, there is provided a method of feature extraction, including: acquiring a behavior sequence of each user according to user behavior data, wherein the behavior sequence comprises at least one behavior arranged in chronological order; and extracting relationship information and/or pattern information among the behaviors according to the behavior sequence to obtain behavior habit features of each user.
According to another aspect of the present disclosure, there is provided an apparatus for feature extraction, including: a behavior sequence acquisition module, configured to acquire a behavior sequence of each user according to user behavior data, wherein the behavior sequence comprises at least one behavior arranged in chronological order; and a behavior habit feature extraction module, configured to extract relationship information and/or pattern information among the behaviors according to the behavior sequence to obtain behavior habit features of each user.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method of feature extraction.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method of feature extraction.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-described method of feature extraction.
The present disclosure provides a method, apparatus, device, and storage medium for feature extraction. The method obtains the behavior habit features of each user by extracting relationship information and/or pattern information among the behaviors from the user's behavior sequence. Features extracted in this way represent the behavior habits of the user more accurately and improve the expressive power of the data, thereby improving the precision and accuracy of subsequent model processing.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a method for implementing feature extraction according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a process of acquiring a behavior habit of a user according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a system architecture for extracting behavior habit features of a user according to a behavior sequence according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a system architecture for extracting behavior habit features of a user according to a behavior sequence according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a system architecture for extracting user behavior habit features according to a behavior sequence according to a fifth embodiment of the present disclosure;
FIG. 6 is a causal relationship inference graph of model characteristics and set results of a causal determination model according to a sixth embodiment of the disclosure;
FIG. 7 is a schematic diagram of an apparatus for feature extraction according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing the method of feature extraction of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flowchart of a method for implementing feature extraction according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
operation S110, obtaining a behavior sequence of each user according to the user behavior data, where the behavior sequence includes at least one behavior arranged according to a time sequence;
The user behavior data may come from log files that record user operations, or from communication records between the client used by the user and a server.
The behavior of the user includes an operation performed by the user, an object of the operation, a result of the operation, and the like. In the simplest case, the behavior of the user may include only the operation performed by the user, or only the object of the operation.
To obtain the behavior sequence of each user, the user behavior data corresponding to each user may be obtained first. The behaviors of each user and the times at which they occurred are then extracted from that data, and the behaviors of each user are sorted by time of occurrence to form the behavior sequence of that user.
For example, from the log file shown below:
[user 1] [January 1, 2022, 5:10 PM] [browse] [commodity x1];
[user 1] [January 1, 2022, 5:15 PM] [browse] [commodity x2];
[user 2] [January 1, 2022, 5:20 PM] [browse] [commodity x3];
[user 1] [January 1, 2022, 5:25 PM] [add to shopping cart] [commodity x2];
[user 2] [January 1, 2022, 5:20 PM] [browse] [commodity x1];
[user 1] [January 1, 2022, 5:30 PM] [purchase] [commodity x2];
[user 2] [January 1, 2022, 5:35 PM] [purchase] [commodity x1];
the following behavior sequences are obtained:
the behavior sequence of user 1: { "[browse] [commodity x1]", "[browse] [commodity x2]", "[add to shopping cart] [commodity x2]", "[purchase] [commodity x2]" };
the behavior sequence of user 2: { "[browse] [commodity x3]", "[browse] [commodity x1]", "[purchase] [commodity x1]" }.
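The grouping and sorting described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the record layout, the ISO-style timestamps, the normalized operation labels, and the function name `build_behavior_sequences` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical parsed log records: (user, timestamp, operation, object).
# The values mirror the example log; the tuple layout is an assumption.
RECORDS = [
    ("user 1", "2022-01-01 17:10", "browse", "commodity x1"),
    ("user 1", "2022-01-01 17:15", "browse", "commodity x2"),
    ("user 2", "2022-01-01 17:20", "browse", "commodity x3"),
    ("user 1", "2022-01-01 17:25", "add to shopping cart", "commodity x2"),
    ("user 2", "2022-01-01 17:20", "browse", "commodity x1"),
    ("user 1", "2022-01-01 17:30", "purchase", "commodity x2"),
    ("user 2", "2022-01-01 17:35", "purchase", "commodity x1"),
]


def build_behavior_sequences(records):
    """Group records by user and order each user's behaviors by time."""
    per_user = defaultdict(list)
    for user, ts, operation, obj in records:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        per_user[user].append((when, f"[{operation}] [{obj}]"))
    sequences = {}
    for user, events in per_user.items():
        # sorted() is stable, so behaviors with equal timestamps keep log order.
        ordered = sorted(events, key=lambda event: event[0])
        sequences[user] = [behavior for _, behavior in ordered]
    return sequences


sequences = build_behavior_sequences(RECORDS)
```

Sorting on the timestamp alone, rather than on the whole tuple, keeps two behaviors logged at the same minute in their original log order.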
The order in which a person performs a series of behaviors often depends on the inherent relationships between the behaviors (for example, implication and causality) or on the person's behavior habits. A chronologically ordered behavior sequence preserves, through the order of its behaviors, latent information such as these internal relationships and the individual's behavior habits.
Therefore, the internal connections between behaviors and the behavior patterns of the user can be mined from the user's behavior sequence. If a certain behavior pattern appears repeatedly, it can be presumed to have become a behavior habit of the user.
For example, if an "add to shopping cart" behavior always appears between the "browse" and "purchase" behaviors in the behavior sequence of a user, it can be presumed that the user has the consumption habit of "adding a product to the shopping cart after browsing and then purchasing it". If only "browse" behaviors precede the "purchase" behavior in the behavior sequence, and "add to shopping cart" behaviors rarely occur, it can be presumed that the user has the consumption habit of "purchasing products directly after browsing".
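As a rough illustration of this kind of inference, the sketch below measures how often a purchase is preceded by an add-to-cart behavior. It assumes the simplest case in which each behavior is only the operation name; the label strings and the function name `cart_before_purchase_ratio` are hypothetical.

```python
def cart_before_purchase_ratio(sequence):
    """Fraction of purchases preceded by an add-to-cart since the previous purchase."""
    purchases = 0
    via_cart = 0
    cart_seen = False
    for operation in sequence:
        if operation == "add to shopping cart":
            cart_seen = True
        elif operation == "purchase":
            purchases += 1
            via_cart += int(cart_seen)
            cart_seen = False  # start looking again after each purchase
    return via_cart / purchases if purchases else 0.0


# Two made-up operation sequences for illustration.
seq_cart_user = ["browse", "add to shopping cart", "purchase",
                 "browse", "add to shopping cart", "purchase"]
seq_direct_user = ["browse", "purchase", "browse", "browse", "purchase"]

ratio_cart_user = cart_before_purchase_ratio(seq_cart_user)
ratio_direct_user = cart_before_purchase_ratio(seq_direct_user)
```

A ratio near 1 would suggest the "add to cart, then purchase" habit, and a ratio near 0 the "purchase directly" habit. Any threshold between the two is a design choice that the text does not specify.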
Therefore, compared with discrete behavior data, a behavior sequence makes it easier to find subtle traces of the user's behavior habits and to extract the user's behavior habit features from those traces.
Operation S120, extracting relationship information and/or pattern information between behaviors according to the behavior sequence, so as to obtain behavior habit features of each user.
Extracting relationship information between behaviors means extracting information that characterizes the internal relationships between the user's behaviors in the behavior sequence.
Extracting pattern information between behaviors means extracting time-related characteristics of the behavior sequence, such as the order of occurrence and the arrangement pattern of the behaviors.
As mentioned above, the behavior habits of the user are closely related to the order of occurrence of the individual behaviors in the sequence, their arrangement pattern, and the internal relationships between them.
Thus, by doing any of the following on a behavior sequence:
extracting relationship information between the behaviors, or
extracting pattern information between the behaviors, or
extracting both the relationship information and the pattern information between the behaviors,
the obtained behavior habit features of the user are more representative and interpretable, and their expressive power is further improved.
In the embodiment of the present disclosure, the behavior sequence of each user is obtained through operation S110; then, through operation S120, relationship information and/or pattern information between behaviors is extracted from the behavior sequence of the user to represent the user's behavior habits as behavior habit features. Behavior habit features extracted in this way represent the behavior habits of users more accurately and improve the expressive power of the data, thereby improving the precision and accuracy of subsequent model processing.
Fig. 2 shows a process of acquiring a behavior habit of a user according to another embodiment of the present disclosure. In the embodiment of the present disclosure shown in fig. 2, the behavior data of the user includes session information, and each user behavior occurs in a certain session.
Because the relation between the behaviors in the same session is closer, when the behavior sequence of each user is obtained, the embodiment of the disclosure further refines the behavior sequence according to the session, that is, the behavior sequence of each user in each session is obtained according to the user behavior data.
Specifically, as shown in fig. 2, the user behavior data 201 of the embodiment of the present disclosure includes various behaviors of multiple users in multiple sessions. To acquire the behavior sequence of each user in each session, the user behavior data is first split into data 202 along three dimensions: the user (e.g., user 1, user 2, user 3, and the like), the session (e.g., session 1, session 2, session 3, and the like), and the behavior (e.g., behavior 1, behavior 2, behavior 3, and the like). The behaviors of each user in each session are then determined according to the user and the session and sorted by time, giving the behavior sequence of each user in each session, as shown in behavior sequence data 203.
In a behavior sequence obtained in this way, the behaviors are more closely related, so representative and valuable relationship information and pattern information are easier to find, and the behavior habit features extracted from the sequence are more accurate.
In addition, refining the behavior sequence by session shortens each sequence, which simplifies the calculation and saves computing resources.
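The per-session refinement can be sketched by grouping on the (user, session) pair instead of on the user alone. The record layout, the integer timestamps, and the function name `sequences_by_session` below are assumptions chosen to keep the example short.

```python
from collections import defaultdict

# Hypothetical records: (user, session, timestamp, behavior).
# Timestamps are plain integers here purely for brevity.
RECORDS = [
    ("user 1", "session 1", 3, "behavior 2"),
    ("user 1", "session 1", 1, "behavior 1"),
    ("user 1", "session 2", 5, "behavior 3"),
    ("user 2", "session 1", 2, "behavior 1"),
    ("user 1", "session 2", 4, "behavior 1"),
]


def sequences_by_session(records):
    """Return {(user, session): [behaviors ordered by time]}."""
    grouped = defaultdict(list)
    for user, session, ts, behavior in records:
        grouped[(user, session)].append((ts, behavior))
    return {
        key: [behavior for _, behavior in sorted(events, key=lambda e: e[0])]
        for key, events in grouped.items()
    }


session_sequences = sequences_by_session(RECORDS)
```

Each resulting sequence covers a single session, so it is shorter than the per-user sequence that would otherwise span all of that user's sessions.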
Fig. 3 shows a system architecture adopted by another embodiment of the present disclosure to extract user behavior habit features according to a behavior sequence.
In the embodiment of the present disclosure shown in fig. 3, association rule mining is performed on the behavior sequence 301 through an association rule mining model 302 to obtain an association rule 303 between at least one behavior, and then the behavior association rule 303 is determined as a behavior habit feature of each user.
The association rule mining is a rule-based machine learning algorithm which can find interesting data relationships in big data and aims to distinguish strong rules existing in a database by using some measurement indexes.
Specifically, in the embodiment of the present disclosure, the association rule mining model 302 is constructed using the Apriori algorithm, which includes: taking each behavior in the behavior sequence, and each combination of behaviors, as an item set, and finding the frequent item sets in the behavior sequences according to a set first minimum support (the minimum item support); and then determining strong association rules between the at least one behavior based on the frequent item sets and a set minimum confidence.
Assume that the behavior sequence of a certain user obtained from user data is as shown in table 1 below:
TABLE 1
Session ID | Sequence of behaviors |
10 | A,C,D |
20 | B,C,E |
30 | A,B,C,E |
40 | B,E |
The minimum support count is set to 2.
The first step, k = 1: calculate the support counts of the one-item sets; the results are shown in Table 2,
where k represents the number of items in an item set.
TABLE 2
Item(s) | Item support |
{A} | 2 |
{B} | 3 |
{C} | 3 |
{D} | 1 |
{E} | 3 |
Here, item support refers to the number of times an item set appears in the original dataset.
It can be seen that the support of the item set {D} is less than 2. By the Apriori principle, every item set containing {D} is also infrequent and can be disregarded, so {D} is pruned, giving the frequent one-item sets shown in Table 3.
TABLE 3
Item(s) | Item support |
{A} | 2 |
{B} | 3 |
{C} | 3 |
{E} | 3 |
The second step, k = 2: calculate the support counts of the two-item sets; the results are shown in Table 4.
TABLE 4
Item(s) | Item support |
{A,B} | 1 |
{A,C} | 2 |
{A,E} | 1 |
{B,C} | 2 |
{B,E} | 3 |
{C,E} | 2 |
After the item sets with a support count of less than 2 are pruned, the frequent two-item sets shown in Table 5 are obtained.
TABLE 5
Item(s) | Item support |
{A,C} | 2 |
{B,C} | 2 |
{B,E} | 3 |
{C,E} | 2 |
The third step, k = 3: calculate the support counts of the three-item sets to obtain the frequent three-item sets shown in Table 6.
Item sets of three or more items are generated by joining smaller frequent item sets according to the following rule:
Joining {A, C} with {B, C}: after the last item of each item set is removed, the remaining items (A and B) differ, so the two item sets cannot be joined.
Joining {B, C} with {B, E}: after the last item of each item set is removed, the remaining items (B and B) are the same, so the two item sets can be joined into {B, C, E}.
The remaining item sets are joined pairwise in the same way.
TABLE 6
Item(s) | Item support |
{B,C,E} | 2 |
The fourth step, k = 4: calculate the support counts of the four-item sets.
However, no four-item set can be generated, so the algorithm terminates and the frequent item sets from the third step are the final result.
The frequent one-item sets shown in Table 3, the frequent two-item sets shown in Table 5, and the frequent three-item sets shown in Table 6, all obtained in the above calculation, are the association rules of the behavior sequences.
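The walkthrough above can be reproduced with a compact, illustrative Apriori sketch over the Table 1 data. It is a minimal sketch under stated assumptions, not the disclosed model 302: each session's behaviors are treated as an unordered item set, and the names `SESSIONS`, `support_count`, and `apriori` are hypothetical.

```python
from itertools import combinations

# Table 1, with each session's behaviors treated as an unordered item set.
SESSIONS = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUPPORT_COUNT = 2


def support_count(itemset, sessions):
    """Number of sessions that contain every item of itemset."""
    return sum(1 for items in sessions.values() if set(itemset) <= items)


def apriori(sessions, min_count):
    """Return {k: {sorted item tuple: support count}} for each frequent k-item set."""
    all_items = sorted(set().union(*sessions.values()))
    current = {}
    for item in all_items:
        count = support_count((item,), sessions)
        if count >= min_count:
            current[(item,)] = count
    frequent = {}
    k = 1
    while current:
        frequent[k] = current
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            # Join step: two sorted k-item sets join when all but their last items match.
            if a[:-1] == b[:-1]:
                candidate = a + (b[-1],)
                # Prune step: every k-item subset of the candidate must be frequent.
                if all(sub in current for sub in combinations(candidate, k)):
                    candidates.add(candidate)
        current = {}
        for candidate in sorted(candidates):
            count = support_count(candidate, sessions)
            if count >= min_count:
                current[candidate] = count
        k += 1
    return frequent


frequent = apriori(SESSIONS, MIN_SUPPORT_COUNT)
```

With a minimum support count of 2, `frequent[1]`, `frequent[2]`, and `frequent[3]` correspond to the frequent one-, two-, and three-item sets of the walkthrough, and the loop stops once no four-item candidate can be generated.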
To further improve the efficiency of the algorithm, the frequent item sets can also be discovered with restricted candidate generation.
Restricting candidate generation means setting a condition and selecting, from all combinations of frequent items, only those that satisfy the condition as candidates for generating frequent item sets. For example, when purchasing habits are of interest, only behaviors whose operation is browsing, adding to the shopping cart, or purchasing are taken as candidates.
The fifth step: determine the strong association rules between the at least one behavior according to the set minimum confidence.
Confidence is the degree of credibility and reliability of a conclusion derived from a given condition. For example, the confidence of the association rule {B, E} is the degree to which the conclusion "behavior E also occurs" can be derived from the condition "behavior B occurs".
The confidence of an association rule may be calculated from a statistically derived probability, for example, by a probability function (the greater the probability, the greater the confidence).
If the minimum confidence is set to 60%, the association rules with a confidence greater than or equal to 60% are the strong association rules.
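The text does not fix a formula for confidence. A common choice, assumed here, is the conditional-frequency estimate: the number of sessions containing both the condition and the conclusion, divided by the number of sessions containing the condition. The sketch below applies that estimate to the Table 1 data; the candidate rule list and all names are illustrative.

```python
# Table 1 as unordered item sets (one set per session).
SESSIONS = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MIN_CONFIDENCE = 0.6


def count(itemset):
    """Number of sessions that contain every item of itemset."""
    return sum(1 for session in SESSIONS if itemset <= session)


def confidence(condition, conclusion):
    """Estimate P(conclusion | condition) as count(both) / count(condition)."""
    return count(condition | conclusion) / count(condition)


# Hypothetical candidate rules as (condition, conclusion) pairs. B -> A is
# included only for contrast; {A, B} is not a frequent item set.
CANDIDATE_RULES = [({"B"}, {"E"}), ({"C"}, {"A"}), ({"B"}, {"A"})]

strong_rules = [
    (condition, conclusion)
    for condition, conclusion in CANDIDATE_RULES
    if confidence(condition, conclusion) >= MIN_CONFIDENCE
]
```

Under this estimate, B -> E holds in every session that contains B, C -> A holds in two of the three sessions that contain C, and B -> A holds in only one of three, so only the first two pass the 60% threshold.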
It should be noted that the above examples are only illustrative and do not limit the implementation manner of the embodiments of the present disclosure.
The implementer may also use any other applicable association rule mining algorithm such as FP-Growth algorithm to perform association rule mining to obtain an association rule between at least one behavior.
According to the embodiment of the disclosure, mining association rules from the behavior sequences reveals the co-occurrence relationships among user behaviors, that is, which behaviors always appear together. Behaviors that always appear together are likely to stem from a certain behavior habit of the user. Therefore, association rules mined from the behavior sequences and used as the behavior habit features of the user can characterize behavior habits that manifest as multiple behaviors always occurring together.
Fig. 4 shows a system architecture adopted by another embodiment of the present disclosure to extract user behavior habit features according to a behavior sequence.
In the embodiment of the present disclosure shown in fig. 4, a sequence pattern mining model 402 is used to mine the behavior sequence 401 to obtain a sequence pattern 403 of the behavior sequence, and the sequence pattern 403 is then determined as a behavior habit feature of each user.
Sequence pattern mining mainly refers to mining patterns that occur frequently, either relative to time or relative to other patterns.
Specifically, in the disclosed embodiment, the sequence pattern mining model 402 is constructed using the Generalized Sequential Pattern (GSP) algorithm, which includes: determining, according to a set second minimum support (the minimum sequence support), the frequent subsequences of the behavior sequences whose support is greater than or equal to the second minimum support; and determining the frequent subsequences as the sequence patterns of the behavior sequences.
Here, a subsequence is a subsequence of a behavior sequence that preserves the original order and precedence of the behaviors. For example, if the behavior sequence is {A, B, C}, its subsequences are: {A}, {B}, {C}, {A, B}, {B, C}, {A, C}, and {A, B, C}.
Given a sequence S and a sequence dataset DT, the support of S is the percentage of tuples in DT that contain S, relative to all tuples in DT.
The process of finding frequent subsequences is similar to the process of finding frequent item sets described above and is not repeated here.
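The two definitions above, order-preserving subsequence and sequence support, can be sketched as follows. This is not a GSP implementation. It only illustrates the containment test and the support ratio, and it assumes that the behaviors listed in Table 1 are already in time order; the names `is_subsequence`, `sequence_support`, and `DATASET` are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """True if candidate's behaviors appear in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match, which enforces order.
    return all(behavior in remaining for behavior in candidate)


def sequence_support(candidate, dataset):
    """Fraction of sequences in dataset that contain candidate as a subsequence."""
    return sum(is_subsequence(candidate, seq) for seq in dataset) / len(dataset)


# Table 1 read as ordered sequences (assumption: listed order is time order).
DATASET = [list("ACD"), list("BCE"), list("ABCE"), list("BE")]

support_b_then_e = sequence_support(["B", "E"], DATASET)
support_e_then_b = sequence_support(["E", "B"], DATASET)
```

Unlike the item-set support used for association rules, this support depends on order: "B then E" is contained in three of the four sequences, while "E then B" is contained in none.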
GSP introduces time constraints, sliding time windows, and classification hierarchies. These additional scanning constraints effectively reduce the number of candidate sequences that need to be scanned, make the algorithm more practical, and reduce the generation of redundant, useless patterns. In addition, GSP stores candidate sequences in a hash tree, which reduces the number of sequences to be scanned, and it transforms the representation of the data sequences so that it can efficiently determine whether a candidate is a subsequence of a data sequence.
It should be noted that the above examples are only illustrative and do not limit the implementation manner of the embodiments of the present disclosure.
The implementer may also adopt any other applicable sequence pattern mining algorithm, such as the PrefixSpan algorithm, to perform sequence pattern mining and obtain the sequence patterns of the behavior sequences.
According to the embodiment of the disclosure, mining sequence patterns from the behavior sequences reveals the precedence relationships among user behaviors, that is, which behaviors always occur one after another. Behaviors that always occur one after another are likely to stem from a certain behavior habit of the user. Therefore, sequence patterns mined from the behavior sequences and determined as the behavior habit features of the user can characterize behavior habits that manifest as multiple behaviors always occurring in succession.
Fig. 5 shows a system architecture adopted by another embodiment of the present disclosure to extract the behavior habit features of the user according to the behavior sequence.
In the embodiment of the present disclosure shown in fig. 5, a behavior sequence 501 is processed by a behavior habit model 502 to obtain a behavior habit feature 503. The behavior habit model 502:
performs association rule mining on the behavior sequence 501 using an association rule mining algorithm 5021 (for example, the Apriori algorithm or the FP-Growth algorithm) to obtain association rules between at least one behavior;
performs sequence pattern mining on the behavior sequence 501 using a sequence pattern mining algorithm 5022 (for example, the GSP algorithm or the PrefixSpan algorithm) to obtain sequence patterns of the behavior sequence; and
fuses the association rules and the sequence patterns to obtain the behavior habit features of each user.
The fusion may be performed by concatenating the association rules and the sequence patterns, by taking their weighted sum, or by combining them with any other applicable function.
Sequence pattern mining finds the precedence relationships among data in a sequence dataset, whereas association rule mining finds the co-occurrence relationships among data in a dataset. Association rule mining is not concerned with the order of items within a transaction, while sequence pattern mining must take the order of items within a sequence into account. It follows that association rules and sequence patterns complement each other.
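One possible reading of fusion by concatenation is sketched below: each mined association rule and each mined sequence pattern becomes a 0/1 indicator, and the two indicator lists are concatenated into one feature vector. The encoding, the mined results, and all names are assumptions made for illustration; the text does not prescribe a particular representation.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's behaviors appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(behavior in remaining for behavior in pattern)


# Hypothetical mined results; in practice they would come from the two mining steps.
ASSOCIATION_RULES = [frozenset("BE"), frozenset("AC")]  # unordered co-occurrence
SEQUENCE_PATTERNS = [("B", "E"), ("C", "E")]            # ordered succession


def habit_features(sequence):
    """Concatenate association-rule indicators and sequence-pattern indicators."""
    items = set(sequence)
    rule_part = [int(rule <= items) for rule in ASSOCIATION_RULES]
    pattern_part = [int(is_subsequence(p, sequence)) for p in SEQUENCE_PATTERNS]
    return rule_part + pattern_part


features_in_order = habit_features(["B", "C", "E"])
features_reordered = habit_features(["E", "B", "C"])
```

The two sequences contain the same behaviors, so their association-rule indicators agree, but only the first matches the ordered patterns. This is the sense in which the two kinds of information complement each other.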
In the embodiment of the present disclosure, the behavior habit features extracted in the above manner include the association rule between each behavior in the behavior sequence and the sequence pattern of the behavior sequence, so that information related to the behavior habit of the user can be more comprehensively obtained, and thus the behavior habit features are richer and more comprehensive.
On the basis of obtaining the user habit characteristics, the behavior habit characteristics of the user can be further used in scenes of personalized service, evaluation or application related to the behavior habit of the user, and the use experience of the user is greatly improved.
For example, when application models such as a personalized recommendation model and an analysis model of the influence of behavior habits on a certain time are constructed, the behavior habit characteristics of a user are also used as one of the model characteristics; when the model is actually applied, the behavior habit characteristics of the user are extracted from the user behavior data by the characteristic extraction method; and then, inputting the behavior habit characteristics into the application model to obtain an output result. And analyzing the relationship between the behavior habit characteristics output results so as to obtain the influence degree of the behavior habit characteristics of the user on the model result.
Fig. 6 shows a causal relationship inference graph applying the user habit features extracted by the above method to the model features of the user retention causal determination model and the setting result according to another embodiment of the present disclosure. In this embodiment, the causal determination model attempts to determine the causal relationship between the above features and whether the user retains the setting result when a function of the product is changed, that is, which features will cause the user to continue to use the product (retention).
The left side of fig. 6 shows a causal relationship inference graph which adopts static attributes of "whether the user is a search user" and "whether the user is a novel user" as model features. The causal results obtained using this causal inference approach do not explain why "whether it is a novel user" or "whether it is a search user" affects user retention. For example, when the product module of the novel is changed in operation, the product module is also a "novel user", a user who only registers as a novel user but does not use the novel frequently is hardly affected, and a user who uses the novel frequently may be affected. Furthermore, for a user who frequently uses novels, the operation change is a bonus item for the user whose operation change is exactly in line with the user's use habit; and for the user whose operation change happens to be just contrary to the user's usage habit, the operation change is a division item.
For this reason, in the embodiment of the present disclosure, the above static features are replaced with user behavior habit features such as "whether to use search" and "whether to use novel" as model features, as in the causal relationship inference graph shown on the right side of Fig. 6. This better explains why, when the operation of a product module is changed, some users are retained (the operation change is consistent with their behavior habits) while others are lost (the operation change is contrary to their behavior habits).
In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs.
According to an embodiment of the present disclosure, the present disclosure further provides a feature extraction apparatus, as shown in fig. 7, where the apparatus 70 includes: a behavior sequence obtaining module 701, configured to obtain a behavior sequence of each user according to user behavior data, where the behavior sequence includes at least one behavior arranged according to a time sequence; a behavior habit feature determining module 702, configured to extract relationship information and/or pattern information between behaviors according to the behavior sequence, so as to obtain a behavior habit feature of each user.
According to an embodiment of the present disclosure, the user behavior data includes session information, and accordingly, the behavior sequence obtaining module 701 is specifically configured to obtain a behavior sequence of each user in each session according to the user behavior data.
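The per-session sequence acquisition described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the event tuple layout `(user_id, session_id, timestamp, behavior)`, the function name `build_behavior_sequences`, and the sample behavior names are all assumptions made for the example.

```python
from collections import defaultdict


def build_behavior_sequences(events):
    """Group raw events into one time-ordered behavior sequence per (user, session).

    `events` is an iterable of (user_id, session_id, timestamp, behavior) tuples;
    this layout is a hypothetical one chosen for illustration.
    """
    grouped = defaultdict(list)
    for user_id, session_id, timestamp, behavior in events:
        grouped[(user_id, session_id)].append((timestamp, behavior))
    # Sort each group by timestamp and keep only the behavior names.
    return {
        key: [behavior for _, behavior in sorted(items, key=lambda pair: pair[0])]
        for key, items in grouped.items()
    }


# Hypothetical log records, deliberately out of time order.
events = [
    ("u1", "s1", 3, "read_novel"),
    ("u1", "s1", 1, "open_app"),
    ("u1", "s1", 2, "search"),
    ("u1", "s2", 5, "open_app"),
    ("u2", "s1", 4, "search"),
]
sequences = build_behavior_sequences(events)
```

Keying the result by `(user_id, session_id)` keeps behaviors from different sessions of the same user apart, which matches the idea of obtaining one behavior sequence per user per session.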
According to an embodiment of the present disclosure, the behavior habit feature determining module 702 includes: an association rule mining submodule, configured to perform association rule mining on the behavior sequence to obtain an association rule among at least one behavior; and a behavior habit feature determining submodule, configured to determine the association rule as the behavior habit feature of each user.
According to an embodiment of the present disclosure, the association rule mining submodule includes: a frequent item set discovery unit, configured to take each behavior in the behavior sequence and each combination of behaviors as an item, and to discover frequent item sets from the behavior sequence according to a set first minimum support degree; and a strong association rule discovery unit, configured to determine a strong association rule among at least one behavior according to the frequent item sets and a set minimum confidence.
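The two units above (frequent item set discovery, then strong rule discovery) can be illustrated with a small level-wise search in the style of Apriori. This is a sketch under stated assumptions: the disclosure does not fix a particular algorithm here, support is taken to be the fraction of sequences containing all behaviors of an item set, and the function names and sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(sequences, min_support):
    """Return {itemset: support} for every item set with support >= min_support."""
    transactions = [frozenset(seq) for seq in sequences]
    n = len(transactions)
    items = sorted({b for t in transactions for b in t})
    result = {}
    size = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        size += 1
        # Any frequent item set of the next size can only use behaviors that
        # still appear in a frequent item set of the current size.
        alive = sorted({b for s in level for b in s})
        candidates = [frozenset(c) for c in combinations(alive, size)]
    return result


def strong_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) with confidence >= min_confidence."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical behavior sequences, one per user or session.
sequences = [
    ["open_app", "search", "read_novel"],
    ["open_app", "search", "read_novel"],
    ["open_app", "search"],
    ["open_app"],
]
itemsets = frequent_itemsets(sequences, min_support=0.5)
rules = strong_rules(itemsets, min_confidence=0.9)
```

In this toy data, "search implies open_app" is kept as a strong rule because every sequence containing `search` also contains `open_app`, while the reverse rule is dropped because its confidence is only 0.75.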
According to an embodiment of the present disclosure, the behavior habit feature determining module 702 includes: a sequence pattern mining submodule, configured to perform sequence pattern mining on the behavior sequence to obtain a sequence pattern of the behavior sequence; and a behavior habit feature determining submodule, configured to determine the sequence pattern as the behavior habit feature of each user.
According to an embodiment of the present disclosure, the sequence pattern mining submodule includes: a frequent subsequence finding unit, configured to determine, according to a set second minimum support degree, frequent subsequences whose support degree is greater than or equal to the second minimum support degree from the behavior sequence; and a sequence pattern determining unit, configured to determine the sequence pattern of the behavior sequence from the frequent subsequences.
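A minimal sketch of the frequent subsequence step follows. It assumes that support is the fraction of behavior sequences containing the pattern in order (gaps allowed) and grows patterns one behavior at a time. Keeping only maximal frequent subsequences as the final sequence patterns is one possible choice made for this example, not something the text specifies; all names and the `max_length` cap are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, with gaps allowed."""
    it = iter(sequence)
    # `b in it` advances the iterator past the first match, preserving order.
    return all(b in it for b in pattern)


def frequent_subsequences(sequences, min_support, max_length=4):
    """Return {pattern: support} for patterns with support >= min_support."""
    n = len(sequences)
    if n == 0:
        return {}

    def support(pattern):
        return sum(is_subsequence(pattern, s) for s in sequences) / n

    behaviors = sorted({b for s in sequences for b in s})
    level = {(b,): support((b,)) for b in behaviors}
    level = {p: s for p, s in level.items() if s >= min_support}
    frequent_behaviors = [p[0] for p in level]
    frequent = {}
    length = 1
    while level and length <= max_length:
        frequent.update(level)
        # Extend each frequent pattern by one frequent behavior.
        candidates = [p + (b,) for p in level for b in frequent_behaviors]
        level = {c: support(c) for c in candidates}
        level = {p: s for p, s in level.items() if s >= min_support}
        length += 1
    return frequent


def maximal_patterns(frequent):
    """Keep patterns that are not contained in another frequent pattern."""
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


# Hypothetical behavior sequences.
sequences = [
    ["open_app", "search", "read_novel"],
    ["open_app", "read_novel"],
    ["search", "open_app", "read_novel"],
]
patterns = frequent_subsequences(sequences, min_support=0.6)
sequence_patterns = maximal_patterns(patterns)
```

Unlike the item sets used for association rules, these patterns are order-sensitive: `("open_app", "search")` is not frequent here because in one sequence `search` comes before `open_app`.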
According to an embodiment of the present disclosure, the behavior habit feature determining module 702 includes: an association rule mining submodule, configured to perform association rule mining according to the behavior sequence to obtain an association rule among at least one behavior; a sequence pattern mining submodule, configured to perform sequence pattern mining according to the behavior sequence to obtain a sequence pattern of the behavior sequence; and a behavior habit feature fusion submodule, configured to fuse the association rule and the sequence pattern to obtain the behavior habit features of each user.
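The text does not say how the association rules and sequence patterns are fused. One simple possibility, shown here purely as an assumption, is to concatenate binary indicators: one per rule (does the user's sequence contain every behavior in the rule?) and one per sequence pattern (does the user's sequence contain the pattern in order?). The function names and sample data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, with gaps allowed."""
    it = iter(sequence)
    return all(b in it for b in pattern)


def fuse_features(user_sequences, rules, patterns):
    """Build one binary feature vector per user: rule indicators, then pattern indicators."""
    features = {}
    for user, seq in user_sequences.items():
        seen = set(seq)
        # A rule "fires" for a user when all of its behaviors were observed.
        rule_part = [int(set(a) | set(c) <= seen) for a, c in rules]
        # A pattern "fires" when the behaviors were observed in that order.
        pattern_part = [int(is_subsequence(p, seq)) for p in patterns]
        features[user] = rule_part + pattern_part
    return features


# Hypothetical mined rules (antecedent, consequent) and sequence patterns.
rules = [({"search"}, {"read_novel"})]
patterns = [("search", "read_novel"), ("open_app", "search")]
user_sequences = {
    "u1": ["open_app", "search", "read_novel"],
    "u2": ["read_novel", "search"],
}
features = fuse_features(user_sequences, rules, patterns)
```

The two users share the same rule indicator but differ on the order-sensitive pattern indicators, which is the kind of information a fused feature can carry that neither source provides alone.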
According to an embodiment of the present disclosure, the apparatus 70 further includes: a behavior habit feature application module, configured to input the behavior habit features into an application model to obtain an output result, where the model features of the application model include the behavior habit features.
According to an embodiment of the present disclosure, the behavior habit feature application module is specifically configured to input the behavior habit features into a causal relationship inference model, so as to obtain a causal relationship between the behavior habit features and a specified result.
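The disclosure does not specify the causal relationship inference model. As a much simpler stand-in, the sketch below compares retention rates between users who have a given behavior habit feature and users who do not. This is only a measure of association, since a real causal inference model would also have to account for confounding factors; the record layout, the function name, and the feature name `uses_novel` are assumptions made for illustration.

```python
def retention_rate_difference(records, feature):
    """Retention rate of users with `feature` minus that of users without it.

    Returns None when either group is empty. This is a naive association
    measure, not a causal estimate.
    """
    with_feature = [r["retained"] for r in records if r["features"].get(feature)]
    without_feature = [r["retained"] for r in records if not r["features"].get(feature)]
    if not with_feature or not without_feature:
        return None
    rate_with = sum(with_feature) / len(with_feature)
    rate_without = sum(without_feature) / len(without_feature)
    return rate_with - rate_without


# Hypothetical per-user records after a product module change.
records = [
    {"features": {"uses_novel": True}, "retained": True},
    {"features": {"uses_novel": True}, "retained": True},
    {"features": {"uses_novel": False}, "retained": True},
    {"features": {"uses_novel": False}, "retained": False},
]
difference = retention_rate_difference(records, "uses_novel")
```

A positive difference suggests that the habit feature is associated with higher retention for the specified result, which is the kind of relationship the application module is meant to surface.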
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (20)
1. A method of feature extraction, comprising:
acquiring a behavior sequence of each user according to the user behavior data, wherein the behavior sequence comprises at least one behavior arranged according to a time sequence;
and extracting relationship information and/or pattern information among the behaviors according to the behavior sequence to obtain behavior habit characteristics of each user.
2. The method of claim 1, wherein the user behavior data includes session information,
correspondingly, the obtaining the behavior sequence of each user according to the user behavior data includes:
and acquiring the behavior sequence of each user in each session according to the user behavior data.
3. The method of claim 1, wherein extracting relationship information between behaviors according to the behavior sequence to obtain behavior habit features of each user comprises:
performing association rule mining on the behavior sequence to obtain an association rule among the at least one behavior;
and determining the association rule as the behavior habit characteristics of each user.
4. The method of claim 1, wherein performing association rule mining on the behavior sequence to obtain an association rule between the at least one behavior comprises:
taking each behavior in the behavior sequence and the combination between the behaviors as items, and finding a frequent item set from the behavior sequence according to a set first minimum support degree;
and determining a strong association rule between the at least one behavior according to the frequent item set and the set minimum confidence.
5. The method of claim 1, wherein the extracting pattern information between behaviors according to the behavior sequence to obtain behavior habit features of each user comprises:
performing sequence pattern mining on the behavior sequence to obtain a sequence pattern of the behavior sequence;
and determining the sequence pattern as the behavior habit characteristic of each user.
6. The method of claim 5, wherein the mining of the sequence pattern of the behavior sequence to obtain the sequence pattern of the behavior sequence comprises:
according to a set second minimum support degree, determining a frequent subsequence with the support degree larger than or equal to the second minimum support degree from the behavior sequence;
and determining the sequence pattern of the behavior sequence from the frequent subsequence.
7. The method of claim 1, wherein extracting relationship information and pattern information between behaviors to obtain behavior habit features of each user comprises:
mining association rules according to the behavior sequence to obtain association rules among the at least one behavior;
performing sequence pattern mining according to the behavior sequence to obtain a sequence pattern of the behavior sequence;
and fusing the association rule and the sequence pattern to obtain the behavior habit characteristics of each user.
8. The method of claim 1, further comprising:
and inputting the behavior habit characteristics into an application model to obtain an output result, wherein the model characteristics of the application model comprise the behavior habit characteristics.
9. The method of claim 1, wherein the inputting the behavioral habit features into an application model, resulting in an output result, comprises:
and inputting the behavior habit characteristics into a causal relationship inference model to obtain a causal relationship between the behavior habit characteristics and a specified result.
10. An apparatus for feature extraction, comprising:
the behavior sequence acquisition module is used for acquiring a behavior sequence of each user according to the user behavior data, wherein the behavior sequence comprises at least one behavior arranged according to a time sequence;
and the behavior habit characteristic determining module is used for extracting the relationship information and/or the pattern information among the behaviors according to the behavior sequence to obtain the behavior habit characteristics of each user.
11. The apparatus according to claim 10, wherein the user behavior data includes session information, and accordingly, the behavior sequence acquiring module is specifically configured to acquire the behavior sequence of each user in each session according to the user behavior data.
12. The apparatus of claim 10, wherein the behavior habit feature determination module comprises:
the association rule mining sub-module is used for mining the association rule of the behavior sequence to obtain an association rule among the at least one behavior;
and the behavior habit characteristic determining submodule is used for determining the association rule as the behavior habit characteristic of each user.
13. The apparatus of claim 12, wherein the association rule mining sub-module comprises:
a frequent item set discovery unit, configured to discover a frequent item set from the behavior sequence according to a set first minimum support degree by using each behavior in the behavior sequence and a combination between behaviors as an item;
and the strong association rule discovery unit is used for determining a strong association rule between the at least one behavior according to the frequent item set and the set minimum confidence coefficient.
14. The apparatus of claim 10, wherein the behavior habit feature determination module comprises:
the sequence pattern mining submodule is used for mining the sequence pattern of the behavior sequence to obtain the sequence pattern of the behavior sequence;
and the behavior habit characteristic determining submodule is used for determining the sequence pattern as the behavior habit characteristic of each user.
15. The apparatus of claim 14, wherein the sequence pattern mining submodule comprises:
a frequent subsequence finding unit, configured to determine, according to a set second minimum support degree, a frequent subsequence with a support degree greater than or equal to the second minimum support degree from the behavior sequence;
and the sequence pattern determining unit is used for determining the sequence pattern of the behavior sequence from the frequent subsequence.
16. The apparatus of claim 10, wherein the behavior habit feature determination module comprises:
the association rule mining submodule is used for mining association rules according to the behavior sequence to obtain the association rules among the at least one behavior;
the sequence pattern mining submodule is used for mining a sequence pattern according to the behavior sequence to obtain the sequence pattern of the behavior sequence;
and the behavior habit feature fusion submodule is used for fusing the association rule and the sequence pattern to obtain the behavior habit features of each user.
17. The apparatus of claim 10, further comprising:
and the behavior habit feature application module is used for inputting the behavior habit features into an application model to obtain an output result, wherein the model features of the application model comprise the behavior habit features.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210996416.8A CN115391421A (en) | 2022-08-18 | 2022-08-18 | Feature extraction method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115391421A (en) | 2022-11-25 |
Family
ID=84121325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210996416.8A Pending CN115391421A (en) | 2022-08-18 | 2022-08-18 | Feature extraction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115391421A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113590645B (en) | Searching method, searching device, electronic equipment and storage medium | |
CN113190702A (en) | Method and apparatus for generating information | |
CN114461644A (en) | Data acquisition method and device, electronic equipment and storage medium | |
CN113570437A (en) | Product recommendation method and device | |
CN114579104A (en) | Data analysis scene generation method, device, equipment and storage medium | |
CN112818230A (en) | Content recommendation method and device, electronic equipment and storage medium | |
CN112925978A (en) | Recommendation system evaluation method and device, electronic equipment and storage medium | |
CN113806660A (en) | Data evaluation method, training method, device, electronic device and storage medium | |
CN112818013A (en) | Time sequence database query optimization method, device, equipment and storage medium | |
CN116467461A (en) | Data processing method, device, equipment and medium applied to power distribution network | |
CN112989190A (en) | Commodity mounting method and device, electronic equipment and storage medium | |
CN114741433B (en) | Community mining method, device, equipment and storage medium | |
CN113656689B (en) | Model generation method and network information pushing method | |
CN114491232B (en) | Information query method and device, electronic equipment and storage medium | |
CN115563310A (en) | Method, device, equipment and medium for determining key service node | |
CN112887426B (en) | Information stream pushing method and device, electronic equipment and storage medium | |
CN115391421A (en) | Feature extraction method, device, equipment and storage medium | |
CN113722593A (en) | Event data processing method and device, electronic equipment and medium | |
CN114048376A (en) | Advertisement service information mining method and device, electronic equipment and storage medium | |
CN113961797A (en) | Resource recommendation method and device, electronic equipment and readable storage medium | |
CN114254650A (en) | Information processing method, device, equipment and medium | |
CN112860626A (en) | Document sorting method and device and electronic equipment | |
CN113222632A (en) | Object mining method and device | |
CN113656393B (en) | Data processing method, device, electronic equipment and storage medium | |
CN114036263A (en) | Website identification method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||