CN117609336A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN117609336A
Authority
CN
China
Prior art keywords
sequence
frequent
data
sequences
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311658325.4A
Other languages
Chinese (zh)
Inventor
路建业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311658325.4A
Publication of CN117609336A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468Fuzzy queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/26Discovering frequent patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Automation & Control Theory (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method, a device, electronic equipment and a storage medium. The method comprises the following steps: determining time data sequences of different users according to a historical user data set; determining frequent sequences in the time data sequences corresponding to the different users according to a preset support threshold; and performing fuzzy merging on the determined frequent sequences based on preset conditions. According to the embodiments of the invention, the historical user data set is processed in time order, which improves data feature density, makes the data easier to use, facilitates data mining and data recommendation based on the processed data, and improves the user experience.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
With the development of internet technology, the functions of mobile banking apps in the banking field have been continuously enriched, and as the number of mobile banking users grows, the volume of user behavior data has also increased greatly. In the current banking field, massive user data is first screened and analyzed, and customized services, such as refined services or accurate information recommendation, are then provided to users.
At present, common mobile banking user behavior data often comprises event-tracking data, such as the time at which a user clicks a page or an advertisement and the dwell time. Under a big data access pattern, the data is stored in time or spatial order in a distributed file system such as Hadoop. Existing learning-based methods, such as machine learning and deep learning, suffer from long training and prediction times and cannot make recommendations to users in time. Association rule mining algorithms generate too many candidate item sets, so mining and matching take a long time and the timeliness requirements of a production environment cannot be met well. An efficient data processing method is therefore needed to improve data feature density and to facilitate the use of data in mining and recommendation scenarios.
Disclosure of Invention
The invention provides a data processing method, a device, electronic equipment and a storage medium, which aim to improve the data characteristic density by processing a historical user data set according to a time sequence, enhance the convenience of data utilization, facilitate data mining and data recommendation based on the processed data and improve the use experience of users.
According to an aspect of the present invention, there is provided a data processing method, wherein the method includes:
determining a time data sequence of different users according to the historical user data set;
determining frequent sequences in time data sequences corresponding to different users according to a preset support threshold;
and carrying out fuzzy merging on the determined frequent sequences based on preset conditions.
According to another aspect of the present invention, there is provided a data processing apparatus, wherein the apparatus comprises:
the sequence determining module is used for determining time data sequences of different users according to the historical user data set;
the frequent sequence module is used for determining frequent sequences in the time data sequences corresponding to the different users according to a preset support threshold;
and the fuzzy merging module is used for carrying out fuzzy merging on the determined frequent sequences based on preset conditions.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data processing method according to any one of the embodiments of the present invention.
According to the technical scheme provided by the embodiment of the invention, the historical data is divided into the time data sequences according to different users, the frequent sequences are determined in the time data sequences of the different users based on the preset support threshold, and the frequent sequences are combined in a fuzzy manner according to the preset conditions, so that the problems of low data feature density and low data utilization rate are solved, the data mining and data recommendation based on the processed data are facilitated, and the use experience of the users can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of another data processing method according to a second embodiment of the present invention;
FIG. 3 is an exemplary diagram of a data processing method provided according to a third embodiment of the present invention;
FIG. 4 is an exemplary diagram of a frequent sequence generation process provided in accordance with embodiment III of the present invention;
FIG. 5 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device implementing a data processing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment 1
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention, where the method may be performed by a data processing device, which may be implemented in hardware and/or software, and the data processing device may be configured in a server or a server cluster. As shown in fig. 1, the method includes:
Step 110, determining time data sequences of different users according to the historical user data set.
The historical user data set may be data generated while users conduct business. Taking a mobile banking scenario as an example, the historical user data set may include data generated by different users while using the mobile banking app, and may include both financial and non-financial user data. A time data sequence may be the data of one user arranged in time order, and each item in the time data sequence may be a historical transaction of that user.
In the embodiment of the invention, the historical user data set can be read and divided into subsets by user, and, for each user's subset, the transactions in it can be arranged by time into a time data sequence.
Step 120, determining frequent sequences in the time data sequences corresponding to the different users according to a preset support threshold.
The preset support threshold may be a critical value for selecting the data that constitutes a frequent sequence, and may be used to determine frequent items in a time data sequence, where a frequent item may be a transaction whose number of occurrences in the time data sequence is greater than or equal to the preset support threshold.
In the embodiment of the invention, the number of occurrences of each transaction can be counted in the time data sequence of each user, the transactions whose number of occurrences is greater than or equal to the preset support threshold can be extracted, and each such transaction together with its number of occurrences can be processed into a frequent sequence. For example, the processing may include ordering the transactions by number of occurrences to form the frequent sequence.
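A minimal Python sketch of this counting step follows. It assumes a time data sequence is a plain list of transaction names and that the threshold is an absolute count; the function name `frequent_items`, the user keys and the page names are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def frequent_items(sequence, min_count):
    """Return (transaction, count) pairs whose count reaches min_count, most frequent first."""
    counts = Counter(sequence)
    kept = [(item, n) for item, n in counts.items() if n >= min_count]
    # Sort by descending count; ties are broken by name so the result is deterministic.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical per-user time data sequences.
sequences = {
    "u1": ["fund", "stock", "fund", "transfer", "fund", "stock"],
    "u2": ["transfer", "transfer", "fund"],
}
frequent = {user: frequent_items(seq, min_count=2) for user, seq in sequences.items()}
```

Ordering by count is only one of the possible ways of processing mentioned in the text; other orderings could be substituted in the `sorted` call.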
Step 130, performing fuzzy merging on the determined frequent sequences based on preset conditions.
The preset conditions may be preset rules for merging frequent sequences, and may include, but are not limited to, merging a subsequence of a frequent sequence into that frequent sequence as its supersequence, merging a frequent sequence into a parent frequent sequence that contains it, and the like.
In the embodiment of the invention, one or more preset conditions can be read, and the determined frequent sequences can be fuzzily merged according to the preset conditions to reduce the number of frequent sequences. The fuzzy merging may include merging a subsequence of a frequent sequence into that frequent sequence as its supersequence, merging a frequent sequence into a parent frequent sequence that contains it, and the like.
According to the embodiment of the invention, the historical data is divided into the time data sequences according to different users, the frequent sequences are determined in the time data sequences of the different users based on the preset support threshold, and the frequent sequences are combined in a fuzzy way according to the preset conditions, so that the problems of low data feature density and low data utilization rate are solved, the data mining and data recommendation based on the processed data are facilitated, and the use experience of the users can be improved.
Embodiment 2
Fig. 2 is a flowchart of another data processing method according to a second embodiment of the present invention, where the embodiment of the present invention is embodied on the basis of the above embodiment of the present invention, and referring to fig. 2, the method provided by the embodiment of the present invention specifically includes the following steps:
Step 210, searching the historical user data set for the corresponding transaction data set according to the user identification of each user.
The user identification may be information used to distinguish different users and may be composed of one or more of numbers, letters and special symbols. A transaction data set may be a data set composed of one or more transactions, and each transaction data set may correspond to one user.
In the embodiment of the invention, the historical user data set can be extracted, the transactions corresponding to each user identification can be looked up in the historical user data set, and the transactions can be stored in the transaction data set of the corresponding user.
Step 220, arranging the transaction data sets corresponding to different users into time data sequences according to time.
The time may be time information associated with each transaction in the transaction data set, such as the execution time or the recording time of the transaction.
In the embodiment of the invention, the transactions in the transaction data set of each user can be arranged by time, and the arranged transaction data set is used as the time data sequence. It can be understood that, within each time data sequence, all transactions are arranged in order of increasing time.
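Steps 210 and 220 can be sketched in Python as follows. The record layout `(user_id, timestamp, transaction)`, the ISO-formatted timestamp strings and the name `build_time_sequences` are assumptions made for illustration only.

```python
from collections import defaultdict

def build_time_sequences(records):
    """Group (user_id, timestamp, transaction) records by user and sort each group by time."""
    by_user = defaultdict(list)
    for user_id, timestamp, transaction in records:
        by_user[user_id].append((timestamp, transaction))
    # sorted() is stable, so records with equal timestamps keep their input order.
    return {
        user_id: [txn for _, txn in sorted(events, key=lambda e: e[0])]
        for user_id, events in by_user.items()
    }

# Hypothetical records; ISO 8601 strings of equal length sort chronologically.
records = [
    ("u1", "2023-01-01T10:05", "stock"),
    ("u2", "2023-01-01T09:00", "transfer"),
    ("u1", "2023-01-01T10:00", "fund"),
    ("u1", "2023-01-02T08:30", "transfer"),
]
time_sequences = build_time_sequences(records)
```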
Step 230, determining a user service level coefficient of the user and a transaction level coefficient of the transaction in the time data sequence according to a preset configuration rule.
The user service level coefficient may be a coefficient, associated with each user, that affects transaction frequency, and the transaction level coefficient may be a coefficient, associated with each transaction, that affects transaction frequency. For example, the higher a user's service level coefficient, the more each occurrence of a transaction by that user contributes to the support of that transaction; likewise, the higher a transaction's level coefficient, the more each occurrence of that transaction contributes to its support. The support may be understood as the number of occurrences and serves as the basis for judging whether a transaction is frequent.
In the embodiment of the invention, the preset configuration rules are read, and the user service level coefficients and transaction level coefficients set for different users and different transactions are extracted from them. The preset configuration rules may exist in the form of a configuration file.
Step 240, traversing the transactions in each time data sequence, and, each time a transaction occurs, updating its number of occurrences according to the user service level coefficient of the user to which the transaction belongs and the transaction level coefficient of the transaction.
In the embodiment of the invention, the transactions in each time data sequence can be traversed and the number of occurrences of each transaction counted. Each time a transaction occurs, the user service level coefficient of the user to which the transaction belongs and the transaction level coefficient of the transaction can be determined, and the number of occurrences can be adjusted by these two coefficients. For example, the product of 1, the user service level coefficient and the transaction level coefficient can be added to the number of occurrences to give the new number of occurrences; as another example, the sum of the user service level coefficient and the transaction level coefficient can be added to the number of occurrences to give the new number of occurrences.
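The two update rules given as examples can be sketched in Python as below. The coefficient values, the default coefficient of 1.0 for unconfigured users or transactions, and the names `weighted_counts` and `mode` are assumptions, not part of the disclosure.

```python
from collections import defaultdict

def weighted_counts(sequences, user_coef, txn_coef, mode="product"):
    """Accumulate weighted occurrence counts over all users' time data sequences."""
    counts = defaultdict(float)
    for user_id, sequence in sequences.items():
        u = user_coef.get(user_id, 1.0)   # assumed default when no rule is configured
        for txn in sequence:
            t = txn_coef.get(txn, 1.0)    # assumed default when no rule is configured
            if mode == "product":
                counts[txn] += 1 * u * t  # add the product of 1 and both coefficients
            else:
                counts[txn] += u + t      # add the sum of both coefficients
    return dict(counts)

# Hypothetical coefficients, as they might be read from a configuration file.
sequences = {"vip": ["fund", "stock"], "basic": ["fund"]}
user_coef = {"vip": 2.0, "basic": 1.0}
txn_coef = {"fund": 1.5, "stock": 0.5}

by_product = weighted_counts(sequences, user_coef, txn_coef, mode="product")
by_sum = weighted_counts(sequences, user_coef, txn_coef, mode="sum")
```

With the product rule, a coefficient of 1.0 for both user and transaction reduces to ordinary counting, which makes it easy to compare weighted and unweighted results.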
Step 250, determining the support corresponding to the number of occurrences of each transaction.
Specifically, the ratio of the counted number of occurrences of each transaction to the total number of occurrences of all transactions may be used as the support of the transaction. It can be understood that the support is used to determine whether the corresponding transaction is a frequent item.
Step 260, processing the time data sequence into a frequent sequence according to the support and the preset support threshold.
In the embodiment of the invention, for each time data sequence, frequent items can be selected using the support and the preset support threshold, and the frequent items can be used to form a frequent sequence.
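Steps 250 and 260 can be illustrated with the following Python sketch. It assumes the (possibly weighted) occurrence counts are already available as a dictionary, and it interprets "forming a frequent sequence" as keeping the frequent items of a time data sequence in their original time order, which is one possible reading; the names `supports` and `frequent_sequence` are hypothetical.

```python
def supports(counts):
    """Support of each transaction = its count divided by the total count of all transactions."""
    total = sum(counts.values())
    return {txn: n / total for txn, n in counts.items()} if total else {}

def frequent_sequence(sequence, support, min_support):
    """Keep, in time order, the transactions whose support reaches min_support."""
    return [txn for txn in sequence if support.get(txn, 0.0) >= min_support]

# Hypothetical weighted counts and one user's time data sequence.
counts = {"fund": 4.5, "stock": 1.0, "transfer": 0.5}
support = supports(counts)
result = frequent_sequence(["fund", "transfer", "stock", "fund"], support, min_support=0.15)
```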
Step 270, determining, among the frequent sequences, the subsequences of each frequent sequence, and merging the subsequences into the corresponding frequent sequences.
A subsequence may be a frequent sequence consisting of some consecutive transactions of another frequent sequence.
In the embodiment of the invention, the determined frequent sequences can be searched for a pair in which one sequence is a subsequence of the other. If such a pair exists, the sequence that is the subsequence can be merged into the other frequent sequence, that is, it is removed from the set of frequent sequences.
Step 280, searching, among the frequent sequences, for a parent frequent sequence that contains a given frequent sequence, and merging that frequent sequence into the parent frequent sequence.
In the embodiment of the invention, the determined frequent sequences can be searched for a pair in which one frequent sequence is the parent frequent sequence of the other. If such a pair exists, the contained frequent sequence can be merged into its parent frequent sequence.
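A minimal Python sketch of the merging in steps 270 and 280, assuming frequent sequences are lists of transaction names and that "merging" a sequence into its parent means dropping it from the result while keeping the parent. The helper names are hypothetical.

```python
def is_contiguous_subsequence(short, long):
    """True if `short` occurs as a run of consecutive transactions inside `long`."""
    n, m = len(short), len(long)
    return n <= m and any(long[i:i + n] == short for i in range(m - n + 1))

def merge_into_parents(frequent_sequences):
    """Drop every sequence that is contained, as a consecutive run, in another sequence."""
    unique = []
    for seq in frequent_sequences:
        if seq not in unique:   # remove exact duplicates first
            unique.append(seq)
    return [
        s for s in unique
        if not any(s != p and is_contiguous_subsequence(s, p) for p in unique)
    ]

merged = merge_into_parents([["a", "b", "c"], ["b", "c"], ["a", "c"], ["d"]])
```

Here `["a", "c"]` is kept because its transactions are not consecutive in `["a", "b", "c"]`, in line with the definition of a subsequence given for step 270.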
According to the embodiment of the invention, the historical user data set is divided by user identification into transaction data sets corresponding to different users, and the transactions in each user's transaction data set are sorted by time to form a time data sequence. The user service level coefficients and transaction level coefficients are determined from the preset configuration rules. The transactions in the time data sequences are traversed, and each time a transaction occurs its number of occurrences is updated according to the corresponding user service level coefficient and transaction level coefficient. The support is determined from the counted numbers of occurrences, and the time data sequences are processed into frequent sequences based on the support and the preset support threshold. Among the frequent sequences, the subsequences of each frequent sequence are merged into that frequent sequence, and each frequent sequence that has a parent frequent sequence is merged into that parent. This solves the problems of low data feature density and low data utilization, facilitates data mining and data recommendation based on the processed data, and improves the user experience.
Further, on the basis of the above embodiment of the present invention, processing the time data sequence into the frequent sequence according to the support degree and the preset support degree threshold includes:
searching the time data sequences for a first time data sequence and a second time data sequence such that the head sequence of the first is identical to the tail sequence of the second, and merging the first time data sequence and the second time data sequence into a time data sequence to be pruned;
and removing, from the time data sequence to be pruned, the sub data sequences consisting of transactions whose support is smaller than the preset support threshold, and taking the time data sequence to be pruned after removal as a frequent sequence.
In the embodiment of the invention, the time data sequences can be searched for two sequences such that the head sequence of the first is identical to the tail sequence of the second. If such a pair exists, the head sequence of the first can be removed and the remainder joined to the second, or the tail sequence of the second can be removed before it is joined to the first; the result of joining the two is used as the time data sequence to be pruned. The support of each transaction in the time data sequence to be pruned can be compared with the preset support threshold, a data sequence formed by transactions whose support is smaller than the preset support threshold can be recorded as a sub data sequence, the sub data sequences can be removed from the corresponding time data sequence to be pruned, and the processed time data sequence can be used as a frequent sequence.
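The head/tail merge followed by pruning can be sketched in Python as below. Choosing the longest matching overlap is an assumption, since the text does not say which overlap to use when several exist; the function names and the support values are hypothetical.

```python
def overlap_length(first, second):
    """Length of the longest head of `first` that equals a tail of `second` (0 if none)."""
    for k in range(min(len(first), len(second)), 0, -1):
        if first[:k] == second[-k:]:
            return k
    return 0

def merge_on_overlap(first, second):
    """Join `second` and `first` on their shared run, keeping the shared run once."""
    k = overlap_length(first, second)
    return second + first[k:] if k else None

def prune(sequence, support, min_support):
    """Remove transactions whose support is below the threshold."""
    return [txn for txn in sequence if support.get(txn, 0.0) >= min_support]

first = ["stock", "transfer", "loan"]
second = ["fund", "stock", "transfer"]
to_prune = merge_on_overlap(first, second)

support = {"fund": 0.4, "stock": 0.3, "transfer": 0.25, "loan": 0.05}
frequent = prune(to_prune, support, min_support=0.1)
```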
In some embodiments of the invention, the method further comprises: compiling statistics on data indicators of the frequent sequences according to a preset time window.
In the embodiment of the invention, statistics on the frequent sequences can be compiled over a preset time window to generate the corresponding data indicators. The preset time window may be a preset time length or a preset data length used for the statistics.
Further, on the basis of the above embodiment of the present invention, the dimensions included in the preset time window include at least a time dimension and a data sequence length dimension.
In the embodiment of the invention, the time dimension may refer to the time of the transaction, and the data sequence length dimension may refer to the number of the transactions.
Embodiment 3
Fig. 3 is an exemplary diagram of a data processing method according to a third embodiment of the present invention. Referring to fig. 3, and taking the processing of mobile banking user history data as an example: first, the transaction data set of the users' historical behavior is converted into a sequence data set according to client ID. The sequences are then scanned; during scanning, different weights are assigned to different users and scenario transactions when the support is calculated, and transaction sequences whose support is greater than or equal to a given minimum support are determined to be frequent sequences. After one scan finishes, each subsequent traversal builds on the previous one. Fuzzy merging is then performed on the scanned sequences. Finally, a matching window is set, and the optimal frequent sequence is obtained according to the matching pattern rules.
(1) Converting transaction data sets into sequence data sets according to client IDs
Sequential patterns take into account the order in which transactions occur, which matters in specific business scenarios such as finance; for example, after browsing the fund module of the mobile banking app, a user may go on to browse the stock module. Mining such frequently occurring ordered events or sequences is called sequential pattern mining. Compared with the transaction data set required by association rules, the data set here adds the time at which each transaction occurred; in addition, each sequence of transactions is attributed to a mobile banking client ID. Example data are shown in the following table:
This example data set is an ordinary transaction data set, which is converted into a sequence data set for ease of processing. First, records with the same client ID are merged and sorted by the time at which the transactions occurred to obtain the sequence data set. The conversion result is shown in the following table:
(2) Optimization of support algorithms
Support is currently usually calculated by counting the number of times a sequence occurs and then computing its proportion. In the mobile banking non-financial transaction scenario under a big data access pattern, however, counting by number of occurrences alone easily introduces redundant, irrelevant operation data, and accumulating the operation records of different types of users purely by count also reduces the accuracy of subsequent recommendations.
Building on the original support calculation, the embodiment of the invention classifies different non-financial transaction behaviors and the levels of different users, and assigns different weights to different types of users and transaction behaviors according to business rules. These weights affect the support of a sequence during the cyclic traversal, which filters out irrelevant behavior data and brings the mining algorithm closer to the mobile banking business, improving its accuracy.
(3) Scanning sequences, acquiring frequent transactions and frequent sequences
The method updates the matching pattern set in two cases: in the first, the set is updated at regular intervals of a given duration; in the second, it is updated at intervals of a given sequence length.
When an update is needed, the algorithm selects historical data of a specified length and uses a frequent sequence mining algorithm to mine new patterns. The invention uses the SPADE frequent sequence mining algorithm. The basic idea of the algorithm is to scan the historical sequences multiple times. The first scan determines the support (number of occurrences) of each item; at the end of the scan, the items whose support is greater than or equal to the minimum support are judged to be frequent transactions, and each of them yields a frequent 1-sequence consisting of that item. Each subsequent traversal generates a new set of candidate sequences from the set of frequent sequences produced by the previous traversal (the seed set), with the sequence length increased by one. The support of the candidate sequences is calculated while traversing the transactions, the frequent sequences are determined from the support, and they serve as the seed set for the next traversal. The algorithm terminates when no candidate sequence is generated. A sequence containing k items is called a k-sequence, and the goal is to generate frequent k-sequences from frequent (k-1)-sequences. Candidate generation comprises the following two stages:
a. Join stage. If the sequence obtained by deleting the first item of sequence s1 is the same as the sequence obtained by deleting the last item of sequence s2, then s1 and s2 can be joined, that is, s1 is extended with the last item of s2. There are two cases: when the last item (letter) of s2 forms an element (bracket) of its own, it is appended to s1 as a new last element; otherwise it is added as the last item of the last element of s1.
b. Pruning stage. If some subsequence of a candidate sequence is not a frequent sequence, the candidate sequence should be deleted from the candidate set.
Clearly, every subsequence of a frequent sequence must also be frequent, and every supersequence of a non-frequent sequence must also be non-frequent.
The process of generating candidate sequences of length 4 from frequent sequences of length 3 through joining and pruning, as shown in fig. 4, is described below with an example. As shown in the table, the frequent sequences { (a, b) (c) } and { (b) (c, d) } generate the candidate sequence { (a, b) (c, d) }, and the frequent sequences { (a, b) (c) } and { (b) (c) (e) } generate the candidate sequence { (a, b) (c) (e) }. The sequence { (a, b) (d) } cannot be joined with any sequence, because there is no sequence of the form { (b) (d, e) }. The candidate { (a, b) (c) (e) } is pruned because its length-3 subsequence { (a) (c) (e) } is not among the frequent 3-sequences.
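The join and pruning stages can be sketched in Python as follows. A sequence is represented as a tuple of elements and each element as a tuple of items. Only four of the frequent 3-sequences are named in the text; the other two in `frequent_3`, { (a) (c, d) } and { (a, c) (e) }, are assumptions added so that the example is complete, and the function names are hypothetical. The sketch does not handle the special case of joining 1-sequences.

```python
def drop_item(seq, ei, ii):
    """Return seq without item ii of element ei; an element left empty is dropped."""
    element = seq[ei][:ii] + seq[ei][ii + 1:]
    return seq[:ei] + ((element,) if element else ()) + seq[ei + 1:]

def join(s1, s2):
    """Join s1 with s2 when s1 minus its first item equals s2 minus its last item."""
    if drop_item(s1, 0, 0) != drop_item(s2, len(s2) - 1, len(s2[-1]) - 1):
        return None
    last = s2[-1][-1]
    if len(s2[-1]) == 1:                      # last item is an element of its own
        return s1 + ((last,),)
    return s1[:-1] + (s1[-1] + (last,),)      # last item joins s1's last element

def generate_candidates(frequent):
    """Return (joined, pruned): all joined candidates, and those that survive pruning."""
    joined = set()
    for s1 in frequent:
        for s2 in frequent:
            candidate = join(s1, s2)
            if candidate is not None:
                joined.add(candidate)
    # A candidate survives only if deleting any single item leaves a frequent sequence.
    pruned = {
        c for c in joined
        if all(drop_item(c, ei, ii) in frequent
               for ei, element in enumerate(c)
               for ii in range(len(element)))
    }
    return joined, pruned

frequent_3 = {
    (("a", "b"), ("c",)),
    (("a", "b"), ("d",)),
    (("a",), ("c", "d")),        # assumed, not named in the text
    (("a", "c"), ("e",)),        # assumed, not named in the text
    (("b",), ("c", "d")),
    (("b",), ("c",), ("e",)),
}
joined, pruned = generate_candidates(frequent_3)
```

Under these assumptions the join stage yields { (a, b) (c, d) } and { (a, b) (c) (e) }, and only { (a, b) (c, d) } survives pruning, which agrees with the worked example.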
(4) Fuzzy merging
The number of frequent access sequences produced by the mining algorithm is still large (the problem of an excessive candidate set), and matching against them takes a long time, so further processing is needed. The algorithm fuzzily merges the mined patterns to reduce the number of redundant patterns and form the final pattern set. Two kinds of frequent sequences need to be merged: a. Every subsequence of a frequent sequence is itself frequent, so the algorithm outputs the subsequences together with the sequence; they should be merged into their supersequence. b. A pattern is frequent in its own right, but after it is accessed the complete sequence may also be accessed at some later time. In that case the whole sequence can be prefetched directly, and because there is no ordering relationship between the sequences, the pattern can be merged into a larger sequence that contains all of its elements. The first case is a special case of the second, so the merging operation can be summarized as: if the elements of a shorter sequence are a subset of the elements of a longer sequence, the shorter sequence is merged into the longer sequence.
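The merging rule summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `items_of` and `fuzzy_merge` are hypothetical, "elements" is interpreted as the set of items appearing anywhere in a sequence (order ignored), and "merging" is modelled simply as discarding the shorter sequence in favour of the longer one.

```python
def items_of(seq):
    """Set of all items that appear in a sequence, ignoring order and grouping."""
    return frozenset(item for element in seq for item in element)


def fuzzy_merge(frequent):
    """Keep only sequences whose item set is not contained in a kept, longer sequence."""
    kept = []
    # Visit longer sequences first so shorter ones can be absorbed into them.
    for seq in sorted(frequent, key=lambda s: len(items_of(s)), reverse=True):
        if not any(items_of(seq) <= items_of(longer) for longer in kept):
            kept.append(seq)
    return kept


patterns = [
    (("a",), ("b",)),
    (("a", "b"), ("c",)),
    (("d",),),
    (("c",), ("a",)),
]
merged = fuzzy_merge(patterns)
```

Here `merged` keeps { (a, b) (c) } and { (d) }; the two shorter patterns over a, b and c are absorbed because their items are a subset of the longer sequence's items.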
(5) Sliding window arrangement
To handle frequent sequences in practical scenarios and to obtain user behavior data, such as daily and monthly frequency, within a given period of time or a given sequence length, the method introduces sliding windows and provides two window modes for intercepting sequences: one based on time and one based on length.
For the length-based sliding window, the required sequence length is set in advance. Each time a matching interval has passed, the window starts accumulating object data, and when the accumulated length reaches the matching window size, the subsequence in the window is copied.
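A length-based window of this kind might look like the sketch below. The function name `length_windows` and the parameters `size` (matching window size) and `step` (matching interval) are assumptions introduced for illustration; the text does not name them.

```python
def length_windows(sequence, size, step):
    """Copy every full window of `size` items, starting a new window every `step` items."""
    return [
        tuple(sequence[start:start + size])
        for start in range(0, len(sequence) - size + 1, step)
    ]


windows = length_windows(["a", "b", "c", "d", "e", "f"], size=3, step=2)
```

With a window size of 3 and an interval of 2, the six-item sequence yields the windows (a, b, c) and (c, d, e); the trailing items that cannot fill a whole window are not copied.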
The time-based sliding window relies mainly on the timestamp. The method also derives features from the timestamp: the hour, minute, second, quarter, week, and day of the year, together with 0-1 indicators for whether the time falls on a working day, a weekend, a holiday, the early morning, and so on. These features make further statistics on the frequent sequences easier.
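The timestamp features listed above could be derived along the following lines. This is a sketch only: the function name `timestamp_features`, the `holidays` parameter, the use of ISO week numbers, and the definition of "early morning" as hours 0 to 5 are all assumptions, since the text does not specify them.

```python
from datetime import date, datetime


def timestamp_features(ts, holidays=frozenset()):
    """Derive calendar features and 0-1 indicators from a datetime."""
    is_weekend = ts.weekday() >= 5          # Saturday or Sunday
    is_holiday = ts.date() in holidays      # holiday calendar supplied by the caller
    return {
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "quarter": (ts.month - 1) // 3 + 1,
        "week": ts.isocalendar()[1],        # ISO week number (assumed convention)
        "day_of_year": ts.timetuple().tm_yday,
        "is_workday": int(not is_weekend and not is_holiday),
        "is_weekend": int(is_weekend),
        "is_holiday": int(is_holiday),
        "is_early_morning": int(ts.hour < 6),  # assumed definition
    }


features = timestamp_features(datetime(2023, 12, 5, 3, 30, 15))
```

Passing a set of `date` objects as `holidays` marks those days as holidays and clears the working-day indicator for them.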
The method provided by the embodiment of the invention can be deployed on Hadoop and processes data on a big data platform at a low execution cost. By adding weights for relevant behaviors during the support counting of the algorithm and filtering out behaviors irrelevant to the specific business, it improves the accuracy of user behavior analysis and its fit with banking business.
Example IV
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention. As shown in fig. 5, the apparatus includes:
a sequence determining module 301, configured to determine time data sequences of different users according to a historical user data set;
a frequent sequence module 302, configured to determine frequent sequences in the time data sequences corresponding to the different users according to a preset support threshold; and
a fuzzy merging module 303, configured to perform fuzzy merging on the determined frequent sequences based on a preset condition.
In this embodiment of the invention, the sequence determining module divides the historical data into time data sequences by user, the frequent sequence module determines the frequent sequences in the time data sequences of the different users based on the preset support threshold, and the fuzzy merging module fuzzily merges the frequent sequences according to the preset condition. This addresses the problems of low data feature density and low data utilization, facilitates data mining and data recommendation based on the processed data, and can improve the user experience.
In some embodiments of the invention, the sequence determining module 301 comprises:
a data searching unit, configured to search the historical user data set for the transaction data set corresponding to the user identifier of each user; and
a sequence arrangement unit, configured to arrange the transaction data sets corresponding to the different users into the time data sequences in time order.
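The two units above amount to grouping records by user identifier and sorting each group by time. A minimal sketch follows, assuming each historical record is a `(user_id, timestamp, transaction)` tuple; that record layout and the function name `build_time_sequences` are hypothetical and not taken from the text.

```python
from collections import defaultdict


def build_time_sequences(records):
    """Group (user_id, timestamp, transaction) records by user and order them by time."""
    by_user = defaultdict(list)
    for user_id, timestamp, transaction in records:
        by_user[user_id].append((timestamp, transaction))
    return {
        user_id: [txn for _, txn in sorted(rows, key=lambda row: row[0])]
        for user_id, rows in by_user.items()
    }


history = [
    ("u1", 3, "transfer"),
    ("u2", 1, "query"),
    ("u1", 1, "login"),
    ("u1", 2, "query"),
]
sequences = build_time_sequences(history)
```

Each user's transactions come out in timestamp order regardless of the order in which the records were stored.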
In other embodiments of the invention, the frequent sequence module 302 comprises:
a coefficient determining unit, configured to determine, according to a preset configuration rule, a user service level coefficient of the user and a transaction level coefficient of each transaction in the time data sequence;
an occurrence counting unit, configured to traverse the transactions in each time data sequence and, each time a transaction occurs, update its occurrence count according to the user service level coefficient of the user to whom the transaction belongs and the transaction level coefficient of the transaction;
a support unit, configured to determine the support from the occurrence count corresponding to each transaction; and
a frequent sequence unit, configured to process the time data sequence into the frequent sequence according to the support and the preset support threshold.
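The weighted counting performed by these units can be illustrated as follows. The text here does not state how the two coefficients are combined, so this sketch assumes each occurrence adds the product of the user service level coefficient and the transaction level coefficient, with a default of 1.0 when no coefficient is configured; the function names and example coefficient values are likewise hypothetical.

```python
from collections import defaultdict


def weighted_support(sequences, user_coeff, txn_coeff):
    """Accumulate a weighted occurrence count for each transaction type."""
    counts = defaultdict(float)
    for user_id, transactions in sequences.items():
        for txn in transactions:
            # Assumption: one occurrence contributes user weight times transaction weight.
            counts[txn] += user_coeff.get(user_id, 1.0) * txn_coeff.get(txn, 1.0)
    return dict(counts)


def frequent_transactions(counts, min_support):
    """Transactions whose weighted support reaches the preset threshold."""
    return {txn for txn, support in counts.items() if support >= min_support}


counts = weighted_support(
    {"u1": ["transfer", "query", "transfer"], "u2": ["query"]},
    user_coeff={"u1": 2.0, "u2": 1.0},
    txn_coeff={"transfer": 1.5, "query": 0.5},
)
frequent = frequent_transactions(counts, min_support=2.0)
```

Under these example weights, "transfer" accumulates a support of 6.0 and "query" only 1.5, so only "transfer" passes a threshold of 2.0.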
In some embodiments of the invention, the frequent sequence unit is specifically configured to: search the time data sequences for a first time data sequence whose head sequence is identical to the tail sequence of a second time data sequence, and merge the first time data sequence and the second time data sequence into a to-be-pruned time data sequence;
and, for each to-be-pruned time data sequence, remove from it the sub-data sequences composed of transactions whose support is less than the preset support threshold, and take the to-be-pruned time data sequence after removal as a frequent sequence.
In some embodiments of the invention, the fuzzy merging module 303 is specifically configured to perform at least one of the following: determining, among the frequent sequences, the subsequences of each frequent sequence and merging the subsequences into the corresponding frequent sequence; searching the frequent sequences for a parent frequent sequence that contains a given frequent sequence and merging that frequent sequence into the parent frequent sequence.
In some embodiments of the invention, the apparatus further comprises a statistics module, configured to compute data indicators of the frequent sequences according to a preset time window.
In some embodiments of the present invention, the dimensions included in the preset time window include at least a time dimension and a data sequence length dimension.
The data processing apparatus provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example V
Fig. 6 is a schematic structural diagram of an electronic device implementing the data processing method according to an embodiment of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data processing method.
In some embodiments, the data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more of the steps of the data processing method described above may be performed when the computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the data processing method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data processing, comprising:
determining a time data sequence of different users according to the historical user data set;
determining frequent sequences in time data sequences corresponding to different users according to a preset support threshold;
and carrying out fuzzy merging on the determined frequent sequences based on preset conditions.
2. The method of claim 1, wherein determining the time data sequences of the different users according to the historical user data set comprises:
searching a corresponding transaction data set in the historical user data set according to the user identification of the user;
and respectively arranging the transaction data sets corresponding to different users into the time data sequences according to time.
3. The method of claim 1, wherein determining the frequent sequence in the time data sequence corresponding to the different users according to a preset support threshold comprises:
determining, according to a preset configuration rule, a user service level coefficient of the user and a transaction level coefficient of each transaction in the time data sequence;
traversing the transactions in each time data sequence and, each time a transaction occurs, updating its occurrence count according to the user service level coefficient of the user to whom the transaction belongs and the transaction level coefficient of the transaction;
determining the support from the occurrence count corresponding to each transaction;
and processing the time data sequence into the frequent sequence according to the support and the preset support threshold.
4. A method according to claim 3, wherein said processing the time data sequence into the frequent sequence according to the support and the preset support threshold comprises:
searching the time data sequences for a first time data sequence whose head sequence is identical to the tail sequence of a second time data sequence, and merging the first time data sequence and the second time data sequence into a to-be-pruned time data sequence;
and, for each to-be-pruned time data sequence, removing from it the sub-data sequences composed of transactions whose support is less than the preset support threshold, and taking the to-be-pruned time data sequence after removal as the frequent sequence.
5. The method of claim 1, wherein the fuzzy merging of the determined frequent sequences based on the preset condition comprises at least one of:
determining, among the frequent sequences, the subsequences of each frequent sequence, and merging the subsequences into the corresponding frequent sequence;
searching the frequent sequences for a parent frequent sequence that contains a given frequent sequence, and merging that frequent sequence into the parent frequent sequence.
6. The method as recited in claim 1, further comprising:
and counting the data indexes of the frequent sequences according to a preset time window.
7. The method of claim 6, wherein the dimensions of the preset time window include at least a time dimension and a data sequence length dimension.
8. A data processing apparatus, comprising:
the sequence determining module is used for determining time data sequences of different users according to the historical user data set;
the frequent sequence module is used for determining frequent sequences in the time data sequences corresponding to the different users according to a preset support threshold;
and the fuzzy merging module is used for carrying out fuzzy merging on the determined frequent sequences based on preset conditions.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement the data processing method of any one of claims 1-7 when executed.
CN202311658325.4A 2023-12-05 2023-12-05 Data processing method and device, electronic equipment and storage medium Pending CN117609336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311658325.4A CN117609336A (en) 2023-12-05 2023-12-05 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311658325.4A CN117609336A (en) 2023-12-05 2023-12-05 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117609336A true CN117609336A (en) 2024-02-27

Family

ID=89951401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311658325.4A Pending CN117609336A (en) 2023-12-05 2023-12-05 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117609336A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination