CN114386502A - Method, apparatus and storage medium for cluster analysis of fast-application users - Google Patents

Method, apparatus and storage medium for cluster analysis of fast-application users

Info

Publication number
CN114386502A
CN114386502A (application CN202210016525.9A)
Authority
CN
China
Prior art keywords
clustering
cluster
initial
user
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210016525.9A
Other languages
Chinese (zh)
Inventor
胡志华
谢龙龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dianzhong Science And Technology Co ltd
Original Assignee
Beijing Dianzhong Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dianzhong Science And Technology Co ltd filed Critical Beijing Dianzhong Science And Technology Co ltd
Priority to CN202210016525.9A priority Critical patent/CN114386502A/en
Publication of CN114386502A publication Critical patent/CN114386502A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a method, apparatus, and storage medium for cluster analysis of users of fast applications. The method comprises: acquiring feature data of a user; normalizing the feature data to obtain a feature processing result; determining an initial clustering result related to the user according to the feature processing result; and determining a final clustering result based on the initial clustering result, so as to perform cluster analysis on the fast-application users. With the technical solution of the present disclosure, a cluster analysis result of the users can be obtained, so that corresponding services can be provided to different users and user experience is improved.

Description

Method, apparatus and storage medium for cluster analysis of fast-application users
Technical Field
The present disclosure relates generally to the field of user clustering technology. More particularly, the present disclosure relates to a method, apparatus, and computer-readable storage medium for cluster analysis of users of fast applications.
Background
A fast application (quick app) is a new type of application ecosystem that is favored by users because it requires no download or installation, occupies no storage, and is ready to use on demand. Users can obtain the services they need more easily through fast applications. At present, more and more users read electronic books through fast applications, so how to provide corresponding services for different users has become a technical problem to be solved.
Disclosure of Invention
To at least partially solve the technical problem mentioned in the background, the present disclosure provides a solution for cluster analysis of users of fast applications. With the solution of the present disclosure, a clustering result of fast-application users can be obtained, so that targeted services can be provided to meet user needs and user experience is improved. To this end, the present disclosure provides solutions in a number of aspects as follows.
In one aspect, the present disclosure provides a method for cluster analysis of users of fast applications, comprising: acquiring feature data of a user; normalizing the feature data to obtain a feature processing result; determining an initial clustering result related to the user according to the feature processing result; and determining a final clustering result based on the initial clustering result, so as to perform cluster analysis on the fast-application users.
In one embodiment, the feature data includes at least a recharge amount of the user on the fast application and an activity status, and the activity status includes the number of active days, the daily average payment amount, the number of words read, the number of reflow times, and the number of books.
In another embodiment, normalizing the feature data to obtain a feature processing result comprises: normalizing each item of the feature data by a maximum-minimum normalization method to obtain a corresponding feature processing result.
In yet another embodiment, the feature processing result comprises a plurality of vectors, and determining the initial clustering result related to the user according to the feature processing result comprises: calculating a contour coefficient of each vector of the feature processing result; determining the number of groups of the initial clustering according to the contour coefficient of each vector; and determining the initial clustering result related to the user based on the number of groups of the initial clustering.
In yet another embodiment, determining the initial clustering result related to the user based on the number of groups of the initial clustering comprises: randomly selecting, from the plurality of vectors of the feature processing result, initial cluster centers whose number corresponds to the number of groups; and determining the initial clustering result related to the user according to the initial cluster centers and remaining vectors, wherein the remaining vectors comprise the vectors other than the initial cluster centers.
In yet another embodiment, determining the initial clustering result related to the user according to the initial cluster centers and the remaining vectors comprises: calculating the similarity between each remaining vector and each initial cluster center; and assigning each remaining vector to the initial cluster center to which it is most similar, so as to form corresponding cluster groups and obtain the initial clustering result related to the user.
In yet another embodiment, determining the final clustering result based on the initial clustering result comprises: calculating the mean of the vectors in each cluster group of the initial clustering result and the mean square error corresponding to each cluster group; adjusting the intermediate particle corresponding to each cluster group according to the mean and the mean square error of each cluster group to obtain final cluster centers; and determining the final clustering result based on the final cluster centers.
In yet another embodiment, the following adjusting and updating operations are performed repeatedly to adjust the intermediate particle corresponding to each cluster group until the mean square error is less than or equal to a preset limit, thereby obtaining the final cluster centers: when the mean square error is greater than the preset limit, taking the adjusted intermediate particle of each cluster group as an intermediate cluster point; forming corresponding intermediate cluster groups based on the intermediate cluster points, for the next determination of the mean and the mean square error of the vectors in the corresponding intermediate cluster groups; and when the mean square error of the intermediate cluster groups is less than or equal to the preset limit, taking the intermediate particles at the time the adjustment stops as the final cluster centers.
In another aspect, the present disclosure also provides an apparatus for cluster analysis of users of fast applications, comprising: a processor; and a memory connected to the processor, the memory having stored therein computer program code which, when executed, causes the processor to perform the method according to the foregoing embodiments.
In yet another aspect, the present disclosure also provides a computer-readable storage medium having stored thereon computer-readable instructions for cluster analysis of users of fast applications, which, when executed by one or more processors, implement the method according to the embodiments described above.
According to the above solution, the feature data of fast-application users is normalized and the feature processing results are clustered, so that clustering results of the fast-application users are obtained, corresponding services are provided for different users, and user experience is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 is an exemplary flow diagram illustrating a method for cluster analysis of users of fast applications in accordance with an embodiment of the present disclosure;
fig. 2 is an exemplary diagram illustrating a feature processing result obtained after normalization processing is performed on feature data according to an embodiment of the present disclosure;
FIG. 3 is an exemplary diagram illustrating determining initial clustering results according to an embodiment of the present disclosure;
FIG. 4 is an exemplary flow chart illustrating determining a final clustering result according to an embodiment of the present disclosure;
FIG. 5 is an exemplary flow diagram illustrating adjusting intermediate particles to obtain a final cluster center according to an embodiment of the disclosure;
FIG. 6 is an exemplary diagram illustrating adjusting intermediate particles to obtain a final cluster center according to an embodiment of the disclosure; and
fig. 7 is a block diagram illustrating an apparatus for cluster analysis of users of fast applications according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. It should be understood that the embodiments described in this specification are only some, not all, of the embodiments of the present disclosure, provided to facilitate a clear understanding of the solutions and to comply with legal requirements. All other embodiments obtained by a person skilled in the art from the embodiments disclosed in this specification without creative effort shall fall within the protection scope of the present disclosure.
FIG. 1 is an exemplary flow diagram illustrating a method 100 for cluster analysis of users of fast applications according to an embodiment of the present disclosure. As shown in fig. 1, at step S102, feature data of a user is acquired. In one embodiment, a data warehouse tool such as Hive may be employed to obtain the aforementioned feature data of the user from the fast-application server. In some embodiments, the feature data of the user may include, but is not limited to, a recharge amount and an activity status of the user on the fast application. The aforementioned recharge amount may include, for example, a coin recharge amount or a membership recharge amount. The aforementioned activity status may include, for example, the number of active days, the daily average payment amount, the number of words read, the number of reflow times, and the number of books.
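To make the data-acquisition step concrete, the following is a minimal Python sketch (not part of the original disclosure) of pulling the six features into a matrix with one row per user, assuming the PyHive client is available. The Hive table and column names (quick_app.user_features, recharge_amount, active_days, and so on) and the server address are hypothetical placeholders; any comparable query path would serve the same purpose.

```python
import numpy as np
import pandas as pd
from pyhive import hive  # assumes a reachable HiveServer2 instance

# Hypothetical connection and table; substitute the real warehouse details.
conn = hive.Connection(host="hive.example.com", port=10000, database="quick_app")
df = pd.read_sql(
    """
    SELECT recharge_amount, active_days, avg_daily_payment,
           words_read, reflow_count, book_count
    FROM user_features
    """,
    conn,
)
X = df.to_numpy(dtype=float)  # feature data: one row per user, one column per feature
```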
Based on the acquired feature data of the user, at step S104, the feature data is normalized to obtain a feature processing result. In one embodiment, each item of the feature data may be normalized using, for example, the maximum-minimum normalization method to obtain the corresponding feature processing result. That is, each item of feature data is mapped into [0,1] by the maximum-minimum normalization method. For example, taking the recharge amount as the feature data, the maximum value max and the minimum value min of the user's recharge amount on the fast application within a certain period (for example, one day) may be counted first. Further, the recharge amount may be normalized based on the following formula:
x* = (x - min) / (max - min)    (1)
where x* denotes the feature processing result obtained by normalizing the recharge amount, and x denotes the current recharge amount of the user. In one implementation scenario, the above formula (1) may be used to obtain the feature processing result corresponding to each item of feature data (e.g., the recharge amount, the number of active days, the daily average payment amount, the number of words read, the number of reflow times, and the number of books).
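As an illustration only, the column-wise max-min normalization of formula (1) can be sketched as follows; X is assumed to be the per-user feature matrix from the previous sketch, and each column (recharge amount, active days, and so on) is mapped into [0, 1] independently.

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Map every column of X into [0, 1] via (x - min) / (max - min)."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    return (X - col_min) / span

X_norm = min_max_normalize(X)  # the feature processing result: one normalized vector per user
```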
After the feature processing result is obtained, at step S106, an initial clustering result related to the user is determined according to the feature processing result. In one embodiment, the feature processing result may include a plurality of vectors, and the plurality of vectors are feature processing results obtained by normalizing feature data of a plurality of users. Specifically, the contour coefficient of each vector of the feature processing result may be first calculated, and then the number of groups of the initial clusters is determined from the contour coefficient of each vector. Further, an initial clustering result associated with the user is determined based on the number of groups of the initial clustering.
In one exemplary scenario, the contour coefficients for each vector may be calculated by the following formula:
s(i) = (b(i) - a(i)) / max{a(i), b(i)}    (2)
where s(i) denotes the contour coefficient of vector i, a(i) denotes the average distance from vector i to all other vectors with similar features (i.e., in the same group), b(i) denotes the minimum of the average distances from vector i to all vectors without similar features (i.e., in other groups), and max{ } denotes taking the maximum value.
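A minimal sketch of evaluating formula (2) is given below. It assumes a candidate grouping `labels` with at least two groups (e.g., from a trial clustering with a candidate number of groups), takes a(i) as the mean distance of vector i to the other vectors of its own group, and takes b(i) as the smallest mean distance to any other group, in line with the usual silhouette definition. Choosing the number of groups as the candidate with the highest average s(i) is one common reading of "determining the number of groups according to the contour coefficient", not a step spelled out by the disclosure.

```python
import numpy as np

def contour_coefficients(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-vector contour (silhouette) coefficient s(i) of formula (2)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the vector itself from a(i)
        a_i = dists[i, same].mean() if same.any() else 0.0
        b_i = min(  # assumes `labels` contains at least two distinct groups
            dists[i, labels == k].mean()
            for k in np.unique(labels)
            if k != labels[i]
        )
        denom = max(a_i, b_i)
        s[i] = (b_i - a_i) / denom if denom > 0 else 0.0
    return s
```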
The number of groups of the initial clustering is determined according to the contour coefficient of each vector, and the initial clustering result related to the user is then determined. For example, initial cluster centers whose number corresponds to the number of groups are arbitrarily selected from the plurality of vectors of the feature processing result, and then the initial clustering result related to the user is determined according to the initial cluster centers and the remaining vectors. In one embodiment, the aforementioned remaining vectors include the vectors other than the initial cluster centers. Based on the initial cluster centers and the remaining vectors, the similarity between each remaining vector and each initial cluster center is calculated. Further, each remaining vector is assigned to the initial cluster center to which it is most similar, and corresponding cluster groups are formed to obtain the initial clustering result related to the user. How the initial clustering result related to the user is obtained will be described in detail later in conjunction with fig. 3.
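The step just described amounts to picking k vectors at random as initial cluster centers and giving every remaining vector to the center it is most similar to. A minimal sketch, assuming Euclidean distance as the (inverse) similarity measure and reusing the normalized matrix X_norm from the earlier sketches:

```python
import numpy as np

def initial_clustering(X: np.ndarray, k: int, seed: int = 0):
    """Randomly pick k initial cluster centers and assign each vector to the closest one."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    # Distance of every vector to every initial center; smaller distance = higher similarity.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    labels = dists.argmin(axis=1)  # each vector joins the group of its closest center
    return centers, labels

centers, labels = initial_clustering(X_norm, k=5)  # e.g. five groups, as in the Fig. 3 example
```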
Finally, at step S108, a final clustering result is determined based on the initial clustering result, so as to perform cluster analysis on the fast-application users. For example, the cluster center points are re-determined based on the initial clustering result to obtain the final clustering result. How the final clustering result is determined will be described in detail later in conjunction with figs. 4-6.
As can be seen from the above description, in the embodiment of the present disclosure, the feature data of the fast application user is normalized, and then the initial clustering result is determined according to the feature processing result. Further, the initial clustering result is further optimized to obtain a final clustering result, so that corresponding services are provided for different users according to the final clustering result, and user experience is improved.
Fig. 2 is an exemplary diagram illustrating a feature processing result obtained after normalization of feature data according to an embodiment of the present disclosure. Taking the recharge amount and the number of active days of a user as an example, the plurality of points (or vectors) shown in fig. 2 represents the distribution of the feature processing results corresponding to the normalized recharge amounts and numbers of active days of a plurality of users. The abscissa represents the feature processing result after the recharge amount is normalized, the ordinate represents the feature processing result after the number of active days is normalized, and both values lie in the range [0,1]. As described above, the number of groups of the initial clustering result can be determined according to the contour coefficient of each vector, and the initial clustering result can thus be determined.
In one exemplary scenario, it is assumed that the number of groups of the initial clustering result is determined to be five according to the contour coefficient of each vector, so that five initial cluster centers are to be determined. In this scenario, five points may be arbitrarily selected from the plurality of vectors of the feature processing result as the initial cluster centers, and the initial clustering result related to the user may be determined based on the initial cluster centers and the remaining vectors, as shown, for example, in fig. 3.
Fig. 3 is an exemplary diagram illustrating determination of an initial clustering result according to an embodiment of the present disclosure. Still taking the recharge amount and the number of active days of a user as an example, the plurality of points (or vectors) shown in fig. 3 represents the distribution of the feature processing results corresponding to the normalized recharge amounts and numbers of active days of a plurality of users. The abscissa represents the feature processing result after the recharge amount is normalized, the ordinate represents the feature processing result after the number of active days is normalized, and both values lie in the range [0,1]. Assuming that the number of groups of the initial clustering result is determined to be five according to the contour coefficient of each point, there are correspondingly five initial cluster centers. As described above, five points may be arbitrarily selected from the plurality of vectors of the feature processing result as the initial cluster centers; five initial cluster centers P1-P5 are exemplarily shown in the figure.
The initial clustering result related to the user is obtained based on the above five initial cluster centers P1-P5 and the remaining vector points (i.e., the remaining vectors). Specifically, the similarities (e.g., distances) between the remaining vector points and the aforementioned five initial cluster centers P1-P5 are calculated, and each remaining vector point is assigned to the initial cluster center to which it is most similar, thereby forming the respective cluster groups. For example, a remaining vector point in fig. 3 is assigned to whichever of the initial cluster centers P1-P5 its calculated similarity is closest to. In this way, all remaining vector points may be assigned to the initial cluster centers P1-P5, forming five cluster groups with P1-P5 as the initial cluster centers. It can be understood that these five cluster groups are the obtained initial clustering result, and the final clustering result related to the user can then be determined according to the initial clustering result. How the final clustering result is determined will be described in detail below with reference to fig. 4.
Fig. 4 is an exemplary flow chart illustrating determination of a final clustering result according to an embodiment of the present disclosure. As shown in fig. 4, at step S402, the mean of the vectors in each cluster group of the initial clustering result and the mean square error corresponding to each cluster group are calculated. Taking fig. 3 above as an example, the initial clustering result includes five cluster groups, and each cluster group includes an initial cluster center and a plurality of vector points similar to it. That is, the mean and the corresponding mean square error are calculated for each initial cluster center together with the plurality of vector points close to it in similarity. After the mean and the mean square error of each cluster group are obtained, at step S404, the intermediate particle corresponding to each cluster group is adjusted according to the mean and the mean square error of each cluster group to obtain the final cluster centers. That is, a new cluster center (i.e., a center particle) is re-determined within each cluster group; how the center particle is adjusted will be described in detail later in conjunction with fig. 5. With the final cluster centers obtained, at step S406, the final clustering result is determined based on the final cluster centers. This step may refer to the description of determining the initial clustering result given above with respect to fig. 3. Similarly to determining the initial clustering result, the final clustering result may be formed by calculating the similarity between each remaining vector and each final cluster center and assigning each remaining vector to the final cluster center to which it is most similar.
FIG. 5 is an exemplary flow diagram illustrating adjustment of the intermediate particles to obtain the final cluster centers according to an embodiment of the disclosure. As shown in fig. 5, at step S502, the intermediate particle corresponding to each cluster group is taken as an intermediate cluster point. In one implementation scenario, the preset limit may be determined based on the corresponding business requirements. The intermediate particle corresponding to each cluster group can be determined according to the mean of the vectors in that cluster group. Next, at step S504, corresponding intermediate cluster groups are formed based on the intermediate cluster points, for the next determination of the mean and the mean square error of the vectors in the corresponding intermediate cluster groups. Specifically, according to the determined intermediate cluster points, corresponding intermediate cluster groups, i.e., a new clustering result, may be formed. Further, according to the new clustering result, the mean and the mean square error corresponding to each new cluster group can be obtained, and the intermediate cluster point of the next new cluster group (i.e., the intermediate cluster group) is then determined. After the mean and the mean square error corresponding to each new cluster group are obtained based on the new clustering result, at step S506, it is determined whether the mean square error of the intermediate cluster group is less than or equal to the preset limit. When the mean square error of the intermediate cluster group is less than or equal to the preset limit, the flow proceeds to step S508. At this step, the intermediate particle at the time the adjustment stops is taken as the final cluster center. When the mean square error is greater than the preset limit, steps S502-S506 are performed repeatedly to adjust and update the intermediate particles of the cluster groups, until the intermediate particles at the time the mean square error of the intermediate cluster groups is less than or equal to the preset limit are taken as the final cluster centers.
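The loop of fig. 5 can be sketched as follows; it assumes Euclidean distance, reuses X_norm and the initial centers from the earlier sketches, and treats each group's mean square error as the mean squared distance of the group's vectors to that group's intermediate particle. The limit value 0.01 and the max_iter safety guard are illustrative additions, not values given by the disclosure.

```python
import numpy as np

def refine_centers(X: np.ndarray, centers: np.ndarray, limit: float = 0.01,
                   max_iter: int = 100):
    """Repeatedly move each group's intermediate particle to the group mean until
    every group's mean square error is at or below the preset limit."""
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)            # intermediate cluster groups
        new_centers = centers.copy()
        mses = []
        for g in range(len(centers)):
            members = X[labels == g]
            if len(members) == 0:
                continue                         # keep an empty group's center unchanged
            new_centers[g] = members.mean(axis=0)                                # intermediate particle
            mses.append(((members - new_centers[g]) ** 2).sum(axis=1).mean())    # group MSE
        centers = new_centers
        if max(mses) <= limit:                   # all groups within the preset limit
            break
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return centers, dists.argmin(axis=1)         # final cluster centers and final clustering result

final_centers, final_labels = refine_centers(X_norm, centers)
```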
FIG. 6 is an exemplary diagram illustrating adjustment of the intermediate particles to obtain the final cluster centers according to an embodiment of the disclosure. As exemplarily shown in the left diagram of fig. 6, a cluster group includes a plurality of vector points and has an initial cluster center P. In an exemplary scenario, it is assumed that an intermediate particle A of the cluster group is determined according to the mean of the plurality of vector points in the cluster group, and clustering is then performed with the intermediate particle A as the cluster center to form a new clustering result (i.e., the intermediate cluster group mentioned above). Further, assuming that the mean square error of the newly formed cluster group is greater than the preset limit, a new intermediate particle may be determined according to the mean of the newly formed cluster group, such as the new intermediate particle B shown in the middle diagram of the figure. The above operations are performed repeatedly until the mean square error of the new cluster group is less than or equal to the preset limit, and the intermediate particle at the time the adjustment stops is taken as the final cluster center; for example, point C shown in the right diagram of the figure is the final cluster center. A final clustering result is then obtained according to the final cluster centers, so that corresponding services can be provided for different users based on the final clustering result and user experience is improved.
Fig. 7 is a block diagram illustrating an apparatus 700 for cluster analysis of users of fast applications according to an embodiment of the present disclosure. It is to be understood that the device implementing aspects of the present disclosure may be a single device (e.g., a computing device) or a multifunction device including various peripheral devices.
As shown in fig. 7, the apparatus of the present disclosure may include a central processing unit ("CPU") 711, which may be a general-purpose CPU, a special-purpose CPU, or another execution unit on which processing and programs run. Further, the device 700 may also include a mass storage 712 and a read-only memory ("ROM") 713, wherein the mass storage 712 may be configured to store various types of data, including the feature data of various users, algorithm data, intermediate results, and the various programs needed to operate the device 700. The ROM 713 may be configured to store data and instructions required for the power-on self-test of the device 700, the initialization of the functional modules in the system, the drivers for basic input/output of the system, and the booting of the operating system.
Optionally, device 700 may also include other hardware platforms or components, such as the illustrated tensor processing unit ("TPU") 714, graphics processing unit ("GPU") 715, field programmable gate array ("FPGA") 716, and machine learning unit ("MLU") 717. It is to be understood that although various hardware platforms or components are shown in the device 700, this is by way of illustration and not of limitation, and one skilled in the art may add or remove corresponding hardware as may be desired. For example, device 700 may include only a CPU, associated storage, and interface devices to implement the disclosed method for cluster analysis for users of fast applications.
In some embodiments, to facilitate the transfer and interaction of data with external networks, the device 700 of the present disclosure also includes a communication interface 718 such that it may connect to a local area network/wireless local area network ("LAN/WLAN") 705 via the communication interface 718, which in turn may connect to a local server 706 via the LAN/WLAN or to the Internet 707. Alternatively or additionally, the device 700 of the present disclosure may also be directly connected to the Internet or a cellular network through the communication interface 718 based on wireless communication technology, such as 3rd-generation ("3G"), 4th-generation ("4G"), or 5th-generation ("5G") wireless communication technology. In some application scenarios, the device 700 of the present disclosure may also access the server 708 and database 709 of the external network as needed to obtain various known models, data, and modules, and may store various types of data remotely, such as data or instructions for the normalization processing, the initial clustering, the final clustering, and the like.
The peripheral devices of the apparatus 700 may include a display device 702, an input device 703, and a data transmission interface 704. In one embodiment, the display device 702 may, for example, include one or more speakers and/or one or more visual displays configured for voice prompting and/or image and video display of the initial clustering result or the final clustering result of the present disclosure. The input device 703 may include input buttons or controls such as a keyboard, a mouse, a microphone, a gesture-capture camera, etc., configured to receive feature data and/or user instructions input by a user. The data transmission interface 704 may include, for example, a serial interface, a parallel interface, a universal serial bus interface ("USB"), a small computer system interface ("SCSI"), serial ATA, FireWire, PCI Express, or a high-definition multimedia interface ("HDMI"), etc., configured for data transfer and interaction with other devices or systems. In accordance with aspects of the present disclosure, the data transmission interface 704 may receive the feature data of users from the hive table and transmit the feature data of users, or various other types of data or results, to the device 700.
The aforementioned CPU 711, mass storage 712, ROM 713, TPU 714, GPU 715, FPGA 716, MLU 717, and communication interface 718 of the disclosed device 700 may be interconnected by a bus 719 and enable data interaction with the peripheral devices through the bus. In one embodiment, the CPU 711 may control, through the bus 719, the other hardware components in the device 700 and their peripherals.
An apparatus for performing the cluster analysis of users of fast applications according to the present disclosure is described above in connection with fig. 7. It is to be understood that the device structures or architectures herein are merely exemplary, and that the implementations and implementation entities of the present disclosure are not limited thereto but may be modified without departing from the spirit of the present disclosure.
From the above description in conjunction with the accompanying drawings, those skilled in the art will also appreciate that embodiments of the present disclosure may also be implemented by software programs. The present disclosure thus also provides a computer program product. The computer program product may be used to implement the method for cluster analysis of users of fast applications described in this disclosure in conjunction with figs. 1-6.
It should be noted that while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
It should be understood that when the terms first, second, third, fourth, etc. are used in the claims, specification, and drawings of the present disclosure, they are used only to distinguish one object from another, and not to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Although the embodiments of the present disclosure are described above, the descriptions are only examples for facilitating understanding of the present disclosure, and are not intended to limit the scope and application scenarios of the present disclosure. It will be understood by those skilled in the art of the present disclosure that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure, and that the scope of the disclosure is to be limited only by the appended claims.

Claims (10)

1. A method for cluster analysis of users of fast applications, comprising:
acquiring feature data of a user;
carrying out normalization processing on the feature data to obtain a feature processing result;
determining an initial clustering result related to the user according to the feature processing result; and
determining a final clustering result based on the initial clustering result, so as to perform cluster analysis on the fast-application users.
2. The method of claim 1, wherein the feature data includes at least a recharge amount of the user on a fast application and an activity status, and the activity status includes the number of active days, the daily average payment amount, the number of words read, the number of reflow times, and the number of books.
3. The method of claim 2, wherein normalizing the feature data to obtain a feature processing result comprises:
normalizing each item of the feature data by a maximum-minimum normalization method to obtain a corresponding feature processing result.
4. The method of claim 1, wherein the feature processing result comprises a plurality of vectors, and determining an initial clustering result related to the user according to the feature processing result comprises:
calculating a contour coefficient of each vector of the feature processing result;
determining the number of groups of the initial clustering according to the contour coefficient of each vector; and
determining an initial clustering result related to the user based on the number of groups of the initial clustering.
5. The method of claim 4, wherein determining the initial clustering result related to the user based on the number of groups of the initial clustering comprises:
based on the number of groups of the initial clustering, randomly selecting, from the plurality of vectors of the feature processing result, initial cluster centers corresponding in number to the number of groups; and
determining the initial clustering result related to the user according to the initial cluster centers and remaining vectors, wherein the remaining vectors comprise the vectors other than the initial cluster centers.
6. The method of claim 5, wherein determining the initial clustering result related to the user according to the initial cluster centers and the remaining vectors comprises:
calculating the similarity between each remaining vector and each initial cluster center; and
assigning each remaining vector to the initial cluster center to which it is most similar, and forming corresponding cluster groups to obtain the initial clustering result related to the user.
7. The method of claim 6, wherein determining the final clustering result based on the initial clustering result comprises:
calculating the mean of the vectors in each cluster group of the initial clustering result and the mean square error corresponding to each cluster group;
adjusting the intermediate particle corresponding to each cluster group according to the mean and the mean square error of each cluster group to obtain a final cluster center; and
determining the final clustering result based on the final cluster center.
8. The method of claim 7, wherein adjusting the intermediate particle corresponding to each cluster group to obtain the final cluster center comprises repeatedly performing the following adjusting and updating operations until the mean square error is less than or equal to a preset limit:
when the mean square error is greater than the preset limit, taking the adjusted intermediate particle of each cluster group as an intermediate cluster point;
forming a corresponding intermediate cluster group based on the corresponding intermediate cluster point, for the next determination of the mean and the mean square error of the vectors in the corresponding intermediate cluster group; and
when the mean square error of the intermediate cluster group is less than or equal to the preset limit, taking the intermediate particle at the time the adjustment stops as the final cluster center.
9. An apparatus for cluster analysis of users of fast applications, comprising:
a processor; and
a memory connected to the processor, the memory having stored therein computer program code which, when executed, causes the processor to perform the method of any of claims 1-8.
10. A computer-readable storage medium having stored thereon computer-readable instructions for cluster analysis of users of fast applications, which, when executed by one or more processors, implement the method of any one of claims 1-8.
CN202210016525.9A 2022-01-07 2022-01-07 Method, apparatus and storage medium for cluster analysis of fast-application users Pending CN114386502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210016525.9A CN114386502A (en) 2022-01-07 2022-01-07 Method, apparatus and storage medium for cluster analysis of fast-application users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210016525.9A CN114386502A (en) 2022-01-07 2022-01-07 Method, apparatus and storage medium for cluster analysis of fast-application users

Publications (1)

Publication Number Publication Date
CN114386502A true CN114386502A (en) 2022-04-22

Family

ID=81199671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210016525.9A Method, apparatus and storage medium for cluster analysis of fast-application users

Country Status (1)

Country Link
CN (1) CN114386502A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681973A (en) * 2018-05-14 2018-10-19 广州供电局有限公司 Sorting technique, device, computer equipment and the storage medium of power consumer
CN109376759A (en) * 2018-09-10 2019-02-22 平安科技(深圳)有限公司 User information classification method, device, computer equipment and storage medium
CN110705602A (en) * 2019-09-06 2020-01-17 平安科技(深圳)有限公司 Large-scale data clustering method and device and computer readable storage medium
CN110807487A (en) * 2019-10-31 2020-02-18 北京邮电大学 Method and device for identifying user based on domain name system flow record data
WO2021203854A1 (en) * 2020-04-09 2021-10-14 深圳壹账通智能科技有限公司 User classification method and apparatus, computer device and storage medium
CN113837780A (en) * 2020-06-23 2021-12-24 上海莉莉丝科技股份有限公司 Information delivery method, system, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN, SHIXIAO; DING, LIXIN; ZHENG, JINQIU: "K-means text clustering algorithm based on density peak optimization", Computer Engineering and Design, no. 04, 16 April 2017 (2017-04-16) *

Similar Documents

Publication Publication Date Title
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
CN109766925B (en) Feature fusion method and device, electronic equipment and storage medium
CN109214543B (en) Data processing method and device
WO2022027917A1 (en) Image processing method, apparatus and system, and electronic device and readable storage medium
CN115147265B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN112163637A (en) Image classification model training method and device based on unbalanced data
CN111860841A (en) Quantization model optimization method, device, terminal and storage medium
CN110689136A (en) Deep learning model obtaining method, device, equipment and storage medium
CN112348079A (en) Data dimension reduction processing method and device, computer equipment and storage medium
CN109508087A (en) Brain line signal recognition method and terminal device
CN114386502A (en) Method, apparatus and storage medium for cluster analysis of fast-application users
CN112614110A (en) Method and device for evaluating image quality and terminal equipment
CN111736988A (en) Heterogeneous acceleration method, equipment and device and computer readable storage medium
CN115795355A (en) Classification model training method, device and equipment
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
CN115359171A (en) Virtual image processing method and device, electronic equipment and storage medium
CN113010759B (en) Cluster set processing method and device, computer readable medium and electronic equipment
CN112329890B (en) Image processing method and device, electronic device and storage medium
CN114065913A (en) Model quantization method and device and terminal equipment
CN113989121A (en) Normalization processing method and device, electronic equipment and storage medium
CN112766362A (en) Data processing method, device and equipment
CN114358102A (en) Data classification method, device, equipment and storage medium
CN113780324A (en) Data processing method and device, electronic equipment and storage medium
CN114882232A (en) Attention mechanism method and system based on feature discrete degree
US20230214646A1 (en) Method and system for searching deep neural network architecture

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination