CN112231196B - APP embedded point behavior classification method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN112231196B CN202011462475.4A
- Authority
- CN
- China
- Prior art keywords
- character string
- buried point
- point
- buried
- embedded
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3438—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of artificial intelligence and provides an APP buried point behavior classification method and device, computer equipment and a storage medium. The method comprises: determining a first buried point character string corresponding to each first buried point behavior in first buried point data sequences of a plurality of users; generating a mask picture according to the plurality of first buried point character strings, and generating a first buried point character string picture according to each first buried point character string and the mask picture; segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures; calculating a first information entropy of each first buried point character string sub-picture, and calculating a first similarity between any two first buried point character string pictures according to the first information entropy; and classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories. Because the method classifies all buried point character strings directly, no buried point analysis table needs to be prepared, buried point behaviors are classified automatically, and the classification efficiency is high.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an APP buried point behavior classification method and device, computer equipment and a storage medium.
Background
Users perform many kinds of operation behaviors in an APP, and these operation behaviors can be used to predict other behaviors of the user. Conventionally, the operation behaviors of a user within a period of time are obtained by setting buried points; semantic analysis is performed on the operation behaviors to extract behavior features; a model is trained on those features; and other behaviors of the user are predicted with the model.
The inventor finds that, in the prior art, semantic analysis of operation behaviors requires a buried point analysis table, so operation behaviors that are not in the table cannot be analyzed and their behavior features cannot be extracted. In addition, APP pages change quickly, so a model trained on the extracted behavior features generalizes poorly and cannot keep up with the pace of change of the APP, which makes subsequent behavior prediction inaccurate.
Disclosure of Invention
In view of the above, there is a need for an APP buried point behavior classification method, apparatus, computer device, and storage medium that classify buried point behaviors automatically by classifying all buried point character strings, without preparing a buried point analysis table, and with high classification efficiency.
A first aspect of the invention provides an APP buried point behavior classification method, which comprises the following steps:
acquiring first buried point data sequences of a plurality of users, and determining a first buried point character string corresponding to each first buried point behavior in the first buried point data sequences;
generating a mask picture according to the plurality of first buried point character strings, and generating a first buried point character string picture according to each first buried point character string and the mask picture;
segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures;
calculating a first information entropy of each first buried point character string sub-picture, and calculating a first similarity between any two first buried point character string pictures according to the first information entropy;
and classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories.
In an optional embodiment, the generating a mask picture according to the plurality of first buried point character strings and generating a first buried point character string picture according to each first buried point character string and the mask picture includes:
calculating the character string length of each first buried point character string;
generating a mask picture according to the maximum value of the plurality of character string lengths;
positioning a target point in the mask picture;
and adding the first buried point character string to the mask picture with the target point as the starting point to obtain a first buried point character string picture.
In an optional embodiment, the segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures includes:
calculating a character string length difference between the maximum value and the minimum value of the plurality of character string lengths;
judging whether the character string length difference is greater than or equal to a preset length threshold;
when the character string length difference is greater than or equal to the preset length threshold, acquiring a preset first segmentation number, and uniformly segmenting each first buried point character string picture according to the preset first segmentation number to obtain a plurality of first buried point character string sub-pictures;
and when the character string length difference is smaller than the preset length threshold, acquiring a preset second segmentation number, and uniformly segmenting each first buried point character string picture according to the preset second segmentation number to obtain a plurality of first buried point character string sub-pictures.
In an optional embodiment, the calculating a first similarity between any two first buried point character string pictures according to the first information entropy includes:
pairing the corresponding first buried point character string sub-pictures of any two first buried point character string pictures;
calculating a first difference and a first mean value of the first information entropies of each pair of corresponding first buried point character string sub-pictures;
calculating a first sub-similarity according to the first difference and the corresponding first mean value;
and calculating the first similarity between the two first buried point character string pictures according to the plurality of first sub-similarities.
In an optional embodiment, the classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories includes:
selecting a first buried point behavior for the first time from the plurality of first buried point behaviors;
acquiring first target similarities, namely those first similarities of the first-selected first buried point behavior that are greater than a preset similarity threshold;
classifying the first-selected first buried point behavior and the other first buried point behaviors corresponding to the first target similarities into the same buried point behavior category;
selecting a first buried point behavior for the second time from the remaining first buried point behaviors;
acquiring second target similarities, namely those first similarities of the second-selected first buried point behavior that are greater than the preset similarity threshold;
classifying the second-selected first buried point behavior and the other first buried point behaviors corresponding to the second target similarities into the same buried point behavior category;
and repeating the process until all the first buried point behaviors are classified, to obtain a plurality of buried point behavior categories.
In an optional embodiment, the method further comprises:
determining a second buried point character string corresponding to each second buried point behavior in a second buried point data sequence of a user to be tested;
generating a second buried point character string picture according to each second buried point character string and the mask picture;
segmenting each second buried point character string picture to obtain a plurality of second buried point character string sub-pictures;
calculating a second information entropy of each second buried point character string sub-picture, and calculating a second similarity between the second buried point character string picture and each first buried point character string picture according to the second information entropy and the first information entropy;
and determining the buried point behavior category of the user to be tested according to the plurality of second similarities.
In an optional embodiment, the method further comprises:
defining a category identification for each embedded point behavior category;
generating a data set according to each first buried point character string and the corresponding category identification;
training an XGBoost model on the data set to obtain a buried point behavior classification model;
and classifying the second buried point behavior sequence of the user to be tested by using the buried point behavior classification model to obtain the buried point behavior category of the user to be tested.
A second aspect of the present invention provides an APP embedded point behavior classification apparatus, comprising:
an acquisition module, configured to acquire first buried point data sequences of a plurality of users and determine a first buried point character string corresponding to each first buried point behavior in the first buried point data sequences;
a generating module, configured to generate a mask picture according to the plurality of first buried point character strings and generate a first buried point character string picture according to each first buried point character string and the mask picture;
a segmentation module, configured to segment each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures;
a calculation module, configured to calculate a first information entropy of each first buried point character string sub-picture and calculate a first similarity between any two first buried point character string pictures according to the first information entropy;
and a classification module, configured to classify the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories.
A third aspect of the invention provides a computer device comprising a processor for implementing the APP embedded point behavior classification method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the APP embedded point behavior classification method.
In summary, according to the APP buried point behavior classification method, apparatus, computer device and storage medium of the present invention, a first buried point character string is determined for each first buried point behavior in the first buried point data sequences of a plurality of users, and a mask picture is generated according to the plurality of first buried point character strings, so that a first buried point character string picture can be generated according to each first buried point character string and the mask picture. Each first buried point character string picture is then segmented to obtain a plurality of first buried point character string sub-pictures; a first information entropy of each first buried point character string sub-picture is calculated, and a first similarity between any two first buried point character string pictures is calculated according to the first information entropy; and the plurality of first buried point behaviors are classified according to the plurality of first similarities to obtain a plurality of buried point behavior categories. The method requires neither a buried point analysis table nor knowledge of the specific meaning of each buried point character string: all buried point character strings are classified automatically, the buried point character strings belonging to the same buried point behavior category are determined, and the classification efficiency is high.
Drawings
Fig. 1 is a flowchart of an APP embedded point behavior classification method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of an APP embedded point behavior classification apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The APP embedded point behavior classification method provided by the embodiment of the invention is executed by computer equipment, and correspondingly, the APP embedded point behavior classification device runs in the computer equipment.
Fig. 1 is a flowchart of an APP buried point behavior classification method according to an embodiment of the present invention. The method comprises the following steps. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
S11, collecting first buried point data sequences of a plurality of users, and determining a first buried point character string corresponding to each first buried point behavior in the first buried point data sequences.
A buried point is a recorder embedded in a page or button of the APP, that is, a statistics tool attached to that page or button. When a user clicks a button or stays on a page, the buried point corresponding to that button or page is triggered, and the triggered buried point automatically reports buried point data.
The computer device may obtain a plurality of first buried point data of different users within a period of time and concatenate the first buried point data of the same user to obtain a first buried point data sequence. The first buried point data may include, but is not limited to: a first buried point behavior, a first buried point ID, a first buried point timestamp, a first buried point description and a first user ID.
The first buried point behavior refers to an operation behavior of a user on a button in the APP, such as clicking, sharing, liking or closing. Different pages, and different buttons within the same page, correspond to different first buried point IDs, and the first buried point timestamp is the time at which the buried point was triggered. The first buried point ID is a character string composed of letters, numbers, special symbols and the like. Different first buried point IDs correspond to different first buried point character strings, and the lengths of the first buried point character strings under different modules may differ.
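The data layout described in S11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record type `BuriedPointEvent`, its field names, the ordering by timestamp, and the choice to use the buried point ID itself as the character string are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class BuriedPointEvent(NamedTuple):
    """One reported buried point record (field names are hypothetical)."""
    user_id: str
    point_id: str      # buried point ID, used here as the character string
    behavior: str      # e.g. "click", "share", "like", "close"
    timestamp: int     # trigger time, e.g. Unix seconds
    description: str = ""


def build_sequences(events):
    """Group events by user; ordering by timestamp is an assumption."""
    sequences = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        sequences[event.user_id].append(event)
    return dict(sequences)


def buried_point_strings(sequences):
    """Collect the distinct buried point character strings across all users."""
    return sorted({e.point_id for seq in sequences.values() for e in seq})


events = [
    BuriedPointEvent("u1", "home_banner_01", "click", 1700000020),
    BuriedPointEvent("u2", "home_banner_02", "click", 1700000005),
    BuriedPointEvent("u1", "pay_btn", "click", 1700000010),
]
sequences = build_sequences(events)
strings = buried_point_strings(sequences)
```

Each user's sequence keeps the full records, while `strings` holds only the distinct character strings that the later steps turn into pictures.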
S12, generating a mask picture according to the plurality of first buried point character strings, and generating a first buried point character string picture according to each first buried point character string and the mask picture.
The computer device generates a mask picture and, according to the mask picture, processes each first buried point character string into a buried point character string picture of the same size, so that each first buried point data sequence is converted into a set of buried point character string pictures.
Processing the first buried point character strings into buried point character string pictures allows them to be processed and analyzed with image processing techniques. This facilitates the subsequent calculation of the first similarity between any two first buried point character strings and improves its accuracy, thereby improving the classification accuracy of the first buried point character strings.
In an optional embodiment, the generating a mask picture according to the plurality of first buried point character strings and generating a first buried point character string picture according to each first buried point character string and the mask picture includes:
calculating the character string length of each first buried point character string;
generating a mask picture according to the maximum value of the plurality of character string lengths;
positioning a target point in the mask picture;
and adding the first buried point character string to the mask picture with the target point as the starting point to obtain a first buried point character string picture.
The computer device counts the characters in each buried point character string to obtain its character string length, and sorts all the calculated character string lengths in descending order with a pre-stored sorting algorithm, so that the first length in the sorted order is the maximum character string length. The pre-stored sorting algorithm may be bubble sort, Shell sort, simple insertion sort, quicksort, or the like.
The computer device generates a mask picture according to the maximum character string length. The length of the mask picture may equal the maximum character string length or exceed it by a preset value. For example, if the maximum character string length is 10, the length of the generated mask picture may be 10 or 12.
After generating the mask picture, the computer device calculates the central point of the mask picture and determines as the target point the point that lies on the same line as the central point, along the length direction of the mask picture, at a preset distance from the edge of the mask picture.
In this optional embodiment, the mask picture is generated according to the maximum character string length, so a first buried point character string of any length can be added to the mask picture without losing information, and all generated first buried point character string pictures have the same size. Because every first buried point character string is added to the mask picture starting from the same target point, the resulting first buried point character string pictures remain comparable after segmentation, which improves the accuracy of the first similarity calculated between any two first buried point character strings.
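The mask and rendering steps can be sketched under strong simplifying assumptions. The text does not specify how characters are drawn, so this example models a picture as a single row of gray values and encodes each character as one pixel with value `ord(char) % 256`; the names `make_mask`, `render`, `padding` and `offset` are hypothetical, and a real implementation would rasterise glyphs with an imaging library.

```python
def make_mask(strings, padding=2, background=0):
    """Blank one-row mask: maximum string length plus a preset padding."""
    max_len = max(len(s) for s in strings)
    return [background] * (max_len + padding)


def render(string, mask, offset=1):
    """Copy the mask and write the string starting at the target point.

    `offset` plays the role of the preset distance from the edge.
    Assumed encoding: one pixel per character, gray value ord(char) % 256.
    """
    if offset + len(string) > len(mask):
        raise ValueError("string does not fit in the mask picture")
    picture = list(mask)
    for i, char in enumerate(string):
        picture[offset + i] = ord(char) % 256
    return picture


strings = ["12367", "12345", "home_banner"]
mask = make_mask(strings)
pictures = {s: render(s, mask) for s in strings}
```

Because every string starts at the same offset in a mask of the same length, pixel positions line up across pictures, which is what makes the later sub-picture comparison meaningful.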
S13, segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures.
Generally, buried point IDs within the same large module of an APP page are named according to a regular pattern, so the buried point character strings corresponding to the buried point IDs in the same large module are broadly similar, although their character string lengths differ. To speed up the calculation of the first similarity between any two first buried point character string pictures, the computer device segments each first buried point character string picture into a plurality of first buried point character string sub-pictures and calculates the first similarity between any two first buried point character string pictures from the similarities between their first buried point character string sub-pictures.
In an optional embodiment, the segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures includes:
calculating a character string length difference between the maximum value and the minimum value of the plurality of character string lengths;
judging whether the character string length difference is greater than or equal to a preset length threshold;
when the character string length difference is greater than or equal to the preset length threshold, acquiring a preset first segmentation number, and uniformly segmenting each first buried point character string picture according to the preset first segmentation number to obtain a plurality of first buried point character string sub-pictures;
and when the character string length difference is smaller than the preset length threshold, acquiring a preset second segmentation number, and uniformly segmenting each first buried point character string picture according to the preset second segmentation number to obtain a plurality of first buried point character string sub-pictures.
The preset first segmentation number is smaller than the preset second segmentation number. For example, the preset first segmentation number may be 5, and the preset second segmentation number may be 10.
When the character string length difference between the maximum and minimum character string lengths is greater than or equal to the preset length threshold, the first buried point character strings differ greatly, and so do the first buried point character string pictures. Segmenting with the preset first segmentation number yields relatively few first buried point character string sub-pictures, which speeds up the calculation of the similarity between sub-pictures and therefore improves the efficiency of calculating the first similarity between first buried point character string pictures.
When the character string length difference is smaller than the preset length threshold, the first buried point character strings differ little, and so do the first buried point character string pictures. Segmenting with the preset second segmentation number yields relatively many first buried point character string sub-pictures, which improves the accuracy of the similarity between sub-pictures and therefore the accuracy of the first similarity between first buried point character string pictures.
In this optional embodiment, the way each first buried point character string picture is segmented is determined by the character string length difference between the maximum and minimum character string lengths, which effectively balances the calculation speed and the calculation accuracy of the first similarity between first buried point character string pictures.
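The threshold rule and the uniform segmentation can be sketched as below. The function names are hypothetical, the defaults 5 and 10 are the example values from the text, and the handling of a picture length that is not divisible by the segmentation number (near-equal pieces) is an assumption, since the text only says "uniformly".

```python
def choose_segment_count(lengths, length_threshold, first_count=5, second_count=10):
    """Pick the smaller count when string lengths vary a lot, else the larger."""
    diff = max(lengths) - min(lengths)
    return first_count if diff >= length_threshold else second_count


def split_uniform(picture, count):
    """Split a one-row picture into `count` contiguous, near-equal pieces."""
    n = len(picture)
    bounds = [i * n // count for i in range(count + 1)]
    return [picture[bounds[i]:bounds[i + 1]] for i in range(count)]


lengths = [5, 5, 11]                      # character string lengths
count = choose_segment_count(lengths, length_threshold=4)
picture = list(range(13))                 # stand-in for a 13-pixel picture row
parts = split_uniform(picture, count)
```

With a length difference of 6 and a threshold of 4, the smaller segmentation number is chosen, so the 13-pixel row is cut into 5 sub-pictures.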
S14, calculating a first information entropy of each first buried point character string sub-picture, and calculating a first similarity between any two first buried point character string pictures according to the first information entropy.
Since the content of a first buried point character string picture is a first buried point character string, the first buried point character string picture is a grayscale picture, and each first buried point character string sub-picture is also a grayscale picture.
Information entropy is a statistical form of image features that reflects the average amount of information in an image. The more similar two pictures are, the closer their information entropies are; the less similar two pictures are, the further apart their information entropies are.
In a grayscale picture, the gray value of each pixel lies in the range [0, 255]. Counting the number of pixels at each gray value, in ascending order of gray value, yields a one-dimensional array of 256 elements. The first information entropy of a first buried point character string sub-picture is calculated by the following formula (1):
$$H = -\sum_{i=0}^{255} p_i \log p_i \qquad (1)$$
where $p_i$ represents the proportion of pixels with gray value $i$ in the first buried point character string sub-picture, and $H$ represents the first information entropy of the first buried point character string sub-picture.
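The entropy calculation can be sketched with a 256-bin histogram, as the text describes. The function name `information_entropy` is hypothetical, and the base-2 logarithm is an assumption, since the base is not stated here; gray values that do not occur contribute nothing to the sum.

```python
import math


def information_entropy(pixels):
    """Shannon entropy (base 2, assumed) of a flat list of gray values in [0, 255]."""
    histogram = [0] * 256                 # pixel count per gray value
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    entropy = 0.0
    for count in histogram:
        if count:                         # skip gray values that never occur
            p = count / total             # proportion p_i of gray value i
            entropy -= p * math.log2(p)
    return entropy


uniform_patch = [0, 0, 0, 0]              # a single gray value
two_level_patch = [0, 255]                # two equally likely gray values
```

A sub-picture with one gray value carries no information, while two equally frequent gray values give exactly one bit.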
In an optional embodiment, the calculating a first similarity between any two first buried point character string pictures according to the first information entropy includes:
pairing the corresponding first buried point character string sub-pictures of any two first buried point character string pictures;
calculating a first difference and a first mean value of the first information entropies of each pair of corresponding first buried point character string sub-pictures;
calculating a first sub-similarity according to the first difference and the corresponding first mean value;
and calculating the first similarity between the two first buried point character string pictures according to the plurality of first sub-similarities.
Assume that each first buried point character string picture is divided into N first buried point character string sub-pictures, that first buried point character string picture 1 is X, and that first buried point character string picture 2 is Y. The computer device pairs the 1st sub-picture X1 of X with the 1st sub-picture Y1 of Y and calculates the first sub-similarity between the first information entropy $H_{X_1}$ of X1 and the first information entropy $H_{Y_1}$ of Y1, that is, the first difference between the two first information entropies divided by the first mean value of the two first information entropies. It then pairs the 2nd sub-picture X2 of X with the 2nd sub-picture Y2 of Y and calculates the first sub-similarity between $H_{X_2}$ and $H_{Y_2}$, and so on, until the Nth sub-picture XN of X is paired with the Nth sub-picture YN of Y and the first sub-similarity between $H_{X_N}$ and $H_{Y_N}$ is calculated. Finally, the N first sub-similarities are summed to obtain the first similarity between first buried point character string picture 1 and first buried point character string picture 2.
This is expressed by the following formula (2):
$$D = \sum_{n=1}^{N} \frac{H_{X_n} - H_{Y_n}}{\left(H_{X_n} + H_{Y_n}\right)/2} \qquad (2)$$
where $D$ represents the first similarity.
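The summation over paired sub-pictures can be sketched as follows, taking the per-sub-picture entropies as input. Two choices here are assumptions not stated in the text: the difference is taken as an absolute value so that the result does not depend on the order of the two pictures, and a pair whose mean entropy is zero contributes nothing, to avoid division by zero. The function name `first_similarity` is hypothetical.

```python
def first_similarity(entropies_x, entropies_y):
    """Sum over sub-picture pairs of (entropy difference) / (entropy mean)."""
    if len(entropies_x) != len(entropies_y):
        raise ValueError("both pictures must have the same number of sub-pictures")
    total = 0.0
    for hx, hy in zip(entropies_x, entropies_y):
        mean = (hx + hy) / 2
        if mean == 0:
            continue                      # assumed: two zero entropies add nothing
        total += abs(hx - hy) / mean      # assumed: absolute difference
    return total


x = [1.0, 0.0, 2.0]                       # entropies of X1..X3
y = [3.0, 0.0, 2.0]                       # entropies of Y1..Y3
d = first_similarity(x, y)                # |1 - 3| / 2 + 0 + 0
```

Only the first pair differs, so the whole value comes from that one term.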
In this optional embodiment, instead of calculating the similarity between any two first buried point character strings directly, each first buried point character string picture is segmented into a plurality of first buried point character string sub-pictures, and the first similarity between any two first buried point character string pictures is calculated from the first information entropies of those sub-pictures, so the accuracy of the first similarity is higher. For example, suppose two first buried point character strings under the same large module are 12367 and 12345. In principle, they should be classified into the same buried point behavior category. If the similarity is calculated on the character strings directly, the difference lies in "67" versus "45". With image processing, the difference lies in the difference between the first information entropies of "67" and "45". Because the first mean value of these two first information entropies is less than 1, formula (2) shows that dividing by the first mean value enhances the difference between "67" and "45". The calculated first similarity is therefore larger than the similarity between the two character strings themselves, and the two first buried point character strings are more easily classified into the same buried point behavior category.
S15, classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories.
The computer device divides the plurality of first buried point behaviors into a plurality of buried point behavior categories, using the first similarities as the classification basis.
In an optional embodiment, the classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories includes:
selecting, for the first time, a first buried point behavior from the plurality of first buried point behaviors;
acquiring first target similarities, which are those first similarities of the first buried point behavior selected for the first time that are greater than a preset similarity threshold;
classifying the first buried point behavior selected for the first time and the other first buried point behaviors corresponding to the first target similarities into the same buried point behavior category;
selecting, for the second time, a first buried point behavior from the remaining first buried point behaviors;
acquiring second target similarities, which are those first similarities of the first buried point behavior selected for the second time that are greater than the preset similarity threshold;
classifying the first buried point behavior selected for the second time and the other first buried point behaviors corresponding to the second target similarities into the same buried point behavior category;
and repeating the above process until all the first buried point behaviors are classified, to obtain the plurality of buried point behavior categories.
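The selection steps above amount to a greedy, threshold-based grouping, which might be sketched as follows. The names `classify_by_threshold` and `common_prefix_len` are hypothetical; the toy prefix-based similarity merely stands in for the entropy-based first similarity, and always selecting the first remaining item is an assumption, since the text does not say how a behavior is selected.

```python
def classify_by_threshold(items, similarity, threshold):
    """Greedily group items: pick one, gather all remaining items whose
    similarity to it exceeds the threshold, then repeat on the rest."""
    remaining = list(items)
    categories = []
    while remaining:
        seed = remaining.pop(0)  # assumption: take the first remaining item
        group, rest = [seed], []
        for other in remaining:
            if similarity(seed, other) > threshold:
                group.append(other)
            else:
                rest.append(other)
        remaining = rest
        categories.append(group)
    return categories


def common_prefix_len(a, b):
    """Toy similarity (hypothetical): length of the shared prefix."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


groups = classify_by_threshold(
    ["12367", "12345", "98701", "98722"], common_prefix_len, threshold=2
)
```

Each item ends up in exactly one category, and an item that is similar to no other item forms a category of its own.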
After the computer device classifies the plurality of first buried point behaviors, each buried point behavior category contains one or more first buried point behaviors. The computer device may count the first buried point behaviors in any buried point behavior category to obtain the number of operation behaviors in that category, or may calculate the operation behavior time of the first buried point behaviors in that category. The number of operation behaviors or the operation behavior time is determined as a buried point behavior feature of the user; big data analysis can then be performed on the buried point behavior features of the user, and commodity recommendation or behavior prediction can be performed according to the analysis result.
In an optional embodiment, the method further comprises:
determining a second buried point character string corresponding to each second buried point behavior in a second buried point data sequence of the user to be tested;
generating a second buried point character string picture according to each second buried point character string and the mask picture;
segmenting each second buried point character string picture to obtain a plurality of second buried point character string sub-pictures;
calculating a second information entropy of each second buried point character string sub-picture, and calculating a second similarity between the second buried point character string picture and each first buried point character string picture according to the second information entropy and the first information entropy;
and determining the buried point behavior category of the user to be tested according to the plurality of second similarities.
To determine the buried point behavior category of a user to be tested, the computer device first obtains a second buried point data sequence of the user to be tested within a period of time. The second buried point data sequence includes a plurality of second buried point data, and the second buried point data may include, but is not limited to: a second buried point behavior, a second buried point ID, a second buried point timestamp, a second buried point description and a second user ID of the user to be tested. The second buried point ID is a character string consisting of letters, numbers, special symbols or the like.
The computer device adds the second buried point character string to the mask picture with the target point as the starting point to obtain a second buried point character string picture, and segments the second buried point character string picture using the same segmentation number that was used for the first buried point character string pictures, to obtain a plurality of second buried point character string sub-pictures.
The computer device calculates the second similarity between any second buried point character string picture and any first buried point character string picture as follows: calculating the second information entropy of each second buried point character string sub-picture using formula (1); matching the plurality of second buried point character string sub-pictures of the second buried point character string picture with the plurality of first buried point character string sub-pictures of the first buried point character string picture; calculating, for each matched pair, a second difference value and a second mean value of the second information entropy of the second buried point character string sub-picture and the first information entropy of the first buried point character string sub-picture; calculating a second sub-similarity from the second difference value and the corresponding second mean value using formula (2); and calculating the second similarity between the two pictures according to the plurality of second sub-similarities.
The computer device determines the first buried point character string picture corresponding to the maximum second similarity as the most similar picture of the second buried point character string picture, determines the buried point behavior category of the first buried point behavior corresponding to the most similar picture as the buried point behavior category of the second buried point behavior, and determines the buried point behavior category of the user to be tested according to the buried point behavior categories of all the second buried point behaviors.
It should be understood that the method of the present invention may likewise be used to determine the buried point behavior category of any single second buried point behavior. Once that category is determined, the second buried point behavior and all first buried point behaviors under the corresponding buried point behavior category may be stored in one folder to facilitate subsequent analysis and research.
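The step of taking the category of the most similar first picture can be sketched as follows. This is only an illustration under stated assumptions: `classify_second_picture` and `toy_similarity` are hypothetical names, the first pictures are represented as pairs of (sub-picture entropies, category), and the toy similarity is a stand-in for the second similarity of formula (2).

```python
def classify_second_picture(second_entropies, first_pictures, similarity):
    """Return the category of the first picture with the maximum similarity.

    first_pictures is a list of (entropies, category) pairs (assumed layout).
    """
    best_score, best_category = None, None
    for entropies, category in first_pictures:
        score = similarity(second_entropies, entropies)
        if best_score is None or score > best_score:
            best_score, best_category = score, category
    return best_category


def toy_similarity(a, b):
    """Stand-in similarity (hypothetical): larger when entropies are closer."""
    return -sum(abs(x - y) for x, y in zip(a, b))


first_pictures = [([1.0, 2.0], "A"), ([5.0, 5.0], "B")]
category = classify_second_picture([1.1, 2.1], first_pictures, toy_similarity)
```

The similarity function is passed in as a parameter so that the same selection logic works with whatever second similarity is actually computed.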
In an optional embodiment, the method further comprises:
defining a category identification for each embedded point behavior category;
generating a data set according to each first buried point character string and the corresponding category identification;
training an XGBoost model based on the data set to obtain a buried point behavior classification model;
and classifying the second buried point behavior sequence of the user to be tested using the buried point behavior classification model, to obtain the buried point behavior category of the user to be tested.
The computer device takes each second buried point character string in the second buried point behavior sequence of the user to be tested as an input of the buried point behavior classification model, obtains the buried point behavior category identification of each second buried point character string from the classification performed by the model, and counts all the obtained buried point behavior category identifications to determine the buried point behavior category of the user to be tested.
The computer device may also count the number of buried point behavior category identifications obtained for each buried point behavior category, sort these counts, determine the buried point behavior categories with the top K counts as the recommendation categories of the user to be tested, and recommend commodities to the user to be tested according to the recommendation categories.
In this optional embodiment, the buried point behavior category of the user to be tested is determined by a classification model trained with machine learning rather than by image processing, which can improve the efficiency of classifying the buried point behaviors of the user to be tested.
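Counting the predicted category identifications and keeping the top K could look like the sketch below. The function name `top_k_categories` and the identifiers `"c1"`, `"c2"`, `"c3"` are hypothetical; the predictions are assumed to be already available as a list, one identification per second buried point character string.

```python
from collections import Counter


def top_k_categories(predicted_ids, k):
    """Return the k category identifications that occur most often."""
    counts = Counter(predicted_ids)
    return [category for category, _ in counts.most_common(k)]


# One predicted category identification per second buried point character string.
predictions = ["c1", "c2", "c1", "c3", "c2", "c1"]
recommended = top_k_categories(predictions, k=2)
```

With K = 1 the same function yields the single most frequent category, which is one plausible way to determine the buried point behavior category of the user to be tested.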
According to the method, there is no need to prepare a buried point analysis table or to know the specific meaning of each buried point character string; all buried point character strings can be classified automatically, and the buried point character strings belonging to the same buried point behavior category can be determined. After a page is newly added to the APP, if the buried point character strings in the new page can be automatically classified into the existing buried point behavior categories, the application of the model is not affected; if they cannot be automatically classified into the existing buried point behavior categories, a buried point behavior category can be newly added, and the buried point behavior classification model can be iteratively updated based on the newly added buried point behavior category.
It is emphasized that to further ensure privacy and security of the above-described buried point behavior classification model, the above-described buried point behavior classification model may be stored in a node of the blockchain.
Fig. 2 is a structural diagram of an APP embedded point behavior classification apparatus according to a second embodiment of the present invention.
In some embodiments, the APP embedded point behavior classification apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the APP embedded point behavior classification apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the functions of APP buried point behavior classification (see the description of fig. 1 for details).
In this embodiment, the APP embedded point behavior classification apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an acquisition module 201, a generation module 202, a segmentation module 203, a calculation module 204, a classification module 205 and a training module 206. A module referred to herein is a series of computer program segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function. In this embodiment, the functions of the modules are described in detail below.
The acquisition module 201 is configured to acquire first buried point data sequences of multiple users, and determine a first buried point character string corresponding to each first buried point behavior in the first buried point data sequences.
Embedding a recorder in any page or any button of the APP is called setting a buried point. A buried point is a statistics tool for a page or button in the APP: when a user clicks a button or stays on a page, the buried point corresponding to that button or page is triggered, and the triggered buried point automatically reports buried point data.
The computer device may obtain a plurality of first buried point data of different users within a period of time, and concatenate the plurality of first buried point data of the same user to obtain a first buried point data sequence, where the first buried point data may include, but is not limited to: a first buried point behavior, a first buried point ID, a first buried point timestamp, a first buried point description and a first user ID.
The first buried point behavior refers to an operation behavior of a user on a button in the APP, such as clicking, sharing, liking or closing. Different pages, and different buttons within the same page, correspond to different first buried point IDs, and the first buried point timestamp is the time at which the buried point was triggered. The first buried point ID is a character string composed of letters, numbers, special symbols or the like; different first buried point IDs correspond to different first buried point character strings, and the lengths of the first buried point character strings under different modules may differ.
The generating module 202 is configured to generate a mask picture according to the plurality of first embedded point character strings, and generate a first embedded point character string picture according to each first embedded point character string and the mask picture.
The computer device generates a mask picture and processes each first buried point character string into a buried point character string picture of the same size according to the mask picture, and each first buried point data sequence can be converted into a set of the buried point character string pictures.
Processing the first buried point character strings into buried point character string pictures allows them to be processed and analyzed by means of image processing. This facilitates the subsequent calculation of the first similarity between any two first buried point character strings and improves the calculation accuracy of the first similarity, thereby improving the classification accuracy of the first buried point character strings.
In an optional embodiment, the generating module 202 generating a mask picture according to the plurality of first buried point character strings, and generating a first buried point character string picture according to each first buried point character string and the mask picture includes:
calculating the character string length of each first buried point character string;
generating a mask picture according to the maximum value of the lengths of the character strings;
positioning a target point in the mask picture;
and adding the first embedded point character string into the mask picture by taking the target point as an initial point to obtain a first embedded point character string picture.
The computer device counts the characters included in each first buried point character string to obtain its character string length, and sorts all the calculated character string lengths in descending order using a pre-stored sorting algorithm, so that the character string length ranked first is the maximum character string length. The pre-stored sorting algorithm may be bubble sort, Shell sort, simple insertion sort, quicksort, or the like.
The computer device generates a mask picture according to the maximum value of the length of the character string, where the length of the mask picture may be the same as the maximum value of the length of the character string, or may be greater than the maximum value of the length of the character string by a preset value, for example, the maximum value of the length of the character string is 10, and the length of the generated mask picture may be 10 or 12.
After generating the mask picture, the computer device calculates the center point of the mask picture, and determines as the target point a point in the mask picture that is aligned with the center point along a given direction and is a preset distance from the edge, where the direction is the length direction of the mask picture.
In this optional embodiment, the mask picture is generated according to the maximum character string length, so that a first buried point character string of any length can be added to the mask picture without losing information, and all the generated first buried point character string pictures have the same size. By locating the target point of the mask picture and adding every first buried point character string to the mask picture with the target point as the starting point, the resulting first buried point character string pictures remain comparable after segmentation, which improves the accuracy of the first similarity calculated between any two first buried point character strings.
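The idea of a fixed-size mask with a common starting point can be illustrated with a deliberately simplified model. Everything here is an assumption made for illustration: the picture is modeled as a one-dimensional list of gray values rather than a rendered image, each character is encoded as its code point modulo 256 instead of being drawn as a glyph, and the names `make_mask` and `place_string` are hypothetical.

```python
def make_mask(strings, padding=2, background=0):
    """Blank canvas whose length is the longest string length plus a preset value."""
    max_len = max(len(s) for s in strings)
    return [background] * (max_len + padding)


def place_string(mask, text, start=1):
    """Copy the mask and write the string into it, beginning at the target point."""
    if start + len(text) > len(mask):
        raise ValueError("string does not fit into the mask")
    picture = list(mask)
    for offset, ch in enumerate(text):
        # Assumption: one cell per character, gray value = code point mod 256.
        picture[start + offset] = ord(ch) % 256
    return picture


strings = ["12367", "123"]
mask = make_mask(strings)
pictures = [place_string(mask, s) for s in strings]
```

Because every string starts at the same target point on a canvas of the same size, position k in one picture corresponds to position k in every other picture, which is what makes the later sub-picture comparison meaningful.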
The segmentation module 203 is configured to segment each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures.
Generally, buried point IDs within the same large module of an APP page are named according to a regular pattern, so the buried point character strings corresponding to the buried point IDs in the same large module are broadly similar, although their character string lengths differ. To improve the calculation speed of the first similarity between any two first buried point character string pictures, the computer device segments each first buried point character string picture into a plurality of first buried point character string sub-pictures, and calculates the first similarity between any two first buried point character string pictures by calculating the similarity between the corresponding first buried point character string sub-pictures.
In an optional embodiment, the segmentation module 203 segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures includes:
calculating a character string length difference between the maximum value of the plurality of character string lengths and the minimum value of the plurality of character string lengths;
comparing the character string length difference with a preset length threshold;
when the character string length difference is greater than or equal to the preset length threshold, acquiring a preset first segmentation number, and uniformly segmenting each first buried point character string picture according to the preset first segmentation number to obtain a plurality of first buried point character string sub-pictures;
and when the character string length difference is smaller than the preset length threshold, acquiring a preset second segmentation number, and uniformly segmenting each first buried point character string picture according to the preset second segmentation number to obtain a plurality of first buried point character string sub-pictures.
The preset first segmentation number is smaller than the preset second segmentation number. For example, the preset first segmentation number may be 5, and the preset second segmentation number may be 10.
When the character string length difference between the maximum character string length and the minimum character string length is greater than or equal to the preset length threshold, the first buried point character strings differ considerably, and so do the first buried point character string pictures. Segmenting with the preset first segmentation number yields relatively few first buried point character string sub-pictures, which speeds up the calculation of the similarity between sub-pictures and thus improves the calculation efficiency of the first similarity between first buried point character string pictures.
When the character string length difference between the maximum character string length and the minimum character string length is smaller than the preset length threshold, the first buried point character strings differ little, and so do the first buried point character string pictures. Segmenting with the preset second segmentation number yields relatively many first buried point character string sub-pictures, which improves the calculation accuracy of the similarity between sub-pictures and thus the calculation accuracy of the first similarity between first buried point character string pictures.
In this optional embodiment, the manner of segmenting the first buried point character string pictures is determined by the character string length difference between the maximum value and the minimum value of the plurality of character string lengths, which effectively balances the calculation speed and the calculation accuracy of the first similarity between first buried point character string pictures.
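The choice of segmentation number and the uniform segmentation can be sketched as below. The function names are hypothetical, the picture is again modeled as a flat list of gray values, the defaults 5 and 10 are the example values from the text, and spreading any remainder over the leading segments is an assumption, since the text does not say how a picture whose size is not a multiple of the segmentation number is handled.

```python
def choose_segment_count(lengths, length_threshold, first_count=5, second_count=10):
    """Fewer segments when string lengths vary widely, more when they are close."""
    diff = max(lengths) - min(lengths)
    return first_count if diff >= length_threshold else second_count


def split_evenly(picture, count):
    """Split a flat picture into `count` contiguous, near-equal segments."""
    base, extra = divmod(len(picture), count)
    parts, start = [], 0
    for i in range(count):
        # Assumption: the first `extra` segments receive one additional cell.
        end = start + base + (1 if i < extra else 0)
        parts.append(picture[start:end])
        start = end
    return parts


count = choose_segment_count([5, 12], length_threshold=4)
sub_pictures = split_evenly(list(range(10)), count)
```

The same `count` would then be reused for every picture, so that the i-th sub-picture of one picture always lines up with the i-th sub-picture of another.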
The calculating module 204 is configured to calculate a first information entropy of each first buried point character string sub-picture, and calculate a first similarity between any two first buried point character string pictures according to the first information entropy.
Since the content of the first buried point character string picture is the first buried point character string, the first buried point character string picture is a grayscale picture, and each first buried point character string sub-picture is also a grayscale picture.
The information entropy is a statistical feature of an image and reflects the average amount of information in the image. The more similar two pictures are, the closer their information entropies are; the more dissimilar two pictures are, the further apart their information entropies are.
In a grayscale picture, the gray value of each pixel lies in the range [0, 255]. Counting the number of pixels corresponding to each gray value in the picture, in ascending order of gray value, yields a one-dimensional array containing 256 elements. The first information entropy of the first buried point character string sub-picture is calculated by the following formula (1):
$$H = -\sum_{i=0}^{255} p_i \log p_i$$
where $p_i$ represents the proportion of pixels with gray value i in the first buried point character string sub-picture, and H represents the first information entropy of the first buried point character string sub-picture.
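The entropy of a grayscale sub-picture can be computed from its 256-bin histogram as in the sketch below. The function name `gray_entropy` is hypothetical, the sub-picture is assumed to be given as a flat sequence of integer gray values in [0, 255], and the base-2 logarithm is an assumption, since the logarithm base is not stated in the text.

```python
import math


def gray_entropy(pixels):
    """Shannon entropy of the gray-value distribution of a sub-picture."""
    histogram = [0] * 256  # one counter per gray value 0..255
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    entropy = 0.0
    for count in histogram:
        if count:  # gray values that do not occur contribute nothing
            p = count / total
            entropy -= p * math.log2(p)  # assumption: logarithm to base 2
    return entropy


h_uniform = gray_entropy([7, 7, 7, 7])       # a single gray value
h_two_tone = gray_entropy([0, 0, 255, 255])  # two equally frequent gray values
```

A sub-picture with a single gray value has zero entropy, while two equally frequent gray values give an entropy of exactly one bit under the base-2 assumption.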
In an optional embodiment, the calculating module 204 calculating a first similarity between any two first buried point character string pictures according to the first information entropy includes:
matching the first buried point character string sub-pictures of any two first buried point character string pictures with each other;
calculating a first difference value and a first mean value of the first information entropies of any two matched first buried point character string sub-pictures;
calculating a first sub-similarity according to the first difference value and the corresponding first mean value;
and calculating the first similarity between the two first buried point character string pictures according to the plurality of first sub-similarities.
Assume that each first buried point character string picture is divided into N first buried point character string sub-pictures, and denote first buried point character string picture 1 as X and first buried point character string picture 2 as Y. The computer device matches the 1st first buried point character string sub-picture X1 of X with the 1st first buried point character string sub-picture Y1 of Y, and then calculates a first sub-similarity between the first information entropy $H_{X_1}$ of X1 and the first information entropy $H_{Y_1}$ of Y1 (the first difference between the two first information entropies divided by the first mean value of the two first information entropies). It then matches the 2nd first buried point character string sub-picture X2 of X with the 2nd first buried point character string sub-picture Y2 of Y, and calculates a first sub-similarity between the first information entropy $H_{X_2}$ of X2 and the first information entropy $H_{Y_2}$ of Y2; and so on, until it matches the Nth first buried point character string sub-picture XN of X with the Nth first buried point character string sub-picture YN of Y and calculates a first sub-similarity between the first information entropy $H_{X_N}$ of XN and the first information entropy $H_{Y_N}$ of YN. Finally, the N first sub-similarities are summed to obtain the first similarity between first buried point character string picture 1 and first buried point character string picture 2.
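Expressed by the following formula (2):
$$D = \sum_{i=1}^{N} \frac{H_{X_i} - H_{Y_i}}{\left(H_{X_i} + H_{Y_i}\right)/2}$$
where D represents the first similarity, and $H_{X_i}$ and $H_{Y_i}$ represent the first information entropies of the i-th first buried point character string sub-pictures of X and Y, respectively.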
In this optional embodiment, rather than directly calculating the similarity between any two first buried point character strings, each first buried point character string picture is segmented into a plurality of first buried point character string sub-pictures, and the first similarity between any two first buried point character string pictures is calculated according to the first information entropies of these sub-pictures, so that the accuracy of the first similarity is higher. For example, assume that two first buried point character strings under the same large module are 12367 and 12345. In principle, these two first buried point character strings should be classified into the same buried point behavior category. If the similarity is calculated directly between the two character strings, the difference lies in "67" versus "45". With image processing, the difference instead lies in the difference between the first information entropies of "67" and "45". Since the first mean value of these two first information entropies is less than 1, it can be seen from formula (2) that dividing by the first mean value amplifies this difference. The calculated first similarity is therefore larger than the similarity calculated directly between the two first buried point character strings, and the two first buried point character strings are more easily classified into the same buried point behavior category.
The classifying module 205 is configured to classify the plurality of first embedded point behaviors according to the plurality of first similarities to obtain a plurality of embedded point behavior classes.
The computer device divides the plurality of first buried point behaviors into a plurality of buried point behavior categories, using the first similarities as the classification basis.
In an optional embodiment, the classifying module 205 classifying the plurality of first buried point behaviors according to the plurality of first similarities to obtain a plurality of buried point behavior categories includes:
selecting, for the first time, a first buried point behavior from the plurality of first buried point behaviors;
acquiring first target similarities, which are those first similarities of the first buried point behavior selected for the first time that are greater than a preset similarity threshold;
classifying the first buried point behavior selected for the first time and the other first buried point behaviors corresponding to the first target similarities into the same buried point behavior category;
selecting, for the second time, a first buried point behavior from the remaining first buried point behaviors;
acquiring second target similarities, which are those first similarities of the first buried point behavior selected for the second time that are greater than the preset similarity threshold;
classifying the first buried point behavior selected for the second time and the other first buried point behaviors corresponding to the second target similarities into the same buried point behavior category;
and repeating the above process until all the first buried point behaviors are classified, to obtain the plurality of buried point behavior categories.
After the computer device classifies the plurality of first buried point behaviors, each buried point behavior category contains one or more first buried point behaviors. The computer device may count the first buried point behaviors in any buried point behavior category to obtain the number of operation behaviors in that category, or may calculate the operation behavior time of the first buried point behaviors in that category. The number of operation behaviors or the operation behavior time is determined as a buried point behavior feature of the user; big data analysis can then be performed on the buried point behavior features of the user, and commodity recommendation or behavior prediction can be performed according to the analysis result.
In an optional embodiment, the acquisition module 201 is further configured to determine a second buried point character string corresponding to each second buried point behavior in a second buried point data sequence of the user to be tested.
The generating module 202 is further configured to generate a second embedded point character string picture according to each second embedded point character string and the mask picture.
The segmentation module 203 is further configured to segment each second buried point character string picture to obtain a plurality of second buried point character string sub-pictures.
The calculating module 204 is further configured to calculate a second information entropy of each second buried point character string sub-picture, and calculate a second similarity between the second buried point character string picture and each first buried point character string picture according to the second information entropy and the first information entropy.
The classification module 205 is further configured to determine the behavior category of the embedded point of the user to be tested according to the plurality of second similarities.
If the embedded point behavior category of the user to be detected is to be determined, the computer device first obtains a second embedded point data sequence of the user to be detected within a period of time, where the second embedded point data sequence includes a plurality of second embedded point data, where the second embedded point data may include, but is not limited to: the second embedded point behavior, the second embedded point ID, the second embedded point timestamp, the second embedded point description and the second user ID of the user to be detected. The second buried point ID is a character string consisting of a string of letters, numbers, or special symbols.
And the computer equipment adds the second embedded point character string to the mask picture by taking the target point as an initial point to obtain a second embedded point character string picture, and segments the second embedded point character string picture according to the segmentation number for segmenting each first embedded point character string to obtain a plurality of second embedded point character string sub-pictures.
The process of calculating the second similarity between any second buried point character string picture and any first buried point character string picture by the computer device is as follows: calculating a second information entropy of any second buried point character string sub-picture by adopting the formula (1), corresponding a plurality of second buried point character string sub-pictures of any second buried point character string picture with a plurality of first buried point character string sub-pictures of any first buried point character string picture, calculating a second difference value and a second mean value of the second information entropy of any corresponding second buried point character string sub-picture and the first information entropy of the first buried point character string sub-pictures, calculating a second sub-similarity by adopting the formula (2) based on the second difference value and the corresponding second mean value, and calculating a second similarity between any second buried point character string picture and any first buried point character string picture according to the plurality of second sub-similarities.
The computer device determines the first buried point character string picture with the maximum second similarity as the most similar picture of the second buried point character string picture, assigns the buried point behavior category of that most similar picture to the second buried point behavior, and then determines the buried point behavior category of the user to be tested from the buried point behavior categories of all the second buried point behaviors.
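A short Python sketch of this selection step follows. It assumes the second similarities have already been computed and are held in a dictionary keyed by a hypothetical identifier of each first picture. The text does not say how the per-behavior categories are combined into the user's category, so the majority vote used here is an assumption.

```python
from collections import Counter


def classify_behavior(similarities: dict[str, float], category_of: dict[str, str]) -> str:
    """Return the category of the first picture with the highest second similarity."""
    most_similar = max(similarities, key=similarities.get)
    return category_of[most_similar]


def classify_user(per_behavior: list[dict[str, float]], category_of: dict[str, str]) -> str:
    """Assumed aggregation: the most frequent per-behavior category wins."""
    votes = Counter(classify_behavior(s, category_of) for s in per_behavior)
    return votes.most_common(1)[0][0]


# Illustrative data only: two labelled first pictures, three second behaviors.
category_of = {"p1": "browse", "p2": "pay"}
per_behavior = [
    {"p1": 0.9, "p2": 0.2},
    {"p1": 0.8, "p2": 0.4},
    {"p1": 0.1, "p2": 0.7},
]
user_category = classify_user(per_behavior, category_of)
```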
It should be understood that the device of the present invention may also be used to determine the buried point behavior category of any single second buried point behavior. Once that category is determined, the second buried point behavior and all first buried point behaviors under the same buried point behavior category may be stored in one folder to facilitate subsequent analysis and research.
The training module 206 is further configured to train a buried point behavior classification model.
In an alternative embodiment, the training module 206 training the buried point behavior classification model includes:
defining a category identification for each embedded point behavior category;
generating a data set according to each first buried point character string and the corresponding category identification;
training an XGBoost model on the data set to obtain a buried point behavior classification model;
and classifying the second embedded point behavior sequence of the user to be detected by using the embedded point behavior classification model to obtain the embedded point behavior classification of the user to be detected.
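The training steps above might be sketched in Python as follows. The text does not specify how a buried point character string is turned into model features, so the fixed-width character-code encoding, the function names, and the hyperparameters are all hypothetical. `train_classifier` relies on the third-party `xgboost` and `numpy` packages and imports them only when called.

```python
def encode(text: str, width: int) -> list[int]:
    """Hypothetical feature encoding: character codes, truncated or zero-padded to `width`."""
    codes = [ord(ch) for ch in text[:width]]
    return codes + [0] * (width - len(codes))


def build_dataset(categories: dict[str, list[str]], width: int):
    """Define an integer category identification per category and emit features and labels."""
    category_ids = {name: idx for idx, name in enumerate(sorted(categories))}
    features, labels = [], []
    for name, strings in categories.items():
        for text in strings:
            features.append(encode(text, width))
            labels.append(category_ids[name])
    return features, labels, category_ids


def train_classifier(features, labels):
    """Fit an XGBoost classifier; labels must be integers 0..n-1, as build_dataset produces."""
    import numpy as np
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=50, max_depth=4)  # illustrative settings only
    model.fit(np.array(features), np.array(labels))
    return model
```

With the packages installed, `model = train_classifier(features, labels)` followed by `model.predict(...)` on encoded second buried point character strings would yield one category identification per string.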
The computer device feeds each second buried point character string in the second buried point behavior sequence of the user to be tested into the buried point behavior classification model, obtains the buried point behavior category identification that the model assigns to each second buried point character string, and counts all of the obtained category identifications to determine the buried point behavior category of the user to be tested.
The computer device may also count how many times each buried point behavior category identification was obtained, sort the categories by that count, take the K categories with the highest counts as the recommendation categories of the user to be tested, and recommend commodities to that user according to the recommendation categories.
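The counting and top-K selection can be illustrated with a few lines of Python. The function name and the sample identifications are hypothetical; the input is assumed to be the list of category identifications predicted for one user's second buried point character strings.

```python
from collections import Counter


def recommend_categories(predicted_ids: list[int], k: int) -> list[int]:
    """Return the k category identifications that occur most often, most frequent first."""
    counts = Counter(predicted_ids)
    return [category for category, _ in counts.most_common(k)]


top_two = recommend_categories([2, 0, 2, 1, 2, 0], k=2)
```

Taking `k=1` gives the single most frequent identification, which is one way to read "the buried point behavior category of the user to be tested".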
In this optional embodiment, a classification model trained by machine learning, rather than image processing, determines the buried point behavior category of the user to be tested, which can improve the efficiency of classifying that user's buried point behavior.
With this method, no buried point analysis table needs to be prepared and the specific meaning of each buried point character string does not need to be known: all buried point character strings can be classified automatically, and the buried point character strings belonging to the same buried point behavior category can be identified. After a page is added to the APP, if the buried point character strings in the new page can be automatically classified into the existing buried point behavior categories, the use of the model is not affected. If they cannot, new buried point behavior categories can be added, and the buried point behavior classification model can be iteratively updated based on the newly added categories.
It is emphasized that to further ensure privacy and security of the above-described buried point behavior classification model, the above-described buried point behavior classification model may be stored in a node of the blockchain.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not limit the embodiments of the present invention. The configuration may be bus-type or star-type, and the computer device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example. Other electronic products that currently exist or may come into existence in the future, if they can be adapted to the present invention, should also be included in the scope of the present invention and are included herein by reference.
In some embodiments, the memory 31 has stored therein a computer program which, when executed by the at least one processor 32, implements all or part of the steps of the APP buried point behavior classification method as described. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is the control unit of the computer device 3. It connects the components of the entire computer device 3 through various interfaces and lines, and executes the functions of the computer device 3 and processes its data by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the APP buried point behavior classification method described in the embodiments of the present invention, or implements all or part of the functions of the APP buried point behavior classification device. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. An APP buried point behavior classification method is characterized by comprising the following steps:
acquiring first buried point data sequences of a plurality of users, and determining a first buried point character string corresponding to each first buried point in the first buried point data sequences;
generating a mask picture according to the first embedded point character strings, and generating a first embedded point character string picture according to each first embedded point character string and the mask picture;
segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures;
calculating a first information entropy of each first buried point character string sub-picture, and calculating a first similarity between any two first buried point character string pictures according to the first information entropy;
and classifying the plurality of first embedded point behaviors according to the plurality of first similarity degrees to obtain a plurality of embedded point behavior categories.
2. The APP buried point behavior classification method of claim 1, wherein the generating a mask picture from a plurality of the first buried point character strings and generating a first buried point character string picture from each of the first buried point character strings and the mask picture comprises:
calculating the character string length of each buried point character string;
generating a mask picture according to the maximum value of the lengths of the character strings;
positioning a target point in the mask picture;
and adding the first embedded point character string into the mask picture by taking the target point as an initial point to obtain a first embedded point character string picture.
3. The APP embedded point behavior classification method of claim 1, wherein the splitting each first embedded point character string picture to obtain a plurality of first embedded point character string sub-pictures comprises:
calculating a string length difference between a maximum value of the plurality of string lengths and a minimum value of the plurality of string lengths;
judging whether the character string length difference is larger than a preset length threshold value or not;
when the character string length difference is larger than or equal to the preset length threshold, acquiring a preset first segmentation number, and uniformly segmenting each first buried point character string picture according to the preset first segmentation number to obtain a plurality of first buried point character string sub-pictures;
and when the character string length difference is smaller than the preset length threshold, acquiring a preset second segmentation number, and uniformly segmenting each first embedded point character string picture according to the preset second segmentation number to obtain a plurality of first embedded point character string sub-pictures.
4. The APP buried point behavior classification method of claim 1, wherein the calculating of the first similarity between any two first buried point character string pictures according to the first information entropy includes:
corresponding first buried point character string sub-pictures of any two first buried point character string pictures;
calculating a first difference value and a first mean value of first information entropies of any corresponding two first embedded point character string sub-pictures;
calculating to obtain a first sub-similarity according to the first difference and the corresponding first mean value;
and calculating the first similarity between any two first embedded point character string pictures according to the plurality of first sub-similarities.
5. The APP buried point behavior classification method of claim 1, wherein said classifying the first plurality of buried point behaviors according to the first plurality of similarities to obtain a plurality of buried point behavior classes comprises:
(a) selecting a first buried point behavior for the first time from a plurality of first buried point behaviors;
(b) acquiring first target similarities, namely the first similarities between the first-selected first buried point behavior and the other first buried point behaviors that are greater than a preset similarity threshold;
(c) classifying the first-selected first buried point behavior and the other first buried point behaviors corresponding to the first target similarities into the same buried point behavior category;
(d) selecting a first buried point behavior for the second time from the remaining plurality of first buried point behaviors;
(e) acquiring second target similarities, namely the first similarities between the second-selected first buried point behavior and the other remaining first buried point behaviors that are greater than the preset similarity threshold;
(f) classifying the second-selected first buried point behavior and the other first buried point behaviors corresponding to the second target similarities into the same buried point behavior category;
(g) repeating the processes (d) to (f) until all the first embedded point behaviors are classified to obtain a plurality of embedded point behavior classes.
6. The APP buried point behavior classification method of any one of claims 1 to 5, further comprising:
determining a second embedded point character string corresponding to each second embedded point in a second embedded point data sequence of the user to be tested;
generating a second buried point character string picture according to each second buried point character string and the mask picture;
segmenting each second buried point character string picture to obtain a plurality of second buried point character string sub-pictures;
calculating a second information entropy of each second buried point character string sub-picture, and calculating a second similarity between the second buried point character string picture and each first buried point character string picture according to the second information entropy and the first information entropy;
and determining the behavior category of the embedded point of the user to be tested according to the plurality of second similarities.
7. The APP buried point behavior classification method of any one of claims 1 to 5, further comprising:
defining a category identification for each embedded point behavior category;
generating a data set according to each first buried point character string and the corresponding category identification;
training an XGBoost model on the data set to obtain a buried point behavior classification model;
and classifying the second embedded point behavior sequence of the user to be detected by using the embedded point behavior classification model to obtain the embedded point behavior classification of the user to be detected.
8. An APP buried point behavior classification device, characterized in that the device comprises:
the acquisition module is used for acquiring first buried point data sequences of a plurality of users and determining a first buried point character string corresponding to each first buried point in the first buried point data sequences;
the generating module is used for generating a mask picture according to the plurality of first embedded point character strings and generating a first embedded point character string picture according to each first embedded point character string and the mask picture;
the segmentation module is used for segmenting each first buried point character string picture to obtain a plurality of first buried point character string sub-pictures;
the calculation module is used for calculating a first information entropy of each first buried point character string sub-picture and calculating a first similarity between any two first buried point character string pictures according to the first information entropy;
and the classification module is used for classifying the plurality of first embedded point behaviors according to the plurality of first similarities to obtain a plurality of embedded point behavior categories.
9. A computer device, characterized in that it comprises a processor for implementing the APP buried point behavior classification method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the APP buried point behavior classification method of any one of claims 1 to 7.
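For illustration only, the grouping procedure of steps (a) to (g) in claim 5 can be sketched in Python. The claim does not say how a behavior is selected at each round, so taking the first remaining item is an assumption, as are the function name and the representation of pairwise first similarities as a dictionary keyed by unordered pairs. Items are assumed to be unique.

```python
def group_by_similarity(
    items: list[str],
    similarity: dict[frozenset, float],
    threshold: float,
) -> list[list[str]]:
    """Greedy grouping: pick a seed, pull in everything above the threshold, repeat."""
    remaining = list(items)
    groups = []
    while remaining:
        seed = remaining.pop(0)  # steps (a)/(d): select one behavior (assumed: the first)
        members = [
            other for other in remaining
            if similarity[frozenset((seed, other))] > threshold  # steps (b)/(e)
        ]
        groups.append([seed] + members)  # steps (c)/(f): one category per seed
        remaining = [other for other in remaining if other not in members]
    return groups  # step (g): loop ends when every behavior is classified


# Illustrative pairwise similarities for four buried point behaviors.
pairs = {
    frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.2, frozenset(("a", "d")): 0.1,
    frozenset(("b", "c")): 0.3, frozenset(("b", "d")): 0.2, frozenset(("c", "d")): 0.8,
}
groups = group_by_similarity(["a", "b", "c", "d"], pairs, threshold=0.5)
```

Note that the result of such a greedy pass depends on the selection order, since each behavior joins the category of the first seed it is similar to.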
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011462475.4A CN112231196B (en) | 2020-12-14 | 2020-12-14 | APP embedded point behavior classification method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112231196A CN112231196A (en) | 2021-01-15 |
CN112231196B (en) | 2021-03-16 |
Family
ID=74124630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011462475.4A Active CN112231196B (en) | 2020-12-14 | 2020-12-14 | APP embedded point behavior classification method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112231196B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||