CN114860918A

CN114860918A - Mobile application recommendation method and device fusing multi-source reliable information

Info

Publication number: CN114860918A
Application number: CN202210574977.9A
Authority: CN
Inventors: 胡阳雨; 祝清意; 张亮; 余桀; 毛美玲
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2022-08-05

Abstract

The invention relates to the field of recommendation, in particular to a mobile application recommendation method and device fusing multi-source reliable information. The method comprises: obtaining mobile application samples and their corresponding extension data; extracting the function information, permission information and context text information of each interface of an application sample and inputting them into a sub-function classifier to obtain the sub-function classification; obtaining the user's preference value for the application by combining this classification with the time and location information of the user's application usage; acquiring the popularity information of the application, obtaining the reliability of each piece of popularity data through a reliability classifier, and calculating the popularity of the application; and recommending applications to the user by integrating the user's preference values for the applications with the popularity of the applications. By extracting the various sub-functions contained in an application, the method addresses the problem of incomplete, coarse-grained context information and improves the accuracy and precision of application recommendation; by extracting cross-market (cross-store) features, it detects false popularity data more comprehensively and reduces the influence of false data on the recommendation model.

Description

Mobile application recommendation method and device fusing multi-source reliable information

Technical Field

The invention relates to the field of recommendation, in particular to a mobile application recommendation method and device fusing multi-source reliable information.

Background

At present, researchers in China and abroad have carried out a large amount of research on mobile application recommendation and have proposed effective recommendation models. From the perspective of the recommendation method, these models fall mainly into three categories: recommendation models based on collaborative filtering, recommendation models based on interaction information, and recommendation models based on extension information.

Collaborative filtering is a traditional recommendation technique that is widely applied in commodity recommendation, video recommendation and similar fields, and some researchers have therefore transferred the idea directly to mobile application recommendation. Collaborative filtering rests on the assumption that users who have had similar experiences with items tend to have similar preferences, and it recommends items by calculating similarities between users or between items. In mobile application recommendation, existing research identifies users with the same preferences mainly by calculating the similarity of the semantic relations among the mobile applications used by users, mining the similarity of users' application usage logs, analyzing the similarity of users' comment contents, and analyzing the similarity of users' functional or security requirements.
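As a concrete illustration of the user-similarity idea described above, the following minimal Python sketch computes the cosine similarity between users' app-usage vectors and scores apps the target user has not used from the usage of the most similar users. The usage-weight representation, the neighbour count `k`, the function names and the app names are illustrative assumptions, not part of the method described here.

```python
import math


def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse usage vectors (app name -> usage weight)."""
    dot = sum(u[app] * v[app] for app in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_by_similar_users(target: str, users: dict, k: int = 2, top_n: int = 3) -> list:
    """Rank apps unused by `target` by the similarity-weighted usage of its k nearest users."""
    neighbours = sorted(
        ((cosine_similarity(users[target], vec), name)
         for name, vec in users.items() if name != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, name in neighbours:
        if sim <= 0:
            continue  # ignore users whose usage does not overlap at all
        for app, weight in users[name].items():
            if app not in users[target]:
                scores[app] = scores.get(app, 0.0) + sim * weight
    # Highest score first; ties broken by app name for a deterministic order.
    return sorted(scores, key=lambda app: (-scores[app], app))[:top_n]
```

For example, with `users = {"u1": {"chat": 3, "wallet": 2}, "u2": {"chat": 3, "wallet": 2, "shop": 4}, "u3": {"game": 5}}`, the call `recommend_by_similar_users("u1", users)` suggests `"shop"`, because only `u2` shares usage with `u1`. The sparsity problem mentioned below shows up directly here: users with no overlapping apps get a similarity of 0 and contribute nothing.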

Recommendation models based on interaction information generally use the rich information generated while a mobile user interacts with applications, including the time and location at which the user uses an application. This interaction information can also be understood as the user's context log: a matrix factorization model, a semantic model or a deep learning model establishes the association between user preferences and context to capture the user's behavioral habits, thereby realizing mobile application recommendation.

The extension information mainly comprises user comment information, application version information, application permission information, application description information, picture information and the like. It is usually added to model training to enrich, filter and fill the user-application interaction model, so as to obtain the probability that a user prefers a mobile application in a specific context and to better explain recommendation methods based on interaction information.

When a similarity model is constructed, the vectors of mobile users or applications are often very sparse, which makes the model difficult to build and leads to low recommendation accuracy and precision. The other two kinds of methods consider the user-application interaction information and other extension information, but neglect the influence of the reliability and integrity of that information on the model. For example, related models usually introduce extension information such as the application description and application version to enrich the user's context log; however, a mobile application usually contains multiple sub-functions, and the functions listed by the developer in the application description are often incomplete, which makes the context information inaccurate. In addition, related models generally rank the recommendation results using the application's popularity information, and the false comments, ratings and similar data produced by the ranking fraud common among mobile applications strongly affect the accuracy of the recommendation results.

Disclosure of Invention

To improve the reliability and integrity of the input data and thus recommend mobile applications more accurately and reliably, the invention provides a mobile application recommendation method and device fusing multi-source reliable information, wherein the method comprises the following steps:

acquiring a mobile application sample and corresponding extension data thereof;

extracting the function information, permission information and context text information of each interface of the application sample and inputting them into a sub-function classifier, whose output is the name of each type of sub-function and the corresponding number of interfaces;

taking the output of the sub-function classifier, together with the time and location information of the user's use of the application, as input to obtain the user's preference value for the application;

inputting the popularity data of the application into a reliability classifier to calculate its reliability, and assigning a reliability weight to each piece of data according to its reliability;

calculating the popularity of the application according to the popularity data of the application and the corresponding reliability weight;

and comprehensively sorting according to the preference value of the user to the application and the popularity of the application, and recommending the top N sorted applications to the user.
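The final ranking step leaves open how the preference value and the popularity are combined. The sketch below assumes, purely for illustration, a weighted sum of min-max-normalized scores with a hypothetical mixing weight `alpha`; the function and parameter names are not from the source.

```python
def min_max(values: dict) -> dict:
    """Scale scores to [0, 1]; if all scores are equal, every app gets 1.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {key: 1.0 for key in values}
    return {key: (v - lo) / (hi - lo) for key, v in values.items()}


def recommend_top_n(preference: dict, popularity: dict, n: int, alpha: float = 0.5) -> list:
    """Assumed combination: alpha * normalized preference + (1 - alpha) * normalized popularity."""
    apps = preference.keys() & popularity.keys()  # only apps scored by both components
    if not apps:
        return []
    pref = min_max({app: preference[app] for app in apps})
    pop = min_max({app: popularity[app] for app in apps})
    combined = {app: alpha * pref[app] + (1 - alpha) * pop[app] for app in apps}
    # Descending combined score; ties broken by app name for a stable order.
    return sorted(apps, key=lambda app: (-combined[app], app))[:n]
```

Normalizing first keeps a raw popularity scale (e.g., download counts) from swamping preference values that lie in [0, 1]; setting `alpha=1.0` reduces the ranking to preference alone.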

Further, the sub-function classifier is obtained through the following process:

for each type of mobile application, selecting m applications, manually exercising each application and traversing all of its interfaces, and labeling the function of each interface;

acquiring the names of the functions called under each interface and the permissions requested, and locating the entry point of each interface, i.e., the ID of its entry component, according to the callback functions used by system components;

dynamically running the application, traversing all of its interfaces, acquiring the Activity name and the contained text of each interface as context information, and adding the text of the entry component to the context information;

preprocessing the context text: after word segmentation and stop-word removal, ordering it from top to bottom and generating a text vector;

and training the sub-function classifier with the text vectors of the interfaces corresponding to each sub-function as the positive set of the training set and the text of the other interfaces as the negative set, to obtain the trained sub-function classifier.
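The one-vs-rest training described above (one binary classifier per sub-function, positives from that sub-function's interfaces, negatives from all other interfaces) can be sketched without external libraries. Here a Pegasos-style sub-gradient linear SVM, with no bias term and a fixed sample order, stands in for the SVM mentioned in the embodiment; in practice a library implementation such as scikit-learn's `LinearSVC` would more likely be used. The regularization `lam`, the epoch count, the function names and the toy vectors are assumptions.

```python
def build_one_vs_rest(labeled: list, subfunction: str) -> tuple:
    """labeled: (vector, label) pairs. Target sub-function -> +1, every other interface -> -1."""
    X = [vector for vector, _ in labeled]
    y = [1 if label == subfunction else -1 for _, label in labeled]
    return X, y


def train_linear_svm(X: list, y: list, lam: float = 0.01, epochs: int = 50) -> list:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss (no bias, fixed order)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink step from the L2 term
            if margin < 1:  # hinge loss active: move toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: list, x: list) -> int:
    """+1 if the interface vector is classified as the target sub-function, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```

With toy count vectors over a hypothetical vocabulary `[login, password, pay, order]`, one classifier per sub-function is trained as `train_linear_svm(*build_one_vs_rest(labeled, "login"))`. A real pipeline would also add a bias feature and shuffle the samples each epoch.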

Further, the user's preference value for application a is expressed as:

$$Prefer_a = \lambda_1 PR_{time} + \lambda_2 PR_{position} + \lambda_3 SIM_{func}, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1$$

where $Prefer_a$ is the user's preference value for application a; $PR_{time}$ is the probability that the user uses the application in the current time period; $PR_{position}$ is the probability that a user at the current location uses this type of application; $SIM_{func}$ is the functional similarity between the application and the applications of the same type used by the user; and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the influence factors of $PR_{time}$, $PR_{position}$ and $SIM_{func}$, respectively.
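A minimal sketch of the preference formula above, assuming that the two context probabilities are estimated as simple conditional frequencies from the user's context log. The record fields `slot`, `place` and `category` and the function names are hypothetical; the influence factors are checked to sum to 1, as the formula requires.

```python
import math


def conditional_usage(log: list, field: str, value: str, category: str) -> float:
    """Assumed estimator: share of log records with record[field] == value whose app category matches."""
    in_context = [record for record in log if record[field] == value]
    if not in_context:
        return 0.0
    return sum(record["category"] == category for record in in_context) / len(in_context)


def preference_value(pr_time: float, pr_position: float, sim_func: float,
                     lambdas: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Prefer_a = l1 * PR_time + l2 * PR_position + l3 * SIM_func, with l1 + l2 + l3 = 1."""
    l1, l2, l3 = lambdas
    if not math.isclose(l1 + l2 + l3, 1.0):
        raise ValueError("influence factors must sum to 1")
    return l1 * pr_time + l2 * pr_position + l3 * sim_func
```

Setting one factor to 1 and the others to 0, e.g. `lambdas=(1.0, 0.0, 0.0)`, reduces the preference value to the time-based probability alone.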

Further, the functional similarity $SIM_{func}$ between the application and the applications of the same type used by the user is expressed as:

where n is the number of applications of the same type used by the user,

represents the same-function ratio between application a and an application $b_i$ of the same type used by the user,

is the total number of interfaces of application a and application $b_i$ that correspond to the same sub-functions, and $SUM\_UI_a$ is the total number of interfaces of application a.
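The aggregation over the n applications is not spelled out in the surviving text, so the sketch below assumes one plausible reading of the definitions: the same-function ratio for $b_i$ counts the interfaces of application a whose sub-function also appears in $b_i$ and divides by $SUM\_UI_a$, and $SIM_{func}$ averages this ratio over the n same-type applications. Each application is represented by its sub-function classifier output (sub-function name mapped to interface count); the averaging and the function names are assumptions.

```python
def same_function_ratio(app_a: dict, app_b: dict) -> float:
    """Assumed ratio: interfaces of a whose sub-function b also has, over SUM_UI_a."""
    sum_ui_a = sum(app_a.values())
    if sum_ui_a == 0:
        return 0.0
    shared = sum(count for name, count in app_a.items() if name in app_b)
    return shared / sum_ui_a


def sim_func(app_a: dict, same_type_apps: list) -> float:
    """Assumed aggregation: mean same-function ratio over the n same-type apps used."""
    if not same_type_apps:
        return 0.0
    return sum(same_function_ratio(app_a, b) for b in same_type_apps) / len(same_type_apps)
```

For an application a with `{"game-pay": 2, "game-login": 1, "game-chat": 1}` (four interfaces), a same-type app offering pay and login yields a ratio of 3/4, one offering only chat yields 1/4, and the assumed mean gives a similarity of 0.5.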

Further, the training process of the reliability classifier comprises:

crawling popularity data of malicious applications in different application stores as a positive set of a training set, and crawling popularity data of benign applications as a negative set of the training set;

extracting comment data of each application in the training set, and acquiring comment user ID, comment content, score and comment time information from the comment data;

extracting features from the popularity data in each application store separately, giving 5-dimensional features: a score change vector, a ranking change vector, a positive-review count change vector, the identical-comment-content ratio and the similar-comment-content ratio;

performing cross-store feature extraction on the application's popularity data across different application stores, giving 5-dimensional features: the deviation of the application's score change rate across stores, the deviation of its ranking change rate across stores, the deviation of its comment count change rate across stores, the identical-comment-content ratio and the similar-comment-content ratio;

and taking the resulting 10-dimensional features as the input of the reliability classifier, and selecting a suitable classification algorithm according to the accuracy and recall of the classifier's results, to obtain the trained reliability classifier.
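The five in-store features can be sketched with the standard library. The definitions below are assumptions chosen for illustration: change vectors are successive differences of a daily series, the identical-comment ratio is the share of comments whose exact text occurs more than once, and the similar-comment ratio is the share of comments that nearly match a different text according to `difflib.SequenceMatcher.ratio()` at a hypothetical threshold of 0.8. Feature names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def change_vector(series: list) -> list:
    """Successive differences of a daily series (e.g., score, rank, positive-review count)."""
    return [after - before for before, after in zip(series, series[1:])]


def same_content_ratio(comments: list) -> float:
    """Share of comments whose exact text occurs more than once."""
    if not comments:
        return 0.0
    counts = Counter(comments)
    return sum(c for c in counts.values() if c > 1) / len(comments)


def similar_content_ratio(comments: list, threshold: float = 0.8) -> float:
    """Share of comments that nearly match (ratio >= threshold) some different text."""
    if not comments:
        return 0.0
    similar = sum(
        1 for text in comments
        if any(other != text and SequenceMatcher(None, text, other).ratio() >= threshold
               for other in comments)
    )
    return similar / len(comments)


def in_store_features(scores: list, ranks: list, positive_counts: list, comments: list) -> dict:
    """The five in-store features for one application in one store."""
    return {
        "score_change": change_vector(scores),
        "rank_change": change_vector(ranks),
        "positive_review_change": change_vector(positive_counts),
        "same_comment_ratio": same_content_ratio(comments),
        "similar_comment_ratio": similar_content_ratio(comments),
    }
```

A sudden rank jump combined with many copy-pasted or lightly edited reviews, such as `"great game"` repeated alongside `"great game!!"`, drives both comment ratios up, which is the pattern the reliability classifier is meant to learn.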

Further, when the reliability classifier classifies the data, it outputs the probability that each piece of data is reliable and the probability that it is unreliable, and the user sets a reliable threshold and an unreliable threshold. Data whose reliable probability is greater than the reliable threshold and whose unreliable probability is less than the unreliable threshold are set as reliable data; data whose reliable probability is less than the reliable threshold and whose unreliable probability is greater than the unreliable threshold are set as unreliable data; all other data are set as suspicious data. When weights are assigned, the weight of reliable data > the weight of suspicious data > the weight of unreliable data. In a preferred embodiment, the weights assigned to the reliable, suspicious and unreliable levels are 1, 0.5 and 0, respectively. In a more preferred embodiment, the average of the unreliable probabilities of the data determined to be reliable is used as a fluctuation value, a value within the fluctuation range is obtained by a random function, and the weight of the suspicious data is set to 1

is taken as

the average, or the median, of the unreliable probabilities of the data determined to be reliable. More preferably, the weight of the unreliable data may be set to a negative number.
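The threshold rule and the preferred-embodiment weights (1, 0.5, 0) can be sketched as follows. Both default thresholds of 0.7 are hypothetical; when the classifier's two probabilities sum to 1, these defaults label data with a reliable probability between 0.3 and 0.7 (inclusive) as suspicious. The randomized suspicious weight and the negative unreliable weight of the more preferred embodiments are not modeled.

```python
# Preferred-embodiment weights for the three reliability levels.
LEVEL_WEIGHTS = {"reliable": 1.0, "suspicious": 0.5, "unreliable": 0.0}


def reliability_level(p_reliable: float, p_unreliable: float,
                      reliable_threshold: float = 0.7, unreliable_threshold: float = 0.7) -> str:
    """Map the classifier's two probabilities to a level using user-set (here hypothetical) thresholds."""
    if p_reliable > reliable_threshold and p_unreliable < unreliable_threshold:
        return "reliable"
    if p_reliable < reliable_threshold and p_unreliable > unreliable_threshold:
        return "unreliable"
    return "suspicious"


def reliability_weight(p_reliable: float, p_unreliable: float, **thresholds) -> float:
    """Weight beta assigned to one piece of popularity data."""
    return LEVEL_WEIGHTS[reliability_level(p_reliable, p_unreliable, **thresholds)]
```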

Further, the popularity of the current application sample is calculated from its popularity data and the corresponding weights, and is expressed as:

where $POP_A$ is the popularity of application A; $\beta_m$ is the reliability weight assigned to the m-th piece of popularity data $Data_m$ (e.g., a comment); the Rank function calculates the rank of that type of popularity data among applications with the same preference value; and $\lambda_k$ is the influence factor of each type of popularity data (e.g., download count, score, comments) on the popularity value.
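The popularity formula itself is not recoverable from this text. The sketch below is one hypothetical reading of the surrounding definitions: for each data type k, the β-weighted sum of that type's records is ranked across the candidate applications (rank 1 is the largest), the rank is converted to a score in (0, 1] as (N − rank + 1) / N, and the scores are combined with the influence factors λ_k. The rank-to-score mapping, the data layout and all names are assumptions.

```python
def weighted_total(records: list) -> float:
    """Sum of beta_m * Data_m over (value, weight) records of one data type."""
    return sum(weight * value for value, weight in records)


def popularity(apps: dict, lambdas: dict) -> dict:
    """apps: {app: {data_type: [(value, beta), ...]}}; lambdas: {data_type: lambda_k}.

    Assumed form: POP_A = sum_k lambda_k * (N - rank_k(A) + 1) / N.
    """
    n = len(apps)
    pop = {app: 0.0 for app in apps}
    for data_type, lam in lambdas.items():
        totals = {app: weighted_total(data.get(data_type, [])) for app, data in apps.items()}
        ordered = sorted(totals, key=lambda app: (-totals[app], app))  # rank 1 = largest total
        for rank, app in enumerate(ordered, start=1):
            pop[app] += lam * (n - rank + 1) / n
    return pop
```

In this reading, a 5-star rating judged unreliable (β = 0) adds nothing to its application's total, so manipulated ratings cannot lift the application's rank.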

The invention also provides a mobile application recommendation device fusing multi-source reliable information, comprising an input module, an application function extraction module, a data reliability measurement module and an application recommendation module. The input module comprises a mobile program unit and an extension data unit; the application function extraction module comprises an application analysis module, a context information extraction module and a function classification module; and the data reliability measurement module comprises a multi-source data collection module, a reliability analysis module and a popularity calculation module, wherein:

a mobile program unit for storing application samples;

the extension data unit is used for acquiring the application description information, application category and popularity information of the application samples, the popularity information comprising user comment information, score information and ranking information;

the application analysis module is used for extracting the function information used by each interface of the application sample and the permission information corresponding to those functions;

the context information extraction module is used for acquiring context information of the application sample;

the function classification module is used for acquiring subfunctions of the application samples;

the multi-source data collection module is used for crawling the popularity data of different types of applications over different time periods from different mobile application stores;

the reliability analysis module is used for measuring the reliability of the popularity data crawled by the multi-source data collection module;

the popularity calculation module is used for calculating the popularity of each application from the popularity data and their reliability;

and the application recommendation module is used for obtaining a comprehensive ranking from the popularity of the applications and the user's preference for each type of application, and recommending the top N applications to the user.

The invention combines natural language processing with program analysis. On the one hand, it automatically infers the sub-function corresponding to each interface of an application by analyzing the application's context information and function information, completing context information that is insufficiently described; on the other hand, it introduces a reliability measure for popularity information, reducing the influence of false information on the ranking of recommendation results. Together, these improve the reliability and integrity of the input data of the recommendation model, so that mobile applications can be recommended more accurately and reliably.

Drawings

FIG. 1 is a schematic flow chart of a mobile application recommendation method fusing multi-source reliable information according to the present invention;

FIG. 2 is a schematic structural diagram of a mobile application recommendation device fusing multi-source reliable information according to the present invention;

FIG. 3 is a schematic diagram of calculating the preference value in the mobile application recommendation method fusing multi-source reliable information according to the present invention;

FIG. 4 is a schematic diagram of calculating the preference value in the mobile application recommendation method fusing multi-source reliable information according to the present invention;

FIG. 5 is a schematic diagram of a training process of a reliability classifier in the mobile application recommendation method fusing multi-source reliable information according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a mobile application recommendation method fusing multi-source reliable information, which specifically comprises the following steps:

acquiring a mobile application sample and corresponding extension data thereof;

inputting the extension data corresponding to the application sample into the sub-function classifier, whose output is the name of each sub-function contained in the current sample and the number of interfaces corresponding to that sub-function;

taking the output of the sub-function classifier, together with the time and location information of the user's use of the application, as input to obtain the user's preference value for the application;

inputting the application name and package name of the application sample into a crawler, and acquiring through the crawler the popularity data of the application sample in third-party application stores and on ASO (App Store Optimization) monitoring platforms;

inputting the popularity data into the reliability classifier for reliability prediction, the prediction result being one of three levels, reliable, suspicious and unreliable, which are assigned weights of 1, 0.5 and 0, respectively;

calculating the popularity of the current sample according to the popularity data corresponding to the current application sample and the corresponding weight;

and comprehensively sorting according to the preference value of the user and the popularity of the applications, and recommending the top N sorted applications to the user.

The structure of the model adopted by the invention, i.e., of the mobile application recommendation device fusing multi-source reliable information, is shown in FIG. 1. According to core function, the model can be divided into 4 modules: an input module, an application function extraction module, a data reliability measurement module and an application recommendation module. The main functions of each module are as follows:

1) The input module comprises a plurality of mobile application samples and the corresponding extension data. The extension data comprises application metadata, such as application description information and application categories, and application popularity data, such as user comment information, score information and ranking information.

2) The application function extraction module first uses the application analysis module and the context information extraction module to obtain the function information, permission information and context information used by each interface of the application. It then inputs this data into the trained classifier to obtain all sub-functions contained in the application. Sub-functions are divided into two levels: the first level is the application category and the second level is the sub-function name. For example, the sub-function "game-pay" indicates that the game application "Angry Birds" contains an in-app payment function.

3) The data reliability measurement module is oriented mainly to application popularity data, including application scores, application rankings and application comments. First, based on the application name and package name, the multi-source data collection module acquires the various types of popularity data of the application in different application stores over a certain period (e.g., 3 months) from mainstream domestic mobile application stores (e.g., Wandoujia, the Huashi app store, Yingyongbao) and ASO monitoring platforms (e.g., Kuwa, Chandashi). Next, the reliability analysis module measures the reliability of the data and classifies it into three levels: reliable, suspicious and unreliable. Finally, the data is input into the popularity calculation module to obtain the popularity value of the application.

4) The application recommendation module inputs the processed extension data into an existing recommendation model based on extension information to obtain the user's preference value for the application.
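The two-level sub-function naming, and the per-sub-function interface counts that the classifier outputs (e.g., "game-pay, 2"), can be illustrated by aggregating per-interface predictions. In this sketch the Activity names and the per-interface labels are hypothetical inputs, assumed to come from the trained sub-function classifier.

```python
from collections import Counter


def summarize_subfunctions(category: str, predictions: dict) -> list:
    """predictions maps Activity name -> predicted sub-function; returns 'category-name, count' rows."""
    counts = Counter(predictions.values())
    # First level: application category; second level: sub-function name.
    return [f"{category}-{name}, {count}" for name, count in sorted(counts.items())]
```

For a game application whose `PayActivity` and `ShopPayActivity` are both classified as payment interfaces, the rows include `"game-pay, 2"`, matching the output format of the sub-function classification step.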

The flow of recommendation using the mobile application recommendation method fusing multi-source reliable information is shown in FIG. 2 and specifically comprises the following steps:

S1, input: input the application sample files and the corresponding extension data.

S2, application analysis: statically analyze the application sample with an application reverse-engineering tool, and extract the function information used by each application interface and the permission information corresponding to those functions.

S3, context information extraction: automatically run the application, traverse its interfaces, obtain the XML layout information of each interface, and extract the "Text" attribute values from it as the context information of the current interface. In addition, for interfaces that use image-type components, take a screenshot during traversal, perform text recognition with OCR, and add the extracted characters to the context information of the current interface.

S4, sub-function classification: input the data from S2 and S3 into the trained sub-function classifier to obtain the sub-function corresponding to each interface of the current application, and output the sub-functions in the two-level format. The output has two columns: the first is the sub-function name and the second is the number of interfaces corresponding to that sub-function. For example, "game-pay, 2" means that the current game application contains two different in-app payment interfaces.

S5, multi-source data acquisition: input the application name and package name into the crawler module, and crawl the changes in the application's popularity data from mainstream domestic third-party mobile application stores and ASO monitoring platforms. The collection period can be set according to actual requirements, e.g., the most recent 3 months of popularity data.

S6, reliability analysis: input the collected multi-source data into the trained reliability classifier, which outputs a reliability prediction for each piece of data at one of three levels, reliable, suspicious or unreliable, assigned weights of 1, 0.5 and 0, respectively.

S7, popularity calculation: based on the output data of S6, calculate the popularity value POP of the application using the following formula:

where $\beta_m$ is the reliability weight assigned to popularity data $Data_m$, and $\lambda_k$ is the influence parameter of each type of popularity data on the popularity value; these parameters may be distributed evenly or configured manually by the application store or the user.

S8, preference value calculation: using an existing recommendation model based on extension information, input the output of S4 as part of the context log and combine it with the model's other input data (e.g., the time and location at which the user uses applications) to obtain the user's preference value for each application. Then, based on the popularity values calculated in S7, rank the candidate applications comprehensively by preference value and popularity value, and recommend applications to the user.
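The text extraction of step S3 can be sketched with Python's standard-library XML parser. The sketch assumes a window dump in the format written by Android's `uiautomator dump` (nested `node` elements carrying a `text` attribute); the OCR path for image components is not shown, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_context_text(layout_xml: str) -> list:
    """Collect non-empty text attributes of all <node> elements, in document order."""
    root = ET.fromstring(layout_xml)
    texts = []
    for node in root.iter("node"):
        text = node.get("text", "").strip()
        if text:  # skip containers and widgets without visible text
            texts.append(text)
    return texts
```

Document order follows the view hierarchy rather than screen position, so an implementation that needs strict top-to-bottom ordering would also parse each node's `bounds` attribute.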

The training process of the sub-function classifier is shown in FIG. 3 and specifically comprises the following steps:

S41, training set construction: construct the training set by manual labeling. For each category of mobile applications, select m applications (e.g., 50) for the training set. Manually exercise each application, traverse all of its interfaces, and label the function of each interface; for example, interface 1 of application A (Activity name: LoginActivity) is a login function.

S42, function and permission information extraction: decompile the applications in the training set and extract the names of the functions used under each interface (Activity) and the permissions requested. Meanwhile, locate the entry point of each interface, i.e., the ID of its system component, according to the callback functions used by system components.

S43, context information extraction: dynamically run the application with an automated testing tool such as UIAutomator, traverse all of its interfaces, and obtain the Activity name of each interface and the context text it contains. Meanwhile, add the text of the entry component, located through the interface entry point obtained in S42, to the context information.

S44, text preprocessing: preprocess the acquired context information, performing word segmentation and stop-word removal with a third-party library such as jieba, and order the context information from top to bottom to generate a text vector.

S45, function classifier training: for each sub-function of each application category, take the corresponding interface information (i.e., the outputs of S42 and S44) as the positive set and the other interface information as the negative set, and train the classifier with a machine learning algorithm (e.g., a binary SVM).
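The preprocessing of step S44 can be sketched as follows. The source names jieba for Chinese word segmentation; to keep the example dependency-free, a lower-case alphanumeric regex tokenizer stands in for it (so it only handles Latin-script text), the stop-word list is a toy assumption, and each widget is assumed to carry its top y-coordinate so that text can be ordered from top to bottom. The result is a bag-of-words count vector over a fixed vocabulary.

```python
import re
from collections import Counter

# Toy stop-word list (assumption); a real pipeline would use a curated list.
STOP_WORDS = {"the", "a", "to", "and", "of", "your"}


def tokenize(text: str) -> list:
    """Stand-in for jieba segmentation: lower-case alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def interface_tokens(widgets: list) -> list:
    """widgets: (top_y, text) pairs; text is read top to bottom, stop words dropped."""
    tokens = []
    for _, text in sorted(widgets, key=lambda widget: widget[0]):
        tokens.extend(tok for tok in tokenize(text) if tok not in STOP_WORDS)
    return tokens


def to_count_vector(tokens: list, vocabulary: list) -> list:
    """Bag-of-words counts of `tokens` for each word in `vocabulary`."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

For a login screen with widgets `[(300, "Forgot your password"), (100, "Login to the account"), (200, "Password")]`, the tokens come out as `login, account, password, forgot, password`, and the count vector over `["login", "password", "pay"]` is `[1, 2, 0]`; vectors like this form the positive and negative sets of S45.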

In this embodiment, the user's preference value for application a is expressed as:

$$Prefer_a = \lambda_1 PR_{time} + \lambda_2 PR_{position} + \lambda_3 SIM_{func}, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1$$

where $Prefer_a$ is the user's preference value for application a; $PR_{time}$ is the probability that the user uses this type of application in the current time period; $PR_{position}$ is the probability that a user at the current location uses this type of application; $SIM_{func}$ is the functional similarity between the application and the applications of the same type used by the user; and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the influence factors of $PR_{time}$, $PR_{position}$ and $SIM_{func}$, respectively. When the influence factors are set, one of them may be set to 1 (and the others to 0), meaning that only the corresponding term is calculated.

The functional similarity $SIM_{func}$ between the application and the applications of the same type used by the user is expressed as:

wherein n is the number of applications of the same type used by the user,

The training process of the reliability classifier used in step S6 is shown in FIG. 4 and specifically comprises the following steps:

S61, training set construction: since no public data set of ranking-fraud applications is currently available for training, the method assumes that most of the popularity data of malicious applications (e.g., positive reviews and high scores) is false, manipulated data. It crawls the popularity data, in different application stores, of malicious applications drawn from public malicious application knowledge bases such as AndroZoo as the positive set of the training set, and crawls the popularity data of benign applications as the negative set.

S62, data extraction: extract the comment data and obtain the commenter ID, comment content, score and comment time from it.

S63, in-store feature extraction: extract features from the application's popularity data in each application store separately, giving 5-dimensional features: a score change vector, a ranking change vector, a positive-review count change vector, the identical-comment-content ratio and the similar-comment-content ratio.

S64, cross-store feature extraction: combine the application's popularity data across different application stores and extract 5-dimensional features: the deviation of the application's score change rate across stores, the deviation of its ranking change rate across stores, the deviation of its comment count change rate across stores, the identical-comment-content ratio and the similar-comment-content ratio.

S65, classifier training: with the 10-dimensional features extracted in S63 and S64 as input, train classifiers with machine learning algorithms (e.g., SVM, random forest), and select the classification model best suited to the current data according to the accuracy and recall of the classification results.
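The cross-store features of S64 do not specify how a "deviation" is measured. This sketch assumes it is the population standard deviation of per-store change rates, where a store's change rate is (last − first) / first over the collection window, and computes a cross-store identical-comment ratio as the share of distinct comment texts posted in two or more stores. The store names, data and function names are illustrative assumptions.

```python
from statistics import pstdev


def change_rate(series: list) -> float:
    """Relative change over the window: (last - first) / first, or 0.0 if first is 0."""
    first, last = series[0], series[-1]
    return (last - first) / first if first else 0.0


def change_rate_deviation(series_by_store: dict) -> float:
    """Assumed deviation measure: population std. dev. of the per-store change rates."""
    return pstdev([change_rate(series) for series in series_by_store.values()])


def cross_store_same_comment_ratio(comments_by_store: dict) -> float:
    """Share of distinct comment texts that appear in at least two stores."""
    store_sets = [set(comments) for comments in comments_by_store.values()]
    all_texts = set().union(*store_sets)
    if not all_texts:
        return 0.0
    shared = sum(1 for text in all_texts if sum(text in s for s in store_sets) >= 2)
    return shared / len(all_texts)
```

The intuition behind these features is that organic popularity tends to move similarly across stores, so a score rising 10% in one store while falling 10% in another, or identical review texts appearing in several stores, are treated as signals of manipulation.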


Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A mobile application recommendation method fusing multi-source reliable information is characterized by specifically comprising the following steps:

acquiring a mobile application sample and corresponding extension data thereof;

2. The method for recommending mobile applications by fusing multi-source reliable information according to claim 1, wherein the obtaining process of the sub-function classifier comprises:

3. The method for recommending mobile applications by fusing multi-source reliable information according to claim 1, wherein the preference value of the user for the application a is represented as:

where $Prefer_a$ represents the user's preference value for application $a$; $PR_{time}$ is the probability that the user uses this type of application in the current time period; $PR_{position}$ is the probability that the user uses this type of application at the current location; $SIM_{func}$ is the functional similarity between the application and the applications of the same type used by the user; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the influence factors of $PR_{time}$, $PR_{position}$ and $SIM_{func}$, respectively.
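The following is a minimal sketch of the preference value. It assumes that the three terms are combined linearly, $Prefer_a = \lambda_1 PR_{time} + \lambda_2 PR_{position} + \lambda_3 SIM_{func}$, and that each probability is estimated as a frequency over the user's usage log. Both points are assumptions made for illustration. The `UsageRecord` type, the hour-window and location-label representation, and the default factor values are also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UsageRecord:
    hour: int       # hour of day at which an application was used
    location: str   # coarse location label, e.g. "home" or "office"
    app_type: str   # type of the application that was used


def type_probability(records: List[UsageRecord], app_type: str) -> float:
    """Share of the given usage records that involve `app_type`."""
    if not records:
        return 0.0
    return sum(r.app_type == app_type for r in records) / len(records)


def preference(records: List[UsageRecord], app_type: str, hour_window: Tuple[int, int],
               location: str, sim_func: float,
               lambdas: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Assumed linear form: l1 * PR_time + l2 * PR_position + l3 * SIM_func."""
    start, end = hour_window
    in_period = [r for r in records if start <= r.hour < end]
    at_location = [r for r in records if r.location == location]
    pr_time = type_probability(in_period, app_type)
    pr_position = type_probability(at_location, app_type)
    l1, l2, l3 = lambdas
    return l1 * pr_time + l2 * pr_position + l3 * sim_func
```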

4. The method according to claim 3, wherein the functional similarity $SIM_{func}$ between the application and the applications of the same type used by the user is expressed as:

wherein n is the number of applications of the same type used by the user,
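The definition of $n$ as the number of same-type applications used by the user suggests an average over those $n$ applications. The sketch below assumes exactly that. It also assumes that each application is represented by the set of sub-function labels produced by the sub-function classifier and that pairwise similarity is the Jaccard index. Both the representation and the Jaccard choice are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sub-function label sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim_func(candidate: Set[str], used_apps: List[Set[str]]) -> float:
    """Assumed SIM_func: mean similarity to the n same-type apps the user already uses."""
    if not used_apps:
        return 0.0
    return sum(jaccard(candidate, used) for used in used_apps) / len(used_apps)
```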

5. The method for recommending mobile applications by fusing multi-source reliable information according to claim 1, wherein the training process of the reliability classifier comprises:

acquiring the popularity data of each application store and extracting single-store features in 5 dimensions: a rating change vector, a ranking change vector, a positive-review count change vector, the identical-review-content ratio and the similar-review-content ratio;
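The single-store features above can be sketched as follows. A change vector is taken as the period-to-period differences of a metric. The identical-review ratio is the share of reviews whose exact text occurs more than once. The similar-review ratio is the share of reviews that nearly, but not exactly, match another review, measured with Python's `difflib.SequenceMatcher`. The 0.8 similarity threshold and the function names are hypothetical choices, not values from the patent.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def change_vector(series: List[float]) -> List[float]:
    """Period-to-period differences of a rating, ranking or positive-review series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def identical_comment_ratio(comments: List[str]) -> float:
    """Share of reviews whose exact text occurs more than once."""
    if not comments:
        return 0.0
    counts = Counter(comments)
    return sum(n for n in counts.values() if n > 1) / len(comments)


def similar_comment_ratio(comments: List[str], threshold: float = 0.8) -> float:
    """Share of reviews that nearly (but not exactly) match some other review."""
    flagged = set()
    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            a, b = comments[i], comments[j]
            if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
                flagged.update((i, j))
    return len(flagged) / len(comments) if comments else 0.0
```

The pairwise loop is quadratic in the number of reviews. That keeps the sketch simple but would need a hashing or clustering approach for large review sets.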

performing cross-store feature extraction by combining the application's popularity data from different application stores, and extracting features in 5 dimensions: the deviation of the application's rating change rate across stores, the deviation of its ranking change rate across stores, the deviation of its review-count change rate across stores, the identical-review-content ratio and the similar-review-content ratio;

6. The method for recommending mobile applications by fusing multi-source reliable information according to claim 5, wherein, when classifying the reliability of the data, the reliability classifier outputs both the probability that the data is reliable and the probability that it is unreliable. The user sets a reliable threshold and an unreliable threshold. Data whose reliable probability is greater than the reliable threshold and whose unreliable probability is less than the unreliable threshold is set as reliable data. Data whose reliable probability is less than the reliable threshold and whose unreliable probability is greater than the unreliable threshold is set as unreliable data. All other data is set as suspicious data. When weights are assigned to the data, weight of reliable data > weight of suspicious data > weight of unreliable data.
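The three-way thresholding of claim 6 is sketched below. The classification logic follows the claim. The concrete weight values and the default thresholds are hypothetical; the claim fixes only the ordering of the weights: reliable > suspicious > unreliable.

```python
RELIABLE, SUSPICIOUS, UNRELIABLE = "reliable", "suspicious", "unreliable"

# Hypothetical weight values; only their ordering comes from the claim.
WEIGHTS = {RELIABLE: 1.0, SUSPICIOUS: 0.5, UNRELIABLE: 0.1}


def classify_reliability(p_reliable: float, p_unreliable: float,
                         t_reliable: float, t_unreliable: float) -> str:
    """Map the classifier's two probabilities to reliable / unreliable / suspicious."""
    if p_reliable > t_reliable and p_unreliable < t_unreliable:
        return RELIABLE
    if p_reliable < t_reliable and p_unreliable > t_unreliable:
        return UNRELIABLE
    return SUSPICIOUS  # every other combination is treated as suspicious


def reliability_weight(p_reliable: float, p_unreliable: float,
                       t_reliable: float = 0.7, t_unreliable: float = 0.2) -> float:
    """Weight beta_m assigned to one popularity data item (hypothetical default thresholds)."""
    return WEIGHTS[classify_reliability(p_reliable, p_unreliable, t_reliable, t_unreliable)]
```

For example, with thresholds 0.7 and 0.2, probabilities (0.75, 0.25) clear the reliable threshold but not the unreliable one, so the item is suspicious.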

7. The method for recommending mobile applications by fusing multi-source reliable information according to claim 1, wherein the popularity of the current application sample is calculated from its popularity data and the corresponding weights, as:

$$POP_A = \sum_k \lambda_k \, \mathrm{Rank}\Big(\sum_m \beta_m \, Data_m\Big), \qquad \lambda_1 + \lambda_2 + \cdots + \lambda_k = 1$$

where $POP_A$ represents the popularity of application A; $\beta_m$ is the reliability weight assigned to the m-th popularity data item $Data_m$; the Rank function computes the rank of this type of popularity data among applications with the same preference value; and $\lambda_k$ is the influence factor of the k-th type of popularity data on the popularity value.
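The popularity formula of claim 7 can be sketched as follows. The text does not say which direction Rank counts in. The sketch assumes a percentile-style score, namely the share of peer applications (those with the same preference value, the application itself included) whose weighted value is less than or equal to the application's own. Under that assumption a larger $POP_A$ means more popular. The nested data layout and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# app -> popularity-data type -> list of (Data_m, beta_m) pairs (hypothetical layout)
PopularityData = Dict[str, Dict[str, List[Tuple[float, float]]]]


def weighted_sum(items: List[Tuple[float, float]]) -> float:
    """Inner term of the formula: sum over m of beta_m * Data_m."""
    return sum(beta * value for value, beta in items)


def rank_score(value: float, peer_values: List[float]) -> float:
    """Assumed Rank: share of peers whose value is <= `value` (1.0 = top of the group)."""
    return sum(1 for v in peer_values if v <= value) / len(peer_values)


def popularity(app: str, peers: List[str], data: PopularityData,
               lambdas: Dict[str, float]) -> float:
    """POP_A = sum_k lambda_k * Rank(sum_m beta_m * Data_m), with the lambda_k summing to 1.

    `peers` are the applications sharing `app`'s preference value (including `app`).
    """
    if abs(sum(lambdas.values()) - 1.0) > 1e-9:
        raise ValueError("influence factors lambda_k must sum to 1")
    pop = 0.0
    for kind, lam in lambdas.items():
        peer_values = [weighted_sum(data[p][kind]) for p in peers]
        pop += lam * rank_score(weighted_sum(data[app][kind]), peer_values)
    return pop
```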

8. A mobile application recommendation device fusing multi-source reliable information, characterized by comprising an input module, an application function program extraction module, a data reliability measurement module and an application recommendation module. The input module comprises a mobile program unit and an extension data unit. The application function program extraction module comprises an application parsing module, a context information extraction module and a function classification module. The data reliability measurement module comprises a multi-source data collection module, a reliability analysis module and a popularity calculation module, wherein:

a mobile program unit for storing application samples;

the multi-source data collection module is used for crawling popularity data for different types of applications from different mobile application stores over different time periods;

9. The device of claim 8, wherein the function classification module is a sub-function classifier, and the sub-function classifier is obtained by:

for each type of mobile application, selecting m applications, manually using each application while traversing all of its interfaces, and labeling the function of each interface;

acquiring the names of the functions used under each interface and the names of the permissions requested by the application, and locating the entry point of each node, i.e. the ID of the entry component, according to the callback functions used by system components;

dynamically running the application, traversing all of its interfaces, collecting the Activity name and the contained text of each interface as context information, and adding the text of system components to the context information;
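The steps above yield, for each interface, a manually labeled sample made of function names, permission names, an Activity name and context text. The claim does not name the classifier that is trained on these samples. The sketch below therefore substitutes a simple nearest-neighbour rule over token sets (Jaccard overlap) purely to show how the collected information could feed a sub-function classifier. Entry-point IDs are omitted. `InterfaceSample` and `predict_subfunction` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class InterfaceSample:
    activity: str                                   # Activity name of the interface
    api_calls: List[str]                            # function names used under the interface
    permissions: List[str]                          # permission names requested by the app
    context_text: List[str] = field(default_factory=list)  # interface + system-component text
    label: str = ""                                 # manually annotated sub-function

    def tokens(self) -> Set[str]:
        """Bag of tokens combining all collected information for this interface."""
        toks = set(self.api_calls) | set(self.permissions) | {self.activity}
        for text in self.context_text:
            toks.update(text.lower().split())
        return toks


def predict_subfunction(sample: InterfaceSample, labeled: List[InterfaceSample]) -> str:
    """Stand-in classifier: label of the labeled interface with the highest token overlap."""
    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    query = sample.tokens()
    best = max(labeled, key=lambda s: jaccard(query, s.tokens()))
    return best.label
```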

10. The device of claim 8, wherein the training process of the reliability classifier comprises:

taking the acquired 10-dimensional features as the input of the reliability classifier, and training it with the accuracy and recall of its classification results as the evaluation criteria, to obtain the trained reliability classifier.